Saturday, April 11, 2020

(in)Articulate Automation


My lips are moving and the sound's coming out

The words are audible but I have my doubts


-Missing Persons, ‘Words’





Uncanny Valley

Voice activation has become nearly ubiquitous in recent years, with a growing number of households including devices focused on this interaction. It is a remarkable market growth for units that do not work as well as intended.


Not so long ago, talking to a machine was considered an oddity, usually the habit of folks with a limited grasp of reality.


The leap from rarity to ubiquity came on suddenly, stunning even the most evangelical advocates. Today, the proliferation of 'smart speakers' (a double meaning if ever there was one) has brought even Joe Six-Pack into the heady world of voice control.


Just how did we get here? Voice interaction with our computers has been a dream for the wunderkinder of MIT and a core character in much of technical science fiction. The computers of the then-futurists had personalities built to fit the 'human' interaction - from 2001's HAL, semi-psychotic and full of ulterior motives, to the overtly chirpy interface central to the Infinite Improbability-driven Heart of Gold.


The current generation of voice-driven smart devices is a leap from the first real-world voice-capable computers. Audrey (Bell Labs, 1952), developed to minimize voice bandwidth before automated switching, could understand spoken numbers, and only for specific operator voices. Shoebox (IBM, 1962) could understand 16 spoken English words, including the digits 0 through 9. Unlike Audrey, the system did not rely on specific voices; it identified three parts of each word via analog filter circuitry.


Canned Reality


The reality of voice interaction is far stranger than our expectations. True, there is still the over-reaching effort to have these machines respond to us in a way that approaches the uncanny valley. We want our machines to be 'part of the family,' a natural call and response between intimates. This feeling of closeness belies a subtle manipulation of our day-to-day interactions by the process itself.


Anyone who has interacted with Alexa, Google Home, HomePod, or the mobile phone speech-to-text tool knows how little these devices understand natural-cadence speech. For the most part, a person cannot simply utter a complex command or request without at least a few rounds of hearing 'sorry, I don't understand' or having the device play Darling Nikki when asked to 'set to do not disturb.'


As a native New Yorker, I find my Alexa units' inability to keep up with my fast-paced speech damn frustrating; I end up repeating a request again and again. The same is true for non-native speakers of English. Ask the numerous European friends who have had extended stays in my home. These folks often speak English better than many of my native-born associates. Still, the frustrations they encountered in just attempting to have Alexa set a timer while cooking were off-putting. Ultimately, we could reliably interact with the devices; it only took a complete change in how we spoke.


Me Talk Pretty?


The technology may affect our speech patterns, constructing a more banal and common form of pronunciation. Until the technology can catch up, we are forced to perform a bit of code-switching, speaking in our regular cadence and pronunciation to each other while addressing the technology with something else. Commands to these devices require a slower, more sharply articulated speech, demanding accentuated Ps, Ds, and Bs. The process can feel like being forced to speak a staccato version of 'The Queen's English' (or the now-defunct Mid-Atlantic speech).


This is not the first time technology has influenced the way humans talk. Each new leap in voice communication has forced an alternate voice from its users to ensure efficient intelligibility.


Modern music has a very intimate characteristic that did not and could not exist before the first decade of the 1900s. Singers and performers of the day needed to 'reach the back of the room' by sheer skill. They also needed to have the vocals cut through the instruments and often the sound of dancing feet. Opera singers could do this with sheer power, albeit with more quietly considerate audiences.


Pre-War Dance Hall singers (not to be mistaken for the Jamaican blending of reggae, hip-hop, and R&B) needed a specific range and technique to avoid being washed out and to be heard clearly across the room. The falsetto (or, more rightly, a countertenor) voice and a passive megaphone provided just the right sound to make the vocals an explicit part of the song. You can hear a bit of this style in early World War I and II movies showing soldiers dancing while on R&R.


That Voice 

Some remnants of the style can also be heard as Swing Big Bands added early vocalists, soon replaced by the mellow, more intimate whiskey voices of Bing Crosby, Frank Sinatra, Helen Forrest, and Billie Holiday. This new sultry style was made possible only by the addition of microphone amplification. All of these singers were capable of belting it to the back of the room, but the expressiveness and seduction required a more subtle delivery while still being front and center of the composition. The cultural switch did not go easily for some, as many saw these crooners as too mushy and antithetical to the music. Of course, the young kids loved how it gave them a new sensuality and how it had them dancing close - it's no wonder it was the sound of a generation at war.


Newscasters and newsreaders are still influenced by the significantly affected vocal delivery of early presenters on the radio. If you have ever listened to early newsreels (like they played in movie theaters after the talkies took over) or recordings of presenters like Walter Winchell, you hear that voice. It is a voice that relies on a sharp but deliberate delivery, a higher, almost nasal register, and a pronunciation that sounds like a mix of public school British and proper Boston. This is partly due to the era's social ideas on what an authoritative voice should sound like and partly due to the limited capability of the early condenser microphones used for radio broadcasting.


The 'voice' itself carried on far past its technical purpose, which was to let an announcer be understood through the noise of typical radio transmission, especially at the receiver end. The sound became a hallmark of a radio/TV news person, with many taking the style on to show that they were professional broadcasters. You can hear it in how Edward R. Murrow or Walter Cronkite spoke and delivered the news; the affectation is smoother, but the deliberate punchiness remains. The infamous Roger Grimsby of NYC's WABC in the '70s and '80s is a direct descendant and one of the last I can recall who overtly presented in the style. It is worth noting that many of the top-flight national news hosts also employ a modern take on the style, but in keeping with the contemporary casual feel, it is an understated method.


Subtle Singularity


Is the technology we have become so enamored with changing how we speak? Several evolutionary biologists have shown evidence that our dependence on digital communication is changing how we think, store, and retain memories. Some discussion and newer studies are looking at whether young users of voice-controlled smart devices are doing a version of code-switching or defaulting to the more pronounced pronunciation used to tell Alexa what they want. The research is looking, in particular, at how the kids talk to each other during noisy, messy playtime or when frustrated in getting a point across.



Is this the step that brings the devotees of John von Neumann's Singularity to mass acceptance?

Tuesday, April 7, 2020

Who's Zoomin' Who?

Over the last few days, there has been a great deal of noise about the video conferencing tool Zoom, along with some serious concerns. While there is blame enough to go around, it is particularly irksome that many ‘end-users’ plead willful ignorance.

Well, that escalated quickly

The platform experienced a dramatic increase in usage when it offered free unlimited access, just as COVID-19’s spread prompted stay-at-home orders to sweep across the country, closing non-essential businesses and schools.

Education, in particular, jumped onto the Zoom wave as the platform is easy to obtain and runs on multiple operating systems, including apps for smart devices on iOS and Android. Most importantly, an individual can have a session up and running in mere moments.

The ease of use is where a good number of concerns start. The news has been aflutter with breathless reports of individuals ‘zoomjacking’ meetings to insert inappropriate and sometimes downright dangerous content. There is especially high anxiety about the potential for this to happen with school-age children getting live instruction.

Paging Captain Obvious

These incidents are disturbing, and there is no amount of spin that could justify these noxious interruptions. Humans are an especially rancid species where each member must often operate under the assumption that another is out to do them harm (just ask any woman what she fears about just walking down the street, day or night). It is a curiosity to me that many ignore this fundamental concept when using online social spaces, communication apps, or conferencing tools.

There has been a long-standing golden rule of the internet (no, not rule 34), which, when ignored, is the root of horror stories. The rule: presume that any information you put online anywhere will become public. It is one that every person above the age of eight should have drilled into them as the ultimate commandment. Or, to be more pop culture relevant, if you upload it - ‘they’ will attempt to find it.

This is on you, mostly

The recent news of the FBI suggesting that educational institutions, businesses, and local governments avoid Zoom for security concerns is both spot on and misses the point altogether. Many officials have rushed out hyperventilating statements declaring a ban on the use of Zoom. Regrettably, they fail to acknowledge that most of the problems are their own fault.

What’s Wrong with the Panic:

A good number of pundits have been quick to lather up some good hysterics insinuating that the platform pretty much invites interlopers to wreak havoc on unsuspecting users.

Zoom does, or rather did, install with most security settings defaulted to an ‘off’ position. This made the process of getting up and running a bit simpler and provided a first-time user with a successful feedback loop. If one can get onboarded and running once, they are more likely to do it again and explore more features.

The platform (as do most app-based conferencing tools) has a host of security-based features that can help limit easy access by malcontents looking to be jerks. In particular, there are tools that every single user should be looking for before setting up a real meeting.

To not make these options your first priority as a facilitator is a grievous mistake that cannot be defended with a plea of ignorance (‘I am not technical’). Frankly, if you are on the Internet (willingly or as required by work), knowing the fundamentals of protecting your information is not optional. In general, you should be looking to set the following:

Set Sharing to Only Host - This may not work for all meetings, but as an initial setting, it can help prevent malicious material from being shown. You can change this once all the participants are confirmed (for example, a school class with presentations).

Make the Meeting Private - Do NOT publish the link on open social channels. Send links only to specific individuals. If you must use social, send via Direct Message.

Require a Password to enter - This can be mildly annoying to participants, but the added layer prevents undue access.

Enable a ‘Waiting Room’ or ‘Lobby’ - Participants must wait for approval before entering the meeting. It is easier to remove, refuse, and block someone before they get to an audience.

Mute Participants on Entry - Setting the first entry to a session as muted can help prevent quick outbursts. This also gives you an additional layer to identify an interloper.

Lock Meetings - This is a heavy-handed option. The feature allows you to prevent any new participants from joining, effectively making the meeting a silo.

Do not allow participants to join prior to host - Using the schedule feature allows you to set when folks can join the meeting. When it opens only after the host arrives, the management of participants is more effective.
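For the script-minded, the checklist above can be treated as a pre-flight audit. What follows is only a rough sketch: the MeetingSettings record and its field names are made up for illustration and are not Zoom's API or any real settings export. The defaults mirror the 'everything off' starting point described earlier. Locking a meeting is left out because it happens once the session is underway rather than at setup.

```python
from dataclasses import dataclass


@dataclass
class MeetingSettings:
    """Hypothetical settings record; field names are illustrative, not Zoom's API."""
    host_only_sharing: bool = False
    link_posted_publicly: bool = True
    password_required: bool = False
    waiting_room: bool = False
    mute_on_entry: bool = False
    join_before_host: bool = True


def audit(settings: MeetingSettings) -> list[str]:
    """Return one reminder for each precaution that is not yet in place."""
    checks = [
        (settings.host_only_sharing, "Set sharing to host only"),
        (not settings.link_posted_publicly, "Keep the link off open social channels"),
        (settings.password_required, "Require a password to enter"),
        (settings.waiting_room, "Enable the waiting room"),
        (settings.mute_on_entry, "Mute participants on entry"),
        (not settings.join_before_host, "Do not allow joining before the host"),
    ]
    return [reminder for ok, reminder in checks if not ok]


if __name__ == "__main__":
    # An untouched meeting fails every check; print what still needs doing.
    for reminder in audit(MeetingSettings()):
        print("TODO:", reminder)
```

An empty list from audit() means every item on the checklist has been handled before the first participant shows up.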

What’s Right about the Panic:

There are a number of issues inherent to Zoom (and other conferencing tools) that should be considered.

Zoom Link structure - The links include meeting numbers of nine to eleven digits. The structure is small and consistent enough that simple brute-force guessing of valid meeting numbers has happened. Again, do not rely on just the link - secure it with the tools mentioned above.
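To put rough numbers on why a bare numeric link is guessable, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: it treats every digit string of a given length as a possible meeting number, and the guess rate and password format are invented figures, not measurements of Zoom.

```python
def id_space(digits: int) -> int:
    """Number of possible IDs, assuming every digit string of this length is allowed."""
    return 10 ** digits


def hours_to_enumerate(digits: int, guesses_per_second: float) -> float:
    """Hours needed to try every ID once at a steady (hypothetical) guess rate."""
    return id_space(digits) / guesses_per_second / 3600


def combined_space(id_digits: int, password_length: int, alphabet_size: int) -> int:
    """Possible (ID, password) pairs when a password is also required."""
    return id_space(id_digits) * alphabet_size ** password_length


if __name__ == "__main__":
    rate = 1000.0  # hypothetical guesses per second
    for digits in (9, 10, 11):
        hours = hours_to_enumerate(digits, rate)
        print(f"{digits} digits: {id_space(digits):,} IDs, about {hours:,.0f} hours at {rate:,.0f}/s")
    # Adding even a short numeric password multiplies the work for a guesser.
    print(f"9-digit ID plus 6-digit password: {combined_space(9, 6, 10):,} combinations")
```

The takeaway is the multiplication in combined_space: the ID alone is a number someone can walk through, while an ID plus a password is a far larger haystack.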

Encryption claims - Zoom's technical details often stated that the meetings were encrypted ‘end-to-end’. The truth is that the content is only encrypted ‘in transit,’ which means that it is decrypted at a mid-point and then encrypted again on its way to the recipient.

That mid-point is where a third, unintended party could gain access to the information. This does not mean someone can enter the meeting from this point, but they could see what you are presenting. Expect an end-to-end standard very soon for Zoom.
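The difference is easier to see in a toy model. The sketch below is a loose illustration, not how Zoom or any real product implements encryption: toy_cipher is a throwaway XOR scheme that must never be used for actual security, and the function and key names are invented. The only point is who holds which key, and therefore what the relay in the middle can read.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy only, NOT secure.

    Applying it twice with the same key returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def relay_in_transit(message: bytes, key_sender_server: bytes, key_server_recipient: bytes):
    """Each hop is encrypted separately, so the server handles the plaintext."""
    first_hop = toy_cipher(key_sender_server, message)
    seen_by_server = toy_cipher(key_sender_server, first_hop)  # decrypted at the mid-point
    second_hop = toy_cipher(key_server_recipient, seen_by_server)
    received = toy_cipher(key_server_recipient, second_hop)
    return seen_by_server, received


def relay_end_to_end(message: bytes, key_sender_recipient: bytes):
    """Only the two endpoints hold the key; the server merely forwards ciphertext."""
    on_the_wire = toy_cipher(key_sender_recipient, message)
    seen_by_server = on_the_wire  # the server has no key to open it
    received = toy_cipher(key_sender_recipient, on_the_wire)
    return seen_by_server, received


if __name__ == "__main__":
    slide = b"Q2 budget, do not share"
    print("in transit, server sees:", relay_in_transit(slide, b"key-1", b"key-2")[0])
    print("end-to-end, server sees:", relay_end_to_end(slide, b"key-3")[0].hex())
```

In both cases the recipient gets the message intact; the only thing that changes is whether the party in the middle could read it along the way.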

True security - Most State and Federal level government agencies and the military forbid the use of Soft Codec applications such as Zoom, Skype, Slack, etc., because of security concerns. If the information is of a truly sensitive nature, then the only real solution is software with a dedicated independent hardware package. These have long-established track records and a hefty price tag.

Privacy: Yes, the platform has been known to send data to Facebook (regardless of whether or not you have an account). This is troubling, but the product is not an outlier - find me a modern app that does not do this. Data manipulation is rampant with online applications; this is an industry and global issue.

Hacks: Zoom has made standard a number of commonly used workarounds, i.e., hacks, to make the implementation smoother. There is a chance these could be exploited by unintended parties to gain access to a laptop's camera/mic. Again, good management of your devices and general safety practices are in order.

Man in the Mirror

Zoom and the other platform-based conference systems are not perfect. Honestly, as a person who lives in both the Audio Visual installation and Information Technology worlds, I think they all should have done better.

Yet, why do we act shocked when someone gains access to our private messages, interrupts a meeting they are not part of, or hacks a social media account? We know these things happen, and most often, the blame rests on our own refusal to take precautions. If we clicked on the link promising illicit pleasure and monetary reward, whose fault is it that a hacker gained access?

If you do not secure your Wi-Fi router, whose fault is it when a neighbor steals the bandwidth?

The simple fact is that too many want to compromise your private spaces, some for fun, others for profit. This is not a ‘Boys will be Boys’ apologist statement; rather, it is a recognition that we must own the responsibility of protecting our sessions. There is no room for the willfully ignorant if you want to enjoy the benefits of modern technology - you must be proactive in making it secure.
