Talk to Me – Voice Computing

Technologists predict that one of the most consequential changes in our daily lives will soon come from being able to converse with computers. We are seeing the early stages of this today, as many of us now have personal assistants in our homes such as Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana or Google Assistant. In the foreseeable future, we’ll be able to talk to computers in the same way we talk to each other, and that will usher in perhaps the most important change ever in the way that humans interact with technology.

In the book Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think, author James Vlahos looks at the history of voice computing and predicts how it will change our lives in the future. This is a well-written book that explains the underlying technologies in an understandable way. I found it to be a great introduction to the technology behind computer speech, an area I knew little about.

One of the first things made clear in the book is how difficult a technical challenge it is to converse with computers. Four distinct technologies are involved. First is Automatic Speech Recognition (ASR), where human speech is converted into digitized ‘words’. Natural Language Understanding (NLU) is the process a computer uses to interpret the meaning of the digitized words. Natural Language Generation (NLG) is how the computer formulates its response to a human request. Finally, Speech Synthesis is how the computer converts its answer into audible words.
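The four stages can be pictured as a simple pipeline. What follows is only a toy sketch, not how any real assistant works: every function name is hypothetical, the ‘audio’ is just text bytes standing in for sound, and keyword matching stands in for the machine learning that real ASR and NLU systems rely on.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """What the NLU stage believes the speaker wants (hypothetical structure)."""
    name: str
    slots: dict = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    """ASR stand-in: pretend the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8").lower().strip()


def understand(text: str) -> Intent:
    """NLU stand-in: crude keyword matching instead of a trained model."""
    if "weather" in text:
        return Intent("get_weather")
    if text.startswith("play "):
        return Intent("play_music", {"query": text[len("play "):]})
    return Intent("unknown")


def generate_reply(intent: Intent) -> str:
    """NLG stand-in: pick a canned sentence for each intent."""
    if intent.name == "get_weather":
        return "Here is today's forecast."
    if intent.name == "play_music":
        return f"Playing {intent.slots['query']}."
    return "Sorry, I did not understand that."


def synthesize(reply: str) -> bytes:
    """Speech synthesis stand-in: return text bytes instead of real audio."""
    return reply.encode("utf-8")


def converse(audio: bytes) -> bytes:
    """Chain the four stages: ASR -> NLU -> NLG -> speech synthesis."""
    return synthesize(generate_reply(understand(recognize_speech(audio))))
```

Calling converse(b"Play some jazz") walks the request through all four stages in order. The point of the sketch is only that each stage hands a different kind of data to the next one, which is part of why the overall problem is so hard.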

Progress is being made in each of these areas. For example, ASR developers are training computers on how humans talk using machine learning and huge libraries of actual human speech and human interactions from social media sites. They are seeing progress as computers learn the many nuances of the ways that humans communicate. In our science fiction we’ve usually portrayed future computers that talk woodenly like HAL from 2001: A Space Odyssey. It looks like our future instead will be personal assistants that speak to each of us using our own slang, idioms, and speaking style, and in realistic-sounding voices of our choosing. The goal for the industry is to make computer speech indistinguishable from human speech.

The book also includes some interesting history of the various voice assistants. One of the most interesting anecdotes is about how Apple blew its early lead in computer speech. Steve Jobs was deeply interested in the development of Siri and told the development team that Apple was going to give the product a high priority. However, Jobs died the day after Siri was announced to the public, and Apple management put the product on the back burner for a long time.

The book dives into some technologies related to computer speech and does so in an understandable way. For instance, the book looks at the current status of Artificial Intelligence and at how computers ‘learn’ and how that research might lead to better voice recognition and synthesis. The book looks at the fascinating attempts to create computer neural networks that mimic the functioning of the human brain.

Probably the most interesting part of the book is the last few chapters that talk about the likely impact of computer speech. When we can converse with computers as if they are people, we’ll no longer need a keyboard or mouse to interface with a computer. At that point, the computer is likely to disappear from our lives and computing will be everywhere in the background. The computing power needed to enable computer speech is going to have to be in the cloud, meaning that we just speak when we want to interface with the cloud.

Changing to a voice interface with the cloud also drastically changes our interface with the web. Today most of us use Google or some other search engine when searching for information. While most of us select one of the choices offered on the first or second page of the search results, in the future the company that is providing our voice interface will be making that choice for us. That puts a huge amount of power into the hands of the company providing the voice interface – they essentially get to choreograph our entire web experience. Today the leading candidates to be that voice interface are Google and Amazon, but somebody else might grab the lead. There are ethical issues associated with a choreographed web – the company doing our voice searches is deciding the ‘right’ answer to questions we ask. It will be incredibly challenging for any company to do this without bias, and more likely they will do it to drive profits. Picture Amazon driving all buying decisions to its platform.

The transition to voice computing also drastically changes the business plans of a lot of existing technology companies. Makers of PCs and laptops are likely to lose most of their market. Search engines become obsolete. Social media will change drastically. Web advertising will take a huge hit when we don’t see ads – it’s hard to think users will tolerate listening to many ads as part of the web interface experience.

The book makes it clear that this is not science fiction but is a technology that will be maturing during the next decade. I recently saw a video of teens trying to figure out how to use a rotary dial phone, but it might not be long before kids will grow up without ever having seen a mouse or a QWERTY keyboard. I will admit that a transition to voice is intimidating, and they might have to pry my keyboard from my cold, dead hands.

Death of the Smartphone?

The smartphone has possibly been the most transformative technology of the past hundred years. It’s unleashed the power of the computer in a portable always-with-us way that has changed the way that most of us interface with the world. But as unlikely as it might seem, it also might be one of the shortest-lived major technologies in history.

When looking forward it seems inevitable that smartphones will largely be replaced by voicebot technology. Voicebots are already intertwining into our lives in major ways. Apple’s Siri, Amazon’s Echo and Google Assistant are already replacing a lot of other technologies.

Voicebots have already entered my life in several key ways. As a music lover I’ve gone through every technology upgrade since vinyl. I had a huge CD collection and burned tons of custom CDs of my favorite songs. I used an iPod heavily for a few years. I downloaded music and built custom playlists of my music. And I used streaming radio services. But this has all now been replaced by my Amazon Echo. It’s integrated into Amazon music, Sirius XM Radio, and Pandora, and I can just ask aloud to hear the music I want.

I also now use voicebots for simple web searches and I no longer have to use my phone or PC to find out when a local store or restaurant is open. I use my Echo to take notes to remember later, something that is important to me since I wake with ideas at 2:00 in the morning!  In the past I would scramble for something to write on, which inevitably woke me up – but no longer.

Voicebots are also supplanting a lot of apps I used to use. It’s a lot easier to just ask about the weather rather than look it up. I can ask for sports scores before my feet hit the floor out of bed. Voicebots are starting to displace other smartphone functions. I can now send and receive texts by voice – this isn’t quite fully integrated into Echo, but I expect it soon will be. Voicebots integrated into the car give us driving directions and can lead us to the nearest gas station, all directed by voice.

Voicebots are growing steadily better at voice recognition. I’ve had the Amazon Echo for about 18 months and it gets a little better month by month. Voicebots are also getting better at responding to requests. All of the major voicebots are using primitive artificial intelligence to learn from their mistakes and to get better at responding to user requests. Questions that puzzled my Echo months ago are now sailing through.

Some voicebot functions are still nearly unusable. I have Microsoft’s Cortana on my PC and it’s not really helpful in the way I would like to use it. Ideally it could replace most of my keyboard functions. But it’s not hard to forecast that within a few years voice commands will finally make it easier to use a PC.

If voicebots are going to grow to the next level it’s going to take improvements in AI. But everything is pointing in that direction. Just a few weeks ago a new AI from Google learned the game of Go from scratch in just three days with nothing more than being given the rules of the game. The new AI won 100 games straight against the older Google AI that beat the best human player earlier this year.

As AI gets better the voicebots are going to get better. There will soon come a time when it’s easier to use a voicebot for most of the apps on a smartphone, and that’s when voicebots will start to eat away at smartphone penetration rates.

I for one would love to ditch my smartphone. Even after all of these years I’ve never been comfortable having to remember to carry it and I walk away and leave it all of the time. And somehow we’ve gotten roped into spending $600 or more every two years for a new device. I would be much happier wearing tiny earbuds that let me talk to a voicebot that has been able to learn my habits.

Most of the developers in the AI world think that voicebots will enable real digital assistants that step in many times a day to make our lives easier. This trend seems inevitable, and one has to wonder how the emergence of voicebots will affect the huge push for 5G wireless. Most of the things that voicebots do for us are low bandwidth and could easily be done using a fully implemented LTE network. It’s hard to picture where this all might lead, but one thing seems certain – the death of the smartphone will probably be just as disruptive as its birth.

Is Santa Listening?

This Christmas season brings not only the usual joy and cheer, but also new challenges and privacy threats, which seem to be the nature of technology these days. It seems even Santa isn’t immune to gifting technology which invades our homes with toys that gather secret information about us.

It turns out that the My Friend Cayla doll and the i-Que Intelligent Robot have the ability to spy on everything that kids (or anybody else) say within listening range of the toy. There have been a few other toys in the past that were capable of conversing with kids. Last year’s Hello Barbie chatbot also had this capability. But the big difference is that the Hello Barbie only recorded speech when a button was pressed, while these new toys are always listening.

This phenomenon is not limited to toys and there are other devices today that listen to us all of the time such as Siri-enabled iOS devices, OK Google-enabled phones or the Amazon Echo with Alexa. It seems like 2016 was the year when technology began to actively listen to us, even though the concept has been around a bit longer. In 2015 there was a furor when it was revealed that Samsung TVs could both watch and listen to whatever was happening in the room with them. But now the market is seeing a lot of devices with this capability and one can imagine this is going to soon be included in a lot of new devices.

There have always been concerns that future IoT devices would enable tech companies to spy on us. The example given in the past was that motion detectors and cameras that are part of a security system could log all movements inside a home and provide a lot of detail about how various family members move during the day.

But this new technology leaps beyond that scenario to devices that actively listen and record everything we say. One would have to think this new technology is going to be built into most future smart devices as we quickly move towards a world where we talk to our house and the devices in it. All of these technologies work today by using voice recognition software in the cloud that converts everything it hears into text. From there the software in the cloud reads the text to determine if anything said warrants a response.
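A toy sketch makes the privacy concern concrete. This is an illustration under loud assumptions, not any vendor’s actual code: the trigger word, the function names and the use of plain strings in place of audio are all hypothetical. It only mirrors the flow described above, where everything heard is turned into text first and the decision about whether to respond comes afterwards.

```python
import re

TRIGGER_WORD = "assistant"  # hypothetical trigger word, chosen for illustration


def transcribe(utterance: str) -> str:
    """Stand-in for cloud speech recognition: the 'audio' is already a string."""
    return utterance.lower().strip()


def warrants_response(text: str) -> bool:
    """Read the transcribed text and decide whether it was addressed to the device."""
    words = re.findall(r"[a-z']+", text)
    return TRIGGER_WORD in words


def process_stream(utterances):
    """Transcribe every utterance, then flag the ones that warrant a response.

    Returns (transcript, to_answer). The transcript holds everything heard,
    whether or not the device was being spoken to.
    """
    transcript = []
    to_answer = []
    for utterance in utterances:
        text = transcribe(utterance)
        transcript.append(text)      # every utterance becomes text
        if warrants_response(text):  # only some of it triggers a reply
            to_answer.append(text)
    return transcript, to_answer
```

In this sketch the device only answers the sentences containing the trigger word, but the transcript list still holds every private remark made in the room. What happens to that list is exactly the question the toy makers have not answered.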

I’m sure that the average person hasn’t considered what this new technology means, and perhaps having this technology show up in toys will begin the conversation. The potential for abuse from this technology is almost unimaginable. One can envision family members spying upon each other. It’s not a hard stretch to foresee a repressive government listening to everything we say and looking for ‘bad’ thoughts, as was predicted in Fahrenheit 451 and 1984. It’s also not a hard stretch to see transcripts of what is said in a home end up for sale on the dark web so that anybody can buy our private conversations. And in the business world it’s not hard to envision hacking into office devices as the ultimate form of corporate espionage – to catch those things that are said but are not put into writing.

Probably the worst thing about this technology appearing in toys is that it was put in half-baked with no real thought about security. The Electronic Privacy Information Center (EPIC) has brought a complaint about these toys to the Federal Trade Commission and asked that they be recalled, and that no future toys be allowed with the technology until there are some basic safety requirements defined for the industry. For example, EPIC showed that these toys can be easily hacked and that hackers are able to listen to everything said within 50 feet of one of the toys and, worse, to hold a conversation with kids through the toy. This opens up the scary scenario of child molesters talking directly to kids through the guise of a supposedly “safe” toy.

The company behind the technology in the toys is Nuance. Their response to the issue is not reassuring. They said that they do not sell the recorded voice data to anybody. But there is no law to stop the company from changing this policy at any time. And in today’s world there can be no guarantee that the company won’t be hacked and piles of our conversations stolen by nefarious people.

This is a new technology and now is the time to craft some laws about its use. Today there are only a handful of companies deploying the technology. But now that Amazon and Google are making their AI functions available to others as a cloud service, this technology will soon be built into a huge range of devices. I know it sounds cool to change the settings on your washing machine by telling it how to wash the next load, but is it worth it if your washing machine also sends a recording of everything it hears to the cloud?

So we enter this Christmas season with another new technological worry. For the first time it might really be true that Santa is actually listening, and he really will know if you’ve been naughty or nice.