Google has been quietly working on speech recognition as one of its three major forays into artificial intelligence. Few people know that their Android phone, or even their iPhone, can run an app that brings full speech recognition to the phone by way of the Google cloud.
Most people have probably tried or seen speech recognition software, which has largely been stuck in the 1980s, with clunky, mistake-prone engines and hit-and-miss interfaces. The usefulness of speech recognition has been, frankly, nonexistent.
Until now, anyway.
=== Why voice recognition hasn’t worked well so far.
The difference between yesterday’s speech recognition (SR) software and today’s is the approach taken to build the recognition itself. Until recently, the field was dominated by linguists working from the idea that computers can recognize speech if they have enough data about the individual sounds that make up the phonetics of our languages.
The trouble is, these individual sounds have to be mapped by humans and coded for the computer to understand. Any gaps in that mapping (such as the difference between an ‘ahh’ and an ‘oahh’ sound) mean gaps in recognition. In addition, the more mapping that is done, the larger the software gets, so it becomes unwieldy fast.
=== Enter the new approach, championed by Google.
The opposite approach, taken by engineers, is to crunch vast amounts of voice data and let the computer index and map those individual sounds itself. This requires a huge up-front investment in computing time, but once that work is done, the result is far more seamless voice recognition.
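To make the contrast concrete, here is a toy sketch of the data-driven idea: instead of a person writing down what each sound looks like, the program averages labeled examples and builds its own map. Everything in it is an assumption for illustration. The two-number "features", the sample values, and the function names `learn_sound_map` and `recognize` are made up, and this simple nearest-average method is not how Google's system works.

```python
from collections import defaultdict
from math import dist

def learn_sound_map(samples):
    """Build one average feature vector per sound label from labeled examples."""
    sums = {}
    counts = defaultdict(int)
    for label, features in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in totals)
        for label, totals in sums.items()
    }

def recognize(sound_map, features):
    """Return the label whose learned average is closest to the new sound."""
    return min(sound_map, key=lambda label: dist(sound_map[label], features))

# Hypothetical training data: (label, made-up acoustic features).
TRAINING = [
    ("ahh", (0.9, 0.2)),
    ("ahh", (1.1, 0.4)),
    ("oahh", (0.2, 0.9)),
    ("oahh", (0.4, 1.1)),
]

sound_map = learn_sound_map(TRAINING)
print(recognize(sound_map, (1.0, 0.25)))
```

Nobody had to spell out where ‘ahh’ ends and ‘oahh’ begins. Adding more recordings refines the map without any new hand-written rules, which is why the amount of data matters so much.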
Google, which has more data than anybody, has the resources to accomplish this. Voice recordings from the old GOOG-411 service and other systems the company has run in the past provide the base data. Using this, engineers have been able to parse a gigantic database of speech sounds for several languages around the world, though mostly English.
The result? Android Voice and Translate apps that actually work.
You can hold your phone at the table in a French restaurant and speak to the waiter, through the phone, without knowing any French. You say your phrase, the phone translates it (thanks to the Google cloud) into French for the waiter, and the waiter’s response is likewise given to you in English. Very Star Trek.
The phone can also take spoken queries and turn them into answers or search results. Asking it, in plain English, to calculate simple or even complex equations is as easy as stating the equation. Getting a set of Google Maps directions is as easy as stating an address and letting the phone use your current GPS location as the starting point.
While the speech recognition isn’t always perfect, especially in languages other than English, the app does most things well. It’s a first step towards a fully integrated future where we talk to our computers, and to each other through them, without inhibition.
Yet again, reality gets one step closer to Star Trek.