Speech recognition may seem like a recent invention, but this wonderful technology actually has its roots back in the 1950's.
In the more than 60 years since, it has progressed into a tool many of us use to perform day-to-day activities.
1950’s & 1960’s
Named “Audrey”, Bell Laboratories’ first speech recognition system was so simple it could only understand numbers.
It took a full decade before numbers became words: in 1962, IBM demonstrated "Shoebox" at the World's Fair, a machine able to interpret 16 English words.
1970's
This decade produced some of the most impressive advancements in the technology.
Through DARPA, the U.S. Department of Defense funded speech understanding research for a period of five years. The project eventually led to "Harpy", Carnegie Mellon's program that could understand a staggering 1,011 words.
It was the first of its kind to marry the advancement in search technology with the interpretation of human speech patterns.
Another advancement came from Bell Laboratories, whose system could understand more than one voice.
1980's
The hidden Markov model made the progress of the 1980s possible. Instead of relying on templates and searching for sound patterns, the newer programs began predicting language: they took unknown sounds and determined the probability that each sound was a given word.
As the technology spread into homes and offices, it would still be considered rudimentary by today's standards: you had to pause after every word you spoke for it to work.
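The hidden Markov idea above can be sketched with the forward algorithm: score how likely it is that a candidate word's model produced the observed sounds, rather than matching against a stored template. The two-state "word" model, the sound labels, and every probability below are invented purely for illustration.

```python
def forward_probability(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: probability that this word model emits the sounds."""
    # alpha[s] = probability of being in state s after the first sound
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # Sum over every path into each state, then weigh by how likely
        # that state is to emit the next observed sound.
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states)
               * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

# Toy model for the word "hi": two phoneme-like hidden states.
states = ("h", "iy")
start_p = {"h": 1.0, "iy": 0.0}
trans_p = {"h": {"h": 0.4, "iy": 0.6}, "iy": {"h": 0.0, "iy": 1.0}}
emit_p = {"h": {"snd_h": 0.9, "snd_iy": 0.1},
          "iy": {"snd_h": 0.2, "snd_iy": 0.8}}

score = forward_probability(("snd_h", "snd_iy"), states, start_p, trans_p, emit_p)
```

A recognizer in this style would run every word's model over the same sounds and pick the highest-scoring word, which is why the approach scaled so much better than template matching.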
1990's
Faster processors meant speech recognition was finally ready for mainstream use.
Dragon released its first product, Dragon Dictate, in 1990 and followed it up with Dragon NaturallySpeaking seven years later.
BellSouth introduced VAL, the first voice portal and the forerunner of the systems that now make you scream into a phone for thirty minutes just to check your bank statement.
2000's
At the start of the new century, voice recognition hit a plateau. With no clear way to improve how it assessed and predicted speech, progress stalled.
This changed with Google Voice Search. Not only did people inherently love speaking into their cell phones, the service sent their speech to a cloud data center for processing. The results were far more accurate, simply because those data centers could hold so many of the nuances of natural language.
2010's
In 2010, Google added personalized recognition, which recorded users' voice searches to build a far more accurate speech model.
Thanks to these advancements, the speech model can now draw on some 230 billion words.
While mobile phones have done their fair share to usher in the age of voice recognition, there are still kinks to work out.
Individualized voice recognition, for one, remains relatively unfinished. Accents and dialects are so vast and varied that perfecting the technology is a continual challenge.
Even so, given how well it works now, it's not surprising that futurists are predicting voice-activated homes.