The Google Voice Recognition Application Programming Interface
In March of this year, search engine giant Google announced that it was making its voice recognition technology accessible to third-party software developers through the Google Voice Recognition Application Programming Interface (GVR API).
Software developers will be able to add speech support to their applications by calling Google's technology through a set of cloud-based interfaces. So what does Google's speech recognition API mean for the industry?
Key Technology Differences
The GVR API is implemented as a cloud-based technology. The biggest objections stem from the requirement that the device capturing the speech be connected to the web. This can introduce processing latency, and it heightens users' concerns about privacy and about unauthorized access to their data, which must now reside on web-based servers.
Even so, the GVR API excels in many areas, including:
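To make the cloud dependency concrete, here is a minimal sketch of what a recognition request might look like. It assumes the endpoint and field names of Google's Cloud Speech v1 REST interface, which the article itself does not specify, and the one-second silent clip is a hypothetical stand-in for real audio. The sketch only builds the request body and does not send it.

```python
import base64
import json

# Assumption: endpoint and field names follow Google's Cloud Speech v1 REST
# interface; the article itself does not specify them.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US", sample_rate_hertz=16000):
    """Return the JSON-serializable body for one synchronous recognition request."""
    return {
        "config": {
            "encoding": "LINEAR16",  # raw 16-bit PCM audio
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # The audio travels inside the JSON body, so it is base64-encoded.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Hypothetical payload: one second of silence at 16 kHz, 16-bit mono.
silence = b"\x00\x00" * 16000
body = json.dumps(build_recognize_request(silence))
print(len(silence), "bytes of audio ->", len(body), "bytes of JSON to upload")
```

Every utterance has to be uploaded in full before any text comes back, which is where both the latency and the privacy objections come from.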
- Support for more than 80 languages
- Superior performance in noisy situations
- Real-time translations
- Machine learning algorithms that continuously improve recognition with use
- Speech recognition accuracy that outperforms most competitors
To encourage adoption, the Google voice recognition API is being released free of charge for the time being, though Google has stated that it intends to implement a scalable pricing model in the future.
vs. Nuance Dragon Speech
Nuance’s Dragon Speech Recognition (NDSR) boasts a higher accuracy level, at 99 percent, than GVR’s 92 percent.
NDSR has established web support for Internet Explorer versions 6 through 9 and Mozilla Firefox versions 2.x through 3.6.3. By contrast, Google’s web functionality is less mature.
In terms of API implementation, both are cloud-based, though NDSR’s Emerald developer package can handle 160 languages, roughly double GVR’s 80-plus.
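The gap between 99 and 92 percent is easier to judge as error counts. The short calculation below is a sketch that assumes both figures are word-level accuracy rates measured the same way, which the quoted numbers do not confirm; the function name is purely illustrative.

```python
def errors_per_thousand(accuracy_percent, words=1000):
    """Expected number of misrecognized words out of `words`."""
    return round(words * (100 - accuracy_percent) / 100)


# Accuracy figures quoted in the article.
quoted = {"NDSR": 99, "GVR": 92}

for name, accuracy in quoted.items():
    print(name, errors_per_thousand(accuracy), "errors per 1,000 words")

ratio = errors_per_thousand(quoted["GVR"]) / errors_per_thousand(quoted["NDSR"])
print("GVR makes about", ratio, "times as many errors as NDSR")
```

Under that assumption, a seven-point difference in accuracy means GVR would make eight times as many errors as NDSR on the same dictation.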
In terms of pricing, GVR wins hands down, being offered free of charge. Nuance pricing starts in the $69 range for simple applications and steadily increases in proportion to package features.
Dragon Speech has been a recognized leader in voice recognition since the 1990s. However, Google is well positioned to dominate the market given the popularity of the Google Now personal assistant in the Android market.
vs. Apple Siri
Apple Siri is a voice recognition interface limited to iOS devices. Google's API runs across platforms, including Apple’s computers and iOS-based phones and tablets.
Apple has only recently begun to bring in software developers to add machine learning support to its speech recognition. Google has the advantage that its machine learning technology, implemented with extremely sophisticated neural networks, is already present in its released product.
Google also has a higher number of supported languages. Siri’s speech recognition accuracy rate is between 84 and 86 percent, below GVR’s 92 percent.
vs. Microsoft Cortana
Microsoft Cortana is a speech recognition technology that ships with the Windows 8 and Windows 10 operating systems for computers and handheld devices. Microsoft also offers Cortana to software developers as an API.
Cortana’s language support is the most limited of all the speech recognition APIs mentioned. Support is offered for only 11 languages and 3 regions, far fewer than the 80 languages supported by Google’s API.
The Cortana API is supported on iOS and Android, though its speech recognition has been criticized for performance that is not on par with Google’s or that of other competitors. This will be an impediment to Cortana’s acceptance and popularity.
The public release of Google’s speech recognition API will strengthen the company's presence in the speech recognition market. Google has already earned recognition for speech recognition through the success of its Google Now personal assistant on Android devices.
Google’s API release sent a shock through the industry that resulted in a drop in Nuance’s stock price. Google is definitely a major player in speech recognition, and its API will open up more domestic and foreign markets to the company.