Will Speech Recognition Mean The End of Accents?

Posted by Chris Kikel

10/13/2021

On 04/20/2016

Speech recognition in devices like Apple’s Siri is marketed as the future of technology and communication. Voice-activated systems like HAL in 2001: A Space Odyssey and the operating system in Her seem to plant a seed in people’s imaginations of AI systems that recognize speech and operate as efficient personal assistants.

The Controversy

There has been a lot of controversy surrounding speech recognition, especially regarding the dialects and accents that systems such as Siri understand. The website fusion.net published an editorial on this very issue, arguing that voice recognition systems discriminate against people with accents. The writer, Daniela Hernandez, cites several examples of people getting frustrated because their systems don’t recognize their accents. She argues that “tier 1” languages, such as English and Spanish, have a much lower error rate in speech recognition systems than “tier 2” languages, which bars billions of people from using speech recognition.

Hernandez’s argument exposes the politics and economics of speech recognition technology. US English is the accent with the lowest error rate, and the US is the biggest market for smartphones with speech recognition software. English is by far the most commonly used language on the Internet. Because English, especially US English, is treated as the “standard” today, US English is the language that holds the power, which puts non-native US English speakers at a disadvantage.

Speech recognition technologies are meant to be bought and sold. Therefore, the technology’s engineers design the software to work for the biggest market, the language with the most wealth and power: US English.

The Effect

So, how will all this mumbo-jumbo about speech recognition and the global dominance of English affect language in the future?

Well, first off, language takes a very long time to change. Before an actual widespread transformation occurs, generations have to go by, with very small and practically unnoticeable changes in usage and pronunciation slowly permeating the habits of language speakers.

Take Shakespeare, for instance. Sometimes his “bough” rhymes with “though,” his weird is “wyrd,” and then there’s all that “hath” and “thou” stuff that makes the modern-day reader scratch their head in confusion. It took hundreds of years and hundreds of societal changes for English to change that much. Someone didn’t just wake up one day and say, “Gee, I think in order to save time and confusion I’m just always going to say ‘you’ instead of ‘thou’.”

But things did change, and most likely they are still changing. We’ve had societal changes in the last couple of decades that rival the discovery of the New World. The Internet and related technologies, including speech recognition, have connected the world, raising huge issues about language and how we should (and shouldn’t) communicate with people of other cultures.

The Globalization Of The Internet

Accents have been particularly affected by the globalization of the Internet. Even before the Internet, television diminished US regional accents quite a bit. The International Business Times explored how New Yorkers now struggle to distinguish between Brooklyn and Bronx accents and how black people are discriminated against for “talking white,” or not having a black accent. The article also gives examples of the effect of accents in past political elections.

People who struggle with accents and speech recognition systems can easily avoid those systems and use their technology in a more manual way. Unlike television, speech recognition doesn’t subtly train the mind and tongue through repetition and natural use. Therefore, speech recognition may have an effect on accents, but it will probably fade into the background of the Internet’s larger effect on the politics of language.