Today, as the physical and digital worlds merge, people want to use their voice to interact with and control computers, smartphones, and other "smart" devices. We have already looked at the voice assistants built into personal gadgets and smart speakers, but where else are voice recognition technologies being applied?
Almost any modern washing machine or kettle comes with a touch screen, yet using your own voice is more convenient than pressing buttons: in a minute, we can type up to 40 words but speak up to 150. This changes the speed at which we interact with automated technology, and business has joined the paradigm shift: voice control will soon become a baseline requirement.
The speech technology sector is recognized as one of the most dynamically developing in the world. According to a MarketsandMarkets report, the global speech technology market will grow from $3.7 billion today to $12 billion by 2022. The main growth driver is demand for voice authentication in financial institutions, healthcare enterprises, and government organizations. The share of speech technologies will also grow in telecom, call centers, and the B2C sector. So far, voice technology has seen its widest adoption in Asia: in Japan, for example, all call centers have been automated.
The intense attention IT giants pay to speech technologies is understandable: they foresee, or themselves create, a technological evolution, keeping users in their ecosystems by constantly adding new capabilities. But other companies and industries have also begun actively adopting voice technologies, because human speech, besides being natural, offers many other advantages, and voice as an interface is several times faster than other input methods.
Speech recognition technologies have existed since the mid-1960s. However, only in the last few years have computer conversion of speech to text and audio responses to users been fully commercialized. The spurt in the development of speech technologies came because the cost of computing resources fell dramatically, making it economically viable to build large neural networks and process large data sets with them. According to TechCrunch, the breakthrough in voice technology over the past 18 months has been far more significant than in the previous 15 years. We can now safely speak of an established market for automatic speech processing, spanning both B2C virtual assistants and B2B speech recognition solutions.
In the Automotive Industry
Speech recognition technology is also used in the automotive industry. The most primitive examples are the navigation systems we already know. Today's technology offers voice control of various car functions, and not only in luxury cars: Ford, for example, has had great success equipping its vehicles with voice control for navigation and multimedia systems. The technology of the near future is the driverless car, whose route can be set both by computer and by voice. Google's autonomous cars, Tesla's electric cars, the MIG (Made in Germany), AKTIV, and VisLab robot cars, and the car from Braunschweig named Leonie all involve artificial intelligence and voice control. Yandex and Kamaz have announced the start of development of a driverless truck; the first model is due in 2018 and will be equipped with Yandex's artificial intelligence.
In Factories and Plants
In early 2016, Skolkovo resident MDG-Innovations presented a technology for automatic recognition of speech commands, designed to work with industrial robots. It is based on acoustic models built with deep neural networks (DNNs), which make the system more accurate and reliable. The new development lets workers turn a machine on and off, change a room's ventilation mode, and manage equipment on a construction site. The technology picks speech out of strong production noise and adapts to specific workers by adjusting to their speech characteristics. Its creators predict the system will increase worker productivity while reducing workplace injuries.
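The DNN acoustic model itself is far beyond a snippet, but the dispatch layer that sits on top of such a recognizer can be sketched simply. The idea below (all command names and phrases are hypothetical, not from MDG-Innovations) is that a transcript from a noisy factory floor may be imperfect, so instead of exact string matching we fuzzy-match it against a small command grammar:

```python
import difflib

# Hypothetical command grammar: recognized phrase -> machine action.
COMMANDS = {
    "start machine": "MACHINE_ON",
    "stop machine": "MACHINE_OFF",
    "ventilation high": "VENT_HIGH",
    "ventilation low": "VENT_LOW",
}

def dispatch(transcript: str, cutoff: float = 0.6):
    """Map a (possibly noisy) transcript to the closest known command.

    The recognizer's output may be imperfect amid production noise, so we
    fuzzy-match against the grammar rather than requiring an exact string.
    Returns the action, or None if nothing is close enough.
    """
    match = difflib.get_close_matches(
        transcript.lower(), COMMANDS, n=1, cutoff=cutoff
    )
    return COMMANDS[match[0]] if match else None
```

A closed command grammar like this is also why such systems stay reliable in noise: the recognizer only has to choose among a handful of phrases, not transcribe open speech.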
In the Banking Sector
In February this year, HSBC in the UK offered its 15 million customers voice identification for access to online banking. The technology recognizes a client even if they have a cold or a hoarse voice, analyzing up to 100 voice parameters: pronunciation patterns, modulation, and sounds that reflect the size and shape of the pharynx, nasal cavity, and vocal tract. In 2013, Barclays offered a similar feature to its 300,000 wealthiest customers, who were delighted: identification time dropped from 1.5 minutes to 10 seconds.
In addition to speed and convenience for the client, who no longer needs to remember a codeword or other passwords, switching to biometric authentication also improves account security.
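Extracting those ~100 voice parameters is the hard, proprietary part of any such system; but once a caller's voice is reduced to a feature vector, the verification step conceptually amounts to comparing it with the vector enrolled when the customer signed up. A minimal sketch (the threshold and vectors are illustrative assumptions, not HSBC's method):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def verify(enrolled, sample, threshold=0.95):
    """Accept the caller only if the fresh voiceprint sample is
    sufficiently close to the voiceprint enrolled for this account."""
    return cosine_similarity(enrolled, sample) >= threshold
```

The threshold trades off the two failure modes a bank cares about: set it too low and impostors get in; set it too high and a legitimate customer with a cold gets rejected.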
In Telecom and Marketing
Yandex conducts its own research and development in speech recognition. Its system, Yandex SpeechKit, has found application in two sectors of the economy at once: telecom and marketing. Megafon chose Yandex SpeechKit for its virtual consultant "Elena 2.0", whose main task is to reduce the load on the operator's call centers. At the moment, "Elena 2.0" helps Megafon subscribers check their billing, informs them about connected tariffs and services, and can connect or disconnect a service or transfer money from one number to another. Of course, the virtual assistant cannot yet fully replace a call-center employee, but thanks to machine learning, "Elena 2.0" can answer hundreds of millions of calls a year.
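Once speech recognition produces a transcript, an assistant like this must decide which of its supported tasks the subscriber wants. One very simple approach (a sketch only; the intent names and keywords below are assumptions, not how Megafon's system works) is to score each intent by keyword overlap with the request:

```python
# Hypothetical intents for a telecom assistant, each triggered by
# keywords that may appear in the transcribed subscriber request.
INTENTS = {
    "check_billing": {"balance", "billing", "charge"},
    "manage_service": {"connect", "disconnect", "service"},
    "transfer_money": {"transfer", "send", "money"},
}

def route(transcript: str):
    """Pick the intent whose keyword set best overlaps the transcript.

    Returns None when no keyword matches, so the call can be
    escalated to a human operator instead of guessed at.
    """
    words = set(transcript.lower().split())
    best, best_score = None, 0
    for intent, keywords in INTENTS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = intent, score
    return best
```

Production assistants replace the keyword sets with trained classifiers, but the escalate-when-unsure fallback is the same reason "Elena 2.0" hands difficult calls to live employees.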
The Future of Speech Technologies
There are also new developments in the field of security. AmberBox, an American startup and Y Combinator S16 participant, makes a gadget of the same name that automatically detects the sound of gunfire. The detector was developed in response to the wave of shootings in the United States. AmberBox uses a combined algorithm of acoustic and infrared detection that lets the device reliably distinguish a gunshot from other noises; on detection, it signals security, warns nearby people, and initiates an evacuation procedure. This can cut police response time by up to 63% and, as a result, save lives.
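The value of combining two sensor types is that each one alone produces false alarms: a slammed door sounds like a shot, and many heat sources flash in infrared. Requiring both signals within a short time window filters most of them out. A minimal fusion sketch (the window length and event representation are assumptions, not AmberBox's algorithm):

```python
# Hypothetical sensor fusion: a gunshot is confirmed only when an
# acoustic impulse and an infrared flash occur close together in time.
FUSION_WINDOW_S = 0.5  # assumed coincidence window, in seconds

def confirm_gunshot(acoustic_events, infrared_events, window=FUSION_WINDOW_S):
    """Return True if any acoustic event timestamp has an infrared
    event timestamp within `window` seconds of it."""
    return any(
        abs(a - ir) <= window
        for a in acoustic_events
        for ir in infrared_events
    )
```

Only a confirmed detection would then trigger the downstream actions the article describes: alerting security, warning people nearby, and starting the evacuation procedure.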
Another hot startup, which Google and Apple have showcased at their conferences, is AVA, a San Francisco developer of a mobile application that lets people see what is being said around them in less than a second. The app is designed for deaf and hard-of-hearing people. Voice technology in general will greatly help people with disabilities interact with the outside world, computers, and smart devices. For example, one of the first such applications was SayShopping for the iPhone, which let blind or visually impaired people buy goods from an online hypermarket by voice alone.
Almost all of these startups have ambitious goals. ObEN, for example, founded in California in 2014, calls its mission collecting the world's largest bank of speech patterns and voices and becoming a resource for robotics, the gaming industry, entertainment, education, healthcare, and music.
But even without thinking about the startups in the San Francisco suburbs, just look around your office or apartment and you'll see that every element of the interior, every office and home appliance, will soon understand our voice and follow spoken directions. And this is a huge market for every manufacturer.