Knowing who your users are and where they will use your voice assistant is key to designing a viable voice user interface, yet it is, surprisingly, one of the steps developers most often skip or cut short. Here are 6 tips and best practices to help you capture clean signals and improve the accuracy of your voice user interface.
1. Know your user’s environment
When designing a speech system, such as a smart speaker or a voice-activated toy, don’t underestimate the effect of the distance between the user and the microphone.
The user’s voice is never the only sound that reaches the microphone. Sound bounces off obstacles and surfaces in the user’s environment, so each voice command is recorded together with its reflections. As a result, the speech recognition system receives an echoey, reverberant signal that is difficult to process.
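As the user moves away from the microphone, the energy in the direct path decreases while the reflections stay at roughly the same level, so the signal becomes harder to recognize. Speaking louder doesn’t help: it strengthens the reflections by the same amount, so the balance between direct sound and reverberation does not improve and the user’s intent remains masked.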
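To make the distance effect concrete, here is a minimal sketch of the direct-to-reverberant ratio under a deliberately simple room model. It assumes that direct energy falls with the square of the distance and that reverberant energy is the same everywhere in the room; the function names and the 1 m critical distance are illustrative assumptions, not measurements of any real room.

```python
import math


def direct_to_reverberant_db(distance_m, critical_distance_m=1.0):
    """Direct-to-reverberant ratio in dB under a simple diffuse-field model.

    Assumes direct energy falls with 1/distance^2 and reverberant energy is
    constant throughout the room. The two are equal at the critical distance
    (a hypothetical 1 m by default).
    """
    if distance_m <= 0 or critical_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(critical_distance_m / distance_m)


def levels_db(distance_m, source_gain_db=0.0, critical_distance_m=1.0):
    """Return (direct_db, reverberant_db) relative to the reverberant level at 0 dB gain."""
    direct = source_gain_db + direct_to_reverberant_db(distance_m, critical_distance_m)
    reverberant = source_gain_db  # louder speech excites the room by the same amount
    return direct, reverberant


for d in (0.5, 1.0, 2.0, 4.0):
    print(f"{d:>4} m: direct-to-reverberant ratio = {direct_to_reverberant_db(d):+.1f} dB")

# Raising the voice by 10 dB lifts both levels, so the ratio is unchanged.
quiet = levels_db(3.0, source_gain_db=0.0)
loud = levels_db(3.0, source_gain_db=10.0)
print(quiet[0] - quiet[1], loud[0] - loud[1])
```

In this model every doubling of distance costs about 6 dB of direct sound relative to the reverberation, and the last two printed values match because the extra source level cancels out of the ratio.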
2. Choose the right microphone
To reduce the adverse effects of environmental noise, begin by choosing the right microphone. Most importantly, select a microphone with good directivity toward the person speaking. If the microphone is pointed at the talker, noise and reverberation arriving from other angles will be attenuated.
Some traditional analog microphone capsules have very good directivity. In addition, microelectromechanical systems (MEMS) microphones are commonly used in smartphones, laptops, and similar devices. These microphones are produced as part of a silicon chip. They’re very small, lightweight, and quite inexpensive.
MEMS microphones are typically omnidirectional, which means they pick up sound arriving from any angle. That may sound like a drawback, but combining several of these microphones into an array, a technique known as beamforming, allows you to focus on a single direction and reduce noise coming from all the others.
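As an illustration of how an array of omnidirectional microphones can favor one direction, the sketch below computes the response of a delay-and-sum beamformer, one common way to combine array signals. It is a simplified model under stated assumptions: a uniform linear array, a far-field plane wave at a single frequency, and hypothetical values for the number of microphones, spacing, and frequency.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def array_gain(arrival_deg, steer_deg, num_mics=4, spacing_m=0.04, freq_hz=2000.0):
    """Magnitude response (0..1) of a delay-and-sum beamformer.

    Models a uniform linear array of omnidirectional microphones and a
    far-field plane wave. Angles are measured from broadside (0 degrees is
    straight ahead of the array). All default values are illustrative.
    """
    arrival = math.radians(arrival_deg)
    steer = math.radians(steer_deg)
    total = 0j
    for n in range(num_mics):
        # Extra travel time to microphone n for the arriving wave, minus the
        # delay the beamformer applies to steer toward the chosen direction.
        delay = n * spacing_m * (math.sin(arrival) - math.sin(steer)) / SPEED_OF_SOUND
        total += cmath.exp(-2j * math.pi * freq_hz * delay)
    return abs(total) / num_mics


# Steer the array straight ahead and compare sound from the front and the side.
for angle in (0, 30, 60, 90):
    print(f"arrival {angle:>2} deg: gain = {array_gain(angle, steer_deg=0):.3f}")

# A single omnidirectional microphone has the same gain in every direction.
print(array_gain(90, steer_deg=0, num_mics=1))
```

Sound from the steered direction adds up in phase and keeps its full level, while sound from other angles adds up out of phase and is reduced. How much it is reduced depends strongly on frequency, spacing, and the number of microphones, so the defaults here should not be read as design recommendations.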
3. Choose linear noise reduction components
If you want to apply noise reduction, make sure it’s a linear component. Nonlinear noise reduction can undermine the speech recognition system by altering the speech signal itself, making it even harder to process.
Traditionally, noise reduction algorithms were built for human perception. Because a speech recognition system does not process sound the way the human auditory system does, algorithms tuned to sound better to people aren’t always a good fit for voice assistant development.
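One practical way to probe whether a processing stage is linear is a superposition check: processing a weighted mix of two signals should give the same result as mixing the two processed signals. The sketch below applies that check to a moving-average filter (linear) and a simple noise gate (nonlinear). Both are stand-ins chosen for illustration, the sample values are made up, and passing the check on one pair of signals is only evidence of linearity, not proof.

```python
def moving_average(signal, taps=3):
    """Linear FIR smoother: mean of the last `taps` inputs, zero-padded at the start."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - taps + 1): i + 1]
        out.append(sum(window) / taps)
    return out


def noise_gate(signal, threshold=0.1):
    """Nonlinear suppressor: zero any sample whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in signal]


def is_linear(process, x, y, a=2.0, b=-0.5, tol=1e-9):
    """Superposition check on one pair of test signals."""
    mixed = process([a * xi + b * yi for xi, yi in zip(x, y)])
    separate = [a * px + b * py for px, py in zip(process(x), process(y))]
    return all(abs(m - s) <= tol for m, s in zip(mixed, separate))


# Made-up sample values standing in for speech and background noise.
speech = [0.0, 0.5, -0.4, 0.3, -0.2, 0.08]
noise = [0.05, -0.04, 0.06, -0.05, 0.03, -0.06]

print("moving average linear:", is_linear(moving_average, speech, noise))
print("noise gate linear:", is_linear(noise_gate, speech, noise))
```

The gate fails the check because its output depends on whether each sample crosses the threshold, so the same speech is treated differently depending on what it is mixed with.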