Speech recognition platforms today suffer from poor recognition rates in noisy environments: kids in the back seat of a car, a radio playing, an ambulance passing by, or heavy rain. Video conferencing becomes ineffective when one of the participants is in a noisy environment such as a coffee shop.
Hi Auto has developed a software-only speech enhancement and speaker separation system that eliminates the most challenging noises and focuses only on the speaker.
Our deep learning algorithm leverages a speaker-facing camera and a single microphone to deliver more accurate speech recognition and clearer enhanced speech.
The system runs on-device or through a cloud API and eliminates the most challenging noise sources.
We use a speaker-facing camera that tracks lip movements to separate the speaker from any noise. While in many scenarios our deep learning algorithm works well with audio input alone, combining the camera with our proprietary audio-visual algorithms enables the removal of noise that audio-only methods cannot remove.