Some might say that it’s “Mission Accomplished” for far-field microphone beamforming and acoustic echo cancellation. Thinking about this today, we’re definitely at the point of diminishing returns for performance. Having a mic work 3 feet farther away or perform 5 dB better will not bring significantly more joy to the user.
There are still challenges left in far-field voice interaction that will be cracked over the next few years: the cocktail party problem, multi-user transcription, and whisper detection, among others. Tens of millions of research dollars will be poured into resolving these last issues. However, we’re now at a point where far field is good enough for the speech recognition engines to take over cleaning up and processing the audio.
As the quality of speech-to-text (STT) gets better in all scenarios, we’ll be pushed more toward making the interaction itself, and the delight the user gets from it, worthwhile and compelling in the first place. This means new applications and services, not just new technologies.