AWS AI and Speech Wishlist

If I could come up with a wish list for AI services that would be available through AWS (or Google App Engine or similar), it’d be these…

The simple:

  • Language Detection. Which language is the user speaking?
  • STT. Why isn’t one available as part of Lex? Beats me…
  • Gender and Age Detection. To adjust the content based on this…
  • Speech rate. How many words are spoken per minute? Could be used to adjust the TTS rate.
  • Speech volume. Is the person a loud talker?
  • Prosody. Where is the person putting emphasis on words? Could be used to add prosody to a response.
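
To make the speech-rate idea concrete, here is a rough sketch of how a bot might match its TTS rate to the speaker. It assumes you already have a transcript and the utterance duration from some STT step. The function names, the 150 words-per-minute baseline, and the 80% to 150% clamp are all my own placeholder assumptions, not anything AWS provides. The output uses the standard SSML prosody element with a percentage rate; check what your TTS engine actually honors before relying on it.

```python
from xml.sax.saxutils import escape


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Estimate speaking rate from a transcript and the utterance length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


def tts_rate_percent(user_wpm: float, baseline_wpm: float = 150.0,
                     low: int = 80, high: int = 150) -> int:
    """Scale the TTS rate relative to an assumed baseline, then clamp it.

    baseline_wpm, low, and high are illustrative guesses, not tuned values.
    """
    percent = round(100 * user_wpm / baseline_wpm)
    return max(low, min(high, percent))


def ssml_with_rate(text: str, rate_percent: int) -> str:
    """Wrap the response text in an SSML prosody tag with a percentage rate."""
    return (f'<speak><prosody rate="{rate_percent}%">'
            f'{escape(text)}</prosody></speak>')


if __name__ == "__main__":
    wpm = words_per_minute("turn on the lights in the kitchen please", 3.0)
    print(ssml_with_rate("Okay, kitchen lights are on.", tts_rate_percent(wpm)))
```

The clamp is there so a very fast or very slow talker doesn't push the response into a rate that sounds unnatural.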

The medium difficulty:

  • Sentiment. Is the language being used happy or sad?
  • Mood. Is the person happy or sad, based on vocal tone?
  • Honesty. Or rather, Voice Stress Analysis. How likely is the person to be telling the truth when they say something like “I’m feeling fine”?
  • Accent. Channeling Professor Higgins, can we guess where the person is from?

The super difficult:

  • Health. Is the person coming down with a cold or some other health condition? Is the speaker tired? Is their blood sugar low?
  • Singing Lyrics. One way to wreck STT today is to sing a command. What if we could still understand when someone is singing?
  • Hidden Meaning. What are some alternatives to what the person means? “You’re off your case, Chief”

With these, developers could start putting together voice-interactive bots that pass the Turing test.

