Reading Between the Lines

We can get a hint of what people want by observing how they interact with a system. With voice interaction, there are a few clues to how well users are able to navigate their requests. Some metrics that can be used to infer the quality of the interaction are:

  • Triggers that are not followed by requests
  • Repeated requests
  • Rephrased requests (same intent, different words)
  • Time between trigger and request
  • Time between requests
  • Length of request
  • Phrase density (word count divided by recording time or audio file size)
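Several of the metrics above can be derived from a simple chronological event log. The sketch below assumes a hypothetical log schema (the `Event` class and its fields are illustrative, not any platform's actual format); rephrase detection is omitted since it needs intent matching, and "repeated" here means identical consecutive requests.

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float               # timestamp in seconds (hypothetical schema)
    kind: str              # "trigger" or "request"
    text: str = ""         # transcribed request text, empty for triggers
    duration: float = 0.0  # recording length of the request, in seconds

def session_metrics(events):
    """Derive interaction metrics from a chronological event log."""
    metrics = {
        "abandoned_triggers": 0,   # triggers not followed by a request
        "repeated_requests": 0,    # identical consecutive requests
        "trigger_to_request": [],  # seconds between trigger and request
        "request_lengths": [],     # words per request
        "phrase_density": [],      # words per second of recorded audio
    }
    last_trigger = None
    last_text = None
    for ev in events:
        if ev.kind == "trigger":
            if last_trigger is not None:
                # a previous trigger was never followed by a request
                metrics["abandoned_triggers"] += 1
            last_trigger = ev
        elif ev.kind == "request":
            words = len(ev.text.split())
            metrics["request_lengths"].append(words)
            if ev.duration > 0:
                metrics["phrase_density"].append(words / ev.duration)
            if last_trigger is not None:
                metrics["trigger_to_request"].append(ev.t - last_trigger.t)
                last_trigger = None
            if ev.text.lower() == last_text:
                metrics["repeated_requests"] += 1
            last_text = ev.text.lower()
    if last_trigger is not None:
        metrics["abandoned_triggers"] += 1  # trailing trigger, no request
    return metrics
```

Time between consecutive requests follows the same pattern (difference of successive request timestamps), so it is left out for brevity.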

Some of these metrics are available to Skills / Actions creators, and some to those who implement the Alexa Voice Service or the embedded Google Assistant SDK. With a proprietary voice interface, all of this data can be captured.

More interaction data can be derived if we go more granular:

  • Loudness of the request (SNR, or simply the microphone level after a successful STT result)
  • Background noise during the request

If we layer other services on top, we can gain even more information:

  • Emotion detection
  • Age and gender of speaker
  • Music detection
  • Speaker recognition

Once we capture these signals, we can start looking for correlations between what users want and what they actually get from the systems we build. We can then use that data to inform the systems’ responses.
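As a toy illustration of such a correlation, here is a hand-rolled Pearson coefficient applied to made-up per-session data (the rephrase counts and completion flags below are fabricated for the example, not measurements):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length metric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-session data: number of rephrased requests vs. a
# task-completion flag (1 = the user got what they asked for).
rephrases = [0, 0, 1, 2, 3, 4]
completed = [1, 1, 1, 0, 0, 0]
```

A strongly negative coefficient here would suggest that sessions with many rephrases tend not to end in success, which is exactly the kind of relationship these metrics let us test.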
