While Amazon is challenging Skill developers to come up with Skills that can hold a conversation for 20+ minutes, we should be looking more at how we can reduce the effort required to conduct voice interactions. This is not to diminish the challenge; in the end, it will help make our AIs more friendly. But the goal shouldn’t be to see how long we can keep people on the line (as though we’re trying to trace a phone call in a ’90s action movie).
The key to reaching the right request in fewer turns is to make assumptions. Let’s take a look at the request “Alexa, order me a pizza”. The response shouldn’t be “What type of pizza would you like to order?” followed by an IVR-style question tree. A better interaction would be to ask something along the lines of “Shall I re-order your last choice?” or “Can I order a small veggie lover’s pizza again?” or, better yet, to just order it and confirm: “OK, I ordered a veggie lover’s pizza”. If it’s the first time, you can add more information like “if you’d like to cancel the order, just check your pizza restaurant app…”
Assumptions are OK for ambient voice interactions. If there’s not enough information to make one, you can look at how you’d treat a first-time inquiry vs a repeat inquiry. Providing examples is one way of helping guide first-time inquiries, but it gets annoying after the second or third time.
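To make the pizza example concrete, here is a minimal Python sketch of that decision. It is only an illustration under my own assumptions: `UserContext`, `handle_order_request`, and the counters are hypothetical names, not part of the Alexa Skills Kit, and the cutoff of two example prompts is an arbitrary reading of "the second or third time".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserContext:
    """Hypothetical per-user state a skill might keep between sessions."""
    past_orders: List[str] = field(default_factory=list)  # most recent last
    auto_orders: int = 0    # times we reordered on the user's behalf
    prompts_heard: int = 0  # times we had to ask what the user wants


def handle_order_request(ctx: UserContext) -> str:
    """Reply to "order me a pizza" in as few turns as possible."""
    if ctx.past_orders:
        # Enough information: assume a repeat of the last order and confirm.
        reply = f"OK, I ordered a {ctx.past_orders[-1]}."
        if ctx.auto_orders == 0:
            # First time we assume: tell the user how to undo it.
            reply += (" If you'd like to cancel the order,"
                      " just check your pizza restaurant app.")
        ctx.auto_orders += 1
        return reply

    # Not enough information: ask once, without a question tree.
    reply = "What would you like to order?"
    if ctx.prompts_heard < 2:
        # Examples help first-time users but get annoying when repeated.
        reply += " For example, a small veggie lover's pizza."
    ctx.prompts_heard += 1
    return reply
```

A returning user hears the cancellation hint only on the first automatic reorder, and a new user stops hearing the example after two prompts.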
Lastly, a faster turn can also be achieved by matching the speed and phrasing of a user. Two examples of this are:
- Echoing a word or phrase from the user’s request in the response. User: “What’s the best way to eat a cat?” Response: “The *best way* is to not eat a cat at all.”
- Matching the verbosity of the user. User: “What’s the weather going to be like tomorrow?” Response: “Tomorrow, it’ll be mostly cloudy with a high of 12 and a low of 3” vs User: “Tomorrow’s weather?” Response: “Cloudy, 12 degrees”
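Verbosity matching can be approximated very crudely by looking at the length of the request. The sketch below is an assumption-laden illustration, not a real speech API: `weather_reply` is a hypothetical helper, and the three-word threshold for a "terse" request is a guess that would need tuning against real utterances.

```python
def weather_reply(request: str, condition: str, high: int, low: int) -> str:
    """Answer tersely when asked tersely, fully when asked fully."""
    # Hypothetical heuristic: three words or fewer counts as a terse request.
    terse = len(request.split()) <= 3
    if terse:
        return f"{condition.capitalize()}, {high} degrees"
    return (f"Tomorrow, it'll be {condition} "
            f"with a high of {high} and a low of {low}")


short = weather_reply("Tomorrow's weather?", "cloudy", 12, 3)
full = weather_reply("What's the weather going to be like tomorrow?",
                     "cloudy", 12, 3)
```

Here `short` is the clipped answer and `full` is the complete sentence; a production skill would likely also consider speaking rate and phrasing, which this sketch ignores.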
When there’s less friction in voice interactions, we’ll get faster adoption of voice and a better user experience.