How do we get along better with the devices we speak to?
One of the terms that came up often at last week’s Conversational Interactions conference was prosody: the ability to add inflection and rhythm to a voice.
Newer Text-To-Speech (TTS) services are starting to support it. For Chinese-language TTS, it’s a requirement, which is why even earlier Chinese TTS services seem more realistic. For English TTS services, prosody was neglected for some time, and the voices were flat and often robotic-sounding.
We’re finally starting to hear TTS voices that are more realistic, including those based on WaveNet.
At least for voice-only interaction, there are a couple of cheats that we can use to make our devices more relatable:
Until prosody and emotion are readily available in TTS, these fixes can mimic them. Another possibility is to use color to express different emotions, such as red, orange, or yellow for anger, blue for calm, and purple for liking or agreement.
These small tweaks can lead to deeper engagement with users.
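To make the color idea concrete, here is a minimal sketch of an emotion-to-color lookup that a device with an indicator light might use. The names `EMOTION_COLORS`, `NEUTRAL`, and `color_for`, the specific RGB values, and the white fallback are all assumptions chosen for illustration; only the rough pairing of emotions with color families comes from the suggestion above.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Hypothetical table: the exact RGB values are illustrative guesses,
# picked only to fall inside the color families mentioned in the text.
EMOTION_COLORS: Dict[str, RGB] = {
    "angry": (255, 69, 0),       # red/orange range
    "calm": (0, 102, 204),       # blue
    "agreement": (128, 0, 128),  # purple, for liking or agreement
}

# Assumed fallback for emotions the table does not cover.
NEUTRAL: RGB = (255, 255, 255)


def color_for(emotion: str) -> RGB:
    """Return the RGB color for an emotion label, or NEUTRAL if unknown."""
    return EMOTION_COLORS.get(emotion.strip().lower(), NEUTRAL)
```

A device could call `color_for("calm")` before it speaks and set its light accordingly. Unknown labels fall back to a neutral color, so an unexpected label never shows a misleading one.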