I remember the passionate argument our team had many years ago over which text-to-speech voice to use in the Ubi. We ended up paying a large license fee for a voice that we felt conveyed a pleasant user experience, rather than settling for the mechanical-sounding default Android TTS voice.
Fast forward to today. With thousands of skills in use on Alexa and hundreds of Actions for Google, we're talking about potentially millions of daily interactions with different brands' bots. Most of these likely reply with plain text that Alexa or Google then reads aloud in its stock TTS voice.
However, since both platforms support SSML, including the <audio> tag, there's no reason these services can't either 1) generate speech with a different TTS service and link to the resulting audio file, or 2) play responses pre-recorded by a voice actor.
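As a rough illustration of both options, here is a minimal Python sketch that wraps a hosted audio clip in an SSML response. The `build_ssml` helper and the example URL are hypothetical, and only the standard library is used for escaping. Each platform adds its own requirements for the hosted file (Alexa, for instance, expects an HTTPS URL to an MP3 within specific bitrate and sample-rate limits), so check the platform's SSML reference before relying on this.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(audio_url: str, intro_text: str = "") -> str:
    """Build an SSML <speak> response that plays a hosted clip.

    Hypothetical helper: the clip could come from a third-party TTS
    service or from a voice actor's pre-recorded takes.
    """
    parts = ["<speak>"]
    if intro_text:
        # Escape &, < and > so spoken text can't break the SSML markup.
        parts.append(escape(intro_text))
    # quoteattr wraps the URL in quotes and escapes characters such as &.
    parts.append(f"<audio src={quoteattr(audio_url)}/>")
    parts.append("</speak>")
    return "".join(parts)


print(build_ssml("https://example.com/voices/welcome.mp3", "Hi & welcome!"))
```

Escaping matters here because a stray `&` in either the spoken text or a query-string URL would otherwise make the whole response invalid SSML.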
The age, accent (nationality), gender, and mood of a voice have a large impact on how users perceive the agent. A company might want a single voice to represent its brand, or it might forgo a fixed spokesperson in favor of effectiveness, presenting whichever voice would be most appealing to a given user.
Today I used a self-checkout machine at Walmart, and the pre-recorded voice was clear and positive. It's unlikely that the voice actress on these machines is the same one heard in Walmart's call centers or commercials. But it would be clever if a company like Ikea (which in Canada has a radio spokesperson with a Swedish accent) used the same actor in its self-checkout machines. It would delight shoppers. What if that same voice could greet them in an Alexa Skill?