AWS Machine Learning Blog

Create audio for content in multiple languages with the same TTS voice persona in Amazon Polly

Amazon Polly is a leading cloud-based service that converts text into lifelike speech. Following the adoption of Neural Text-to-Speech (NTTS), we have continuously expanded our portfolio of available voices in order to provide a wide selection of distinct speakers in supported languages. Today, we are pleased to announce four new additions: Pedro speaking US Spanish, Daniel speaking German, Liam speaking Canadian French, and Arthur speaking British English. As with all the Neural voices in our portfolio, these voices offer fluent, native pronunciation in their target languages. However, what is unique about these four voices is that they are all based on the same voice persona.

Pedro, Daniel, Liam, and Arthur were modeled on Matthew, an existing US English voice. While customers continue to appreciate Matthew for his naturalness and professional-sounding quality, the voice has so far served English-speaking traffic exclusively. Now, using deep-learning methods, we have decoupled language from speaker identity, which allowed us to preserve native-like fluency in each target language without having to obtain multilingual data from the same speaker. In practice, this means that we transferred the vocal characteristics of the US English Matthew voice to US Spanish, German, Canadian French, and British English, opening up new opportunities for Amazon Polly customers.

Having a similar-sounding voice available in five locales unlocks great potential for business growth. First of all, customers with a global footprint can create a consistent user experience across languages and regions. For example, an interactive voice response (IVR) system that supports multiple languages can now serve different customer segments without changing the feel of the brand. The same goes for all other TTS use cases, such as voicing news articles, education materials, or podcasts.

Secondly, the voices are a good fit for Amazon Polly customers who are looking for a native pronunciation of foreign phrases in any of the five supported locales.

Thirdly, Pedro, Daniel, Liam, and Arthur serve customers who like Amazon Polly NTTS in US Spanish, German, Canadian French, and British English but are looking for a high-quality masculine voice. These customers can use the new voices to create audio for monolingual content and expect quality that is on par with other NTTS voices in these languages.

Lastly, the technology we have developed to create the new male NTTS voices can also be used for Brand Voices. Thanks to this, Brand Voice customers can not only enjoy a unique NTTS voice that is tailored to their brand, but also keep a consistent experience while serving an international audience.

Example use case

Let’s explore an example use case to demonstrate what this means in practice. Amazon Polly customers familiar with Matthew can still use this voice in the usual way by choosing Matthew on the Amazon Polly console and entering any text they want to hear spoken in US English. In the following scenario, we generate audio samples for an IVR system (“For English, please press one”):

Thanks to this release, you can now expand the use case to deliver a consistent audio experience in different languages. All the new voices are natural-sounding and maintain a native-like accent.

  • To generate speech in British English, choose Arthur (“For English, please press one”):
  • To use a US Spanish speaker, choose Pedro (“Para español, por favor marque dos”):
  • Daniel offers support in German (“Für Deutsch drücken Sie bitte die Drei”):
  • You can synthesize text in Canadian French by choosing Liam (“Pour le français, veuillez appuyer sur le quatre”):
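The same prompts can also be generated programmatically. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is installed and credentials are configured for a Region that supports NTTS. The voice names and prompt texts come from the list above; the locale codes, the helper names (`build_request`, `synthesize_all`), and the output file naming are illustrative assumptions.

```python
import os
from typing import Dict, List

# Voices and prompts from the list above. The locale codes are used only
# to name the output files in this sketch.
PROMPTS: List[Dict[str, str]] = [
    {"voice": "Arthur", "locale": "en-GB", "text": "For English, please press one"},
    {"voice": "Pedro", "locale": "es-US", "text": "Para español, por favor marque dos"},
    {"voice": "Daniel", "locale": "de-DE", "text": "Für Deutsch drücken Sie bitte die Drei"},
    {"voice": "Liam", "locale": "fr-CA", "text": "Pour le français, veuillez appuyer sur le quatre"},
]


def build_request(prompt: Dict[str, str]) -> Dict[str, str]:
    """Build SynthesizeSpeech parameters for one prompt (hypothetical helper)."""
    return {
        "Engine": "neural",  # the new voices are Neural-only
        "OutputFormat": "mp3",
        "VoiceId": prompt["voice"],
        "Text": prompt["text"],
    }


def synthesize_all(out_dir: str = ".") -> List[str]:
    """Call Amazon Polly once per prompt and save each result as an MP3 file."""
    import boto3  # imported here so the helper above works without the SDK

    polly = boto3.client("polly")
    paths = []
    for prompt in PROMPTS:
        response = polly.synthesize_speech(**build_request(prompt))
        path = os.path.join(out_dir, f"{prompt['voice'].lower()}_{prompt['locale']}.mp3")
        with open(path, "wb") as audio_file:
            audio_file.write(response["AudioStream"].read())
        paths.append(path)
    return paths
```

Calling `synthesize_all()` would write one file per voice. Adding an entry for Matthew with the English prompt would produce the US English version in the same way.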

Note that apart from speaking with a different accent, the British English Arthur voice localizes the input text differently from the US English Matthew voice. For example, Arthur reads “1/2/22” as “the 1st of February 2022,” whereas Matthew reads it as “January 2nd 2022.”
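If a date must be read the same way regardless of the voice, one option is to state the field order explicitly with the SSML `say-as` tag instead of relying on each voice's default. The sketch below only builds the SSML string and the request parameters; the helper names are hypothetical, and the exact spoken rendering should be checked by listening to the output for each voice.

```python
from typing import Dict
from xml.sax.saxutils import escape

# A subset of the date orders accepted by the SSML say-as "format" attribute.
DATE_ORDERS = {"mdy", "dmy", "ymd"}


def date_ssml(date_text: str, order: str) -> str:
    """Wrap a date in <say-as> so the field order is explicit (hypothetical helper)."""
    if order not in DATE_ORDERS:
        raise ValueError(f"unsupported date order: {order}")
    return (
        f'<speak><say-as interpret-as="date" format="{order}">'
        f"{escape(date_text)}</say-as></speak>"
    )


def build_ssml_request(voice_id: str, ssml: str) -> Dict[str, str]:
    """Build SynthesizeSpeech parameters for SSML input with the Neural engine."""
    return {
        "Engine": "neural",
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
        "TextType": "ssml",  # tells Amazon Polly the text is SSML, not plain text
        "Text": ssml,
    }
```

For example, `build_ssml_request("Matthew", date_ssml("1/2/22", "dmy"))` asks Matthew to treat the date as day, month, year.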

Now let’s combine these prompts:
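One way to combine the prompts into a single recording is to request raw PCM audio for each voice and write the chunks into one WAV file. This is a sketch under stated assumptions, not necessarily how the samples for this post were produced: it assumes boto3 with configured credentials, and the function names are hypothetical. Amazon Polly returns PCM as 16-bit signed, little-endian, mono audio, which is what the WAV header below declares.

```python
import io
import wave
from typing import Iterable, List, Tuple

SAMPLE_RATE = 16000  # Amazon Polly PCM output supports 8000 and 16000 Hz


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Join 16-bit mono PCM chunks into a single in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buffer.getvalue()


def synthesize_menu(prompts: List[Tuple[str, str]]) -> bytes:
    """Synthesize (voice_id, text) pairs in order and return one combined WAV."""
    import boto3  # imported here so pcm_chunks_to_wav works without the SDK

    polly = boto3.client("polly")
    chunks = []
    for voice_id, text in prompts:
        response = polly.synthesize_speech(
            Engine="neural",
            OutputFormat="pcm",
            SampleRate=str(SAMPLE_RATE),
            VoiceId=voice_id,
            Text=text,
        )
        chunks.append(response["AudioStream"].read())
    return pcm_chunks_to_wav(chunks)
```

Passing a list such as `[("Arthur", "For English, please press one"), ("Pedro", "Para español, por favor marque dos")]` to `synthesize_menu` would return the bytes of a single WAV file with the prompts played back to back.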

Conclusion

Pedro, Daniel, Liam, and Arthur are available as Neural TTS voices only, so to use them, you need to select the Neural engine in one of the AWS Regions that support NTTS. These are high-quality monolingual voices in their target languages. The fact that their personas are consistent across languages is an additional benefit, which we hope will delight customers working with content in multiple languages. For more details, see the full list of Amazon Polly text-to-speech voices, Neural TTS pricing, service limits, and FAQs.
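To check whether the new voices are offered with the Neural engine in a given Region, the `DescribeVoices` API can be queried. The following is a minimal sketch, assuming boto3 and configured credentials; the helper names and the Region passed in are up to the caller and are not prescribed by this post.

```python
from typing import Any, Dict, Set

NEW_VOICES = {"Pedro", "Daniel", "Liam", "Arthur"}


def neural_voice_ids(describe_voices_response: Dict[str, Any]) -> Set[str]:
    """Extract IDs of voices that list 'neural' among their supported engines."""
    return {
        voice["Id"]
        for voice in describe_voices_response.get("Voices", [])
        if "neural" in voice.get("SupportedEngines", [])
    }


def missing_new_voices(region_name: str) -> Set[str]:
    """Return the new voices that are NOT available as Neural voices in a Region."""
    import boto3  # imported here so neural_voice_ids works without the SDK

    polly = boto3.client("polly", region_name=region_name)
    found: Set[str] = set()
    kwargs: Dict[str, str] = {"Engine": "neural"}
    while True:
        response = polly.describe_voices(**kwargs)
        found |= neural_voice_ids(response)
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # fetch the next page of voices
    return NEW_VOICES - found
```

An empty set returned by `missing_new_voices` would indicate that all four voices can be used with the Neural engine in that Region.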


About the Authors

Patryk Wainaina is a Language Engineer working on text-to-speech for English, German, and Spanish. With a background in speech and language processing, his interests lie in machine learning as applied to TTS front-end solutions, particularly in low-resource settings. In his free time, he enjoys listening to electronic music and learning new languages.

Marta Smolarek is a Senior Program Manager on the Amazon Text-to-Speech team, where she focuses on the Contact Center TTS use case. She defines go-to-market initiatives, uses customer feedback to build the product roadmap, and coordinates TTS voice launches. Outside of work, she loves to go camping with her family.