Long-form voices - Amazon Polly
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Long-form voices

Amazon Polly has a Long-form engine that produces human-like, highly expressive, and emotionally adept voices. Long-form voices are designed to captivate listeners’ attention for longer content, such as news articles, training materials, or marketing videos.

Amazon Polly Long-form voices are developed with a cutting-edge deep learning TTS technology. The model learns to replicate phonemes, prosody, intonation, and other phonetic and acoustic aspects of human language, resulting in a highly natural speech output.

The Long-form engine uses text embeddings to interpret the meaning of a text. Using text embeddings, the Long-form engine can generate the correct emphasis, pauses, and tone of a natural voice. The result is a voice that combines the complete range of emotional elements present in human communication. This includes mimicking surprise or differentiating dialogue from narration. Together, these capabilities create a premium speech product that sounds like a live human being.

Note

The state-of-the-art technology underlying these voices falls within the paradigm of generative AI for language and voice modelling. A side effect of the technology is that any update to the training data or the model could result in slight variations in the way the voices sound, even when their overall quality improves with the update. This could affect use cases in which parts of the content are synthesized over a long time period, for example, a season of podcasts.

Available long-form voices

Amazon Polly currently offers three en-US long-form voices: two female and one male. These voices are also available in a conversational NTTS variant.

Language       Language code   Name/ID    Gender

English (US)   en-US           Danielle   Female
English (US)   en-US           Gregory    Male
English (US)   en-US           Ruth       Female
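The voice list above can also be retrieved programmatically. The following is a minimal sketch, assuming the boto3 SDK for Python and the DescribeVoices operation with its Engine filter. The helper names long_form_voices and fetch_long_form_voices are hypothetical and exist only for this illustration. The filter is kept separate from the network call so that it can be tried on any DescribeVoices-shaped dictionary without credentials.

```python
def long_form_voices(describe_voices_response, language_code="en-US"):
    """Return sorted (Id, Gender) pairs for voices that list the long-form engine.

    Expects a dictionary shaped like a DescribeVoices response, that is,
    a "Voices" list whose items carry "Id", "Gender", "LanguageCode",
    and "SupportedEngines".
    """
    return sorted(
        (voice["Id"], voice["Gender"])
        for voice in describe_voices_response.get("Voices", [])
        if "long-form" in voice.get("SupportedEngines", [])
        and voice.get("LanguageCode") == language_code
    )


def fetch_long_form_voices(region_name="us-east-1"):
    """Call DescribeVoices in US East (N. Virginia) and filter the result.

    Requires boto3 and configured credentials. Only the first page of
    results is read, which is an assumption that holds for short lists.
    """
    import boto3  # imported lazily so the filter above works without the SDK

    polly = boto3.client("polly", region_name=region_name)
    response = polly.describe_voices(Engine="long-form", LanguageCode="en-US")
    return long_form_voices(response)
```

Calling fetch_long_form_voices() should return the voices from the table, although the exact result depends on what the service offers at the time of the call.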

Feature and Region compatibility

Amazon Polly long-form voices are available only in the following Region:

  • US East (N. Virginia) Region

Long-form voices are not available in other Regions.

The Amazon Polly Long-form engine supports the following features:

  • Real-time and asynchronous speech synthesis operations.

  • All speech marks.

  • Many (but not all) of the SSML tags supported by Amazon Polly. For more information about NTTS-supported SSML tags, see Supported SSML tags.

  • 100 ms latency.

  • As with standard voices, you can choose from various sampling rates to optimize the bandwidth and audio quality for your application. Valid sampling rates for standard, long-form, and neural voices are 8 kHz, 16 kHz, 22 kHz, and 24 kHz. The default for standard voices is 22 kHz. The default for long-form and neural voices is 24 kHz. Amazon Polly supports MP3, OGG (Vorbis), and raw PCM audio stream formats.
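The engine, format, and sampling-rate choices above map onto parameters of a SynthesizeSpeech request. The following is a minimal sketch of a request builder, assuming the boto3 parameter names Engine, VoiceId, OutputFormat, SampleRate, Text, and TextType. The function build_long_form_request and the VALID_SAMPLE_RATES table are hypothetical helpers, not part of any SDK. The rate strings, including "22050" for 22 kHz and the narrower set for PCM, follow the API reference as best understood here and should be verified before use.

```python
# Sample-rate strings accepted per audio format (assumption: taken from
# the SynthesizeSpeech API reference; verify against the current docs).
VALID_SAMPLE_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}


def build_long_form_request(text, voice_id="Ruth", output_format="mp3",
                            sample_rate=None, ssml=False):
    """Build keyword arguments for a long-form SynthesizeSpeech call.

    When sample_rate is None the key is omitted, so the service default
    applies (24 kHz for long-form voices, per the text above).
    """
    if output_format not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported audio format: {output_format!r}")

    params = {
        "Engine": "long-form",
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Text": text,
        "TextType": "ssml" if ssml else "text",
    }

    if sample_rate is not None:
        rate = str(sample_rate)  # the API expects the rate as a string
        if rate not in VALID_SAMPLE_RATES[output_format]:
            raise ValueError(
                f"sample rate {rate} is not valid for {output_format}"
            )
        params["SampleRate"] = rate

    return params
```

The resulting dictionary can be passed to a Polly client, for example polly.synthesize_speech(**build_long_form_request("Hello", sample_rate=16000)). Validating locally only saves a round trip; the service remains the authority on which combinations it accepts.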

Note

Pricing for long-form voices is listed on the Amazon Polly pricing information page.

Using the Long-form engine on the console

You can access Amazon Polly long-form voices through the Amazon Polly console or the Amazon CLI.

To use the Long-form engine on the console

  1. Open the Amazon Polly console at https://console.amazonaws.cn/polly/.

  2. From the Amazon Polly console, choose the Long Form engine.

  3. Choose the desired voice from the voice dropdown menu.

  4. Generate TTS audio with text of your choice.

Note

Long-form voices can also be used with the SynthesizeSpeech and StartSpeechSynthesisTask API operations. For both operations, you specify the engine and the voice name in the API request. You can find more quick-start code samples here.
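As an illustration of the two API operations, the following sketch uses the boto3 SDK for Python. It assumes a Polly client created with boto3.client("polly", region_name="us-east-1") and configured credentials. The function names, the default voice, and the long-form/ key prefix are hypothetical choices for this example, and the S3 bucket must be one you own.

```python
import contextlib


def synthesize_to_file(polly, text, path, voice_id="Danielle"):
    """Real-time synthesis: call SynthesizeSpeech and save the MP3 stream."""
    response = polly.synthesize_speech(
        Engine="long-form",   # selects the Long-form engine
        VoiceId=voice_id,     # one of the long-form voices
        OutputFormat="mp3",
        Text=text,
    )
    # AudioStream is a file-like body; close it once it has been read.
    with contextlib.closing(response["AudioStream"]) as stream, \
            open(path, "wb") as out:
        out.write(stream.read())
    return path


def start_long_form_task(polly, text, bucket, voice_id="Danielle",
                         key_prefix="long-form/"):
    """Asynchronous synthesis: start a task that writes its audio to S3."""
    response = polly.start_speech_synthesis_task(
        Engine="long-form",
        VoiceId=voice_id,
        OutputFormat="mp3",
        Text=text,
        OutputS3BucketName=bucket,     # hypothetical bucket supplied by caller
        OutputS3KeyPrefix=key_prefix,  # hypothetical prefix for this sketch
    )
    task = response["SynthesisTask"]
    return task["TaskId"], task["TaskStatus"]
```

The real-time call suits short passages that are played back immediately, while the asynchronous task is the more natural fit for long content, because the audio is delivered to S3 rather than held in the response.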