Speech marks without SSML example - Amazon Polly

Speech marks without SSML example

The following example shows the speech mark metadata that Amazon Polly returns for the simple sentence "Mary had a little lamb." For simplicity, this example doesn't use SSML, so the output contains no SSML speech marks.

The following Amazon CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) continuation character at the end of each line with a caret (^), and enclose the input text in double quotation marks ("), using single quotes (') for any interior tags.

aws polly synthesize-speech \
  --output-format json \
  --voice-id Joanna \
  --text 'Mary had a little lamb.' \
  --speech-mark-types='["viseme", "word", "sentence"]' \
  MaryLamb.txt

When you make this request, Amazon Polly returns the following in the .txt file:

{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":73,"type":"viseme","value":"E"}
{"time":180,"type":"viseme","value":"r"}
{"time":292,"type":"viseme","value":"i"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":373,"type":"viseme","value":"k"}
{"time":460,"type":"viseme","value":"a"}
{"time":521,"type":"viseme","value":"t"}
{"time":604,"type":"word","start":9,"end":10,"value":"a"}
{"time":604,"type":"viseme","value":"@"}
{"time":643,"type":"word","start":11,"end":17,"value":"little"}
{"time":643,"type":"viseme","value":"t"}
{"time":739,"type":"viseme","value":"i"}
{"time":769,"type":"viseme","value":"t"}
{"time":799,"type":"viseme","value":"t"}
{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
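
To work with this metadata programmatically, you can read the file one record at a time. The following Python sketch is illustrative only and assumes that the file holds one JSON object per line. It also treats the start and end values as byte offsets into the UTF-8 encoded input text, which for this ASCII sentence are the same as character offsets. The names parse_speech_marks and source_span are hypothetical helpers, not part of any Amazon Polly SDK, and the sample data is a subset of the output shown above.

```python
import json

# The input text that was passed to --text.
TEXT = "Mary had a little lamb."

# A subset of the speech mark output, one JSON object per line (assumed layout).
SAMPLE_OUTPUT = """\
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}
"""


def parse_speech_marks(output):
    """Parse each non-empty line as a separate JSON object."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def source_span(text, mark):
    """Return the part of the input text that a sentence or word mark covers.

    Assumption: start and end are byte offsets into the UTF-8 encoded text.
    """
    raw = text.encode("utf-8")
    return raw[mark["start"]:mark["end"]].decode("utf-8")


marks = parse_speech_marks(SAMPLE_OUTPUT)
for mark in marks:
    # Viseme marks carry no start/end offsets, so skip them here.
    if "start" in mark:
        print(mark["type"], mark["time"], repr(source_span(TEXT, mark)))
```

Slicing the encoded bytes instead of the string keeps the lookup correct if the input text ever contains non-ASCII characters, provided the byte-offset assumption holds.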

In the output that Amazon Polly returns, each part of the text is represented by a speech mark:

  • The sentence "Mary had a little lamb."

  • Each word in the text: "Mary", "had", "a", "little", and "lamb".

  • The viseme for each sound in the corresponding audio stream: "p", "E", "r", "i", and so on. For more information on visemes, see Visemes and Amazon Polly.
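
Because the word and viseme marks are interleaved in time order, one possible way to use them together (for example, for lip-syncing) is to group each viseme under the word that precedes it. The following Python sketch shows this under two assumptions: the output contains one JSON object per line, and every viseme belongs to the most recent word mark before it. The function name visemes_by_word is hypothetical, and the sample data covers only the first two words of the example.

```python
import json

# Speech marks for the first two words of the example, one JSON object per line.
SAMPLE_OUTPUT = """\
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":73,"type":"viseme","value":"E"}
{"time":180,"type":"viseme","value":"r"}
{"time":292,"type":"viseme","value":"i"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":373,"type":"viseme","value":"k"}
{"time":460,"type":"viseme","value":"a"}
{"time":521,"type":"viseme","value":"t"}
"""


def visemes_by_word(output):
    """Group viseme values under the word mark that precedes them."""
    grouped = []
    for line in output.splitlines():
        if not line.strip():
            continue
        mark = json.loads(line)
        if mark["type"] == "word":
            # Start a new group for this word.
            grouped.append((mark["value"], []))
        elif mark["type"] == "viseme" and grouped:
            # Assumption: the viseme belongs to the latest word seen so far.
            grouped[-1][1].append(mark["value"])
    return grouped


for word, visemes in visemes_by_word(SAMPLE_OUTPUT):
    print(word, visemes)
```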