How to get the most out of Polly, Leveraging Lexicons and SSML - March 2017 AWS Online Tech Talks

How to get the most out of Polly: Leveraging lexicons and SSML
Marco Nicolis, Remus Mois
Amazon Text-to-Speech
03/27/2017
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Transcript of How to get the most out of Polly, Leveraging Lexicons and SSML - March 2017 AWS Online Tech Talks

© 2017, Amazon Web Services, Inc. or its affiliates. All rights reserved.

What to Expect from the Session

What is Polly?
Example app
Using punctuation and SSML
Using external lexicons
Q&A

What is Polly?

A service that converts text into lifelike speech
47 voices, 24 languages
Developers can store, replay and distribute generated speech

The Polly console

I bought 2lbs of meat and 16oz of potatoes

Justin (US) Amy (UK) Raveena (IN)

Amazon Text-to-Speech

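Text-to-Speech Pipeline

Text -> Text normalization -> Grapheme-to-phoneme conversion -> Waveform generation -> Speech

Text: She has $20 in her pocket.

After text normalization: she has twenty dollars in her pocket

After grapheme-to-phoneme conversion: [phonemic transcription of the sentence; the phoneme symbols did not survive extraction]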

Main Challenges for Text-to-Speech

Goal: Convert text into intelligible, accurate, and natural speech

G2P: rough, though, through.

Homographs: same spelling, different pronunciations.
I live in Poland.
This presentation is broadcast live from Poland.

Context helps disambiguate 'live'. But...

I read this book.

Main Challenges for Text-to-Speech

Text normalization: disambiguation of abbreviations, acronyms, units. St. is expanded as street or saint:

St. Patrick St.

Foreign words (déjà vu), proper names (François Hollande), social media lingo (ASAP, LOL), etc.

SSML in Polly

Speech Synthesis Markup Language (SSML)

A W3C recommendation: an XML-based markup language for speech synthesis applications. AWS Polly tags are compliant with the SSML 1.1 specification.

SSML allows customers to modify certain aspects of the TTS speech output, for example the pronunciation of words and the expansion of abbreviations and acronyms, as well as pitch, rate of speech, and volume.

SSML document structure

All SSML documents must start with an opening <speak> tag and end with a closing </speak> tag. All other tags are inserted between them.
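The markup shown on this slide did not survive extraction. As a minimal sketch (not the presenters' code), the rule can be checked in Python with the standard library. The helper names `plain_text_to_ssml` and `is_valid_ssml_root` are hypothetical; only the `<speak>` root element comes from SSML itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def plain_text_to_ssml(text: str) -> str:
    """Escape XML-reserved characters, then wrap the text in the <speak> root."""
    return f"<speak>{escape(text)}</speak>"


def is_valid_ssml_root(ssml: str) -> bool:
    """Return True if the string is well-formed XML whose root element is <speak>."""
    try:
        return ET.fromstring(ssml).tag == "speak"
    except ET.ParseError:
        return False


ssml = plain_text_to_ssml("She has $20 in her pocket & a coupon.")
print(ssml)
print(is_valid_ssml_root(ssml))            # well-formed, <speak> root
print(is_valid_ssml_root("<speak>Hello"))  # closing tag is missing
```

When calling Polly through boto3, a string like this would typically be passed as `Text` together with `TextType="ssml"` to `synthesize_speech`; that call is not made here because it needs AWS credentials.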

Example app

Changing pronunciations in Polly

The <sub> tag

In-line aliasing

In many cases we do not want to change all instances of a certain word.

My favorite chemical element is Al, but Al prefers Mg.
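The tagged version of this sentence was lost in extraction. The sketch below shows one way the example might be marked up with SSML's `<sub alias="...">` element. Which occurrences were aliased on the slide is an assumption: here the first "Al" is treated as the element and the second as a person's name. The helper `sub` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def sub(text: str, alias: str) -> str:
    """Speak `alias` in place of `text` for this one occurrence only."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


# Assumption: only the first "Al" and "Mg" are chemical symbols.
ssml = (
    "<speak>My favorite chemical element is "
    f"{sub('Al', 'aluminum')}, but Al prefers {sub('Mg', 'magnesium')}."
    "</speak>"
)

root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
aliases = [(el.text, el.get("alias")) for el in root.iter("sub")]
print(ssml)
print(aliases)
```

Because the alias is applied in-line, the untagged "Al" keeps its default reading, which is the point of the slide.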

The <phoneme> tag

Force pronunciation in-line

Read: present or past?

I read a book.

I read a book.

Examples of EN phonemes

IPA   X-SAMPA   Example
ɹ     r\        red
ɛ     E         dress
i     i         fleece
d     d         dig

http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html
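As a hedged illustration of the "read" example, the sketch below wraps the word in SSML's `<phoneme>` element, once with an IPA transcription and once with X-SAMPA. The transcriptions are assembled from the en-US symbols r\, E, i and d listed on this slide and are an assumption, not the values used in the talk. The helper `phoneme` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def phoneme(text: str, ph: str, alphabet: str = "ipa") -> str:
    """Force the pronunciation `ph` (written in `alphabet`) for `text`."""
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


# Assumed transcriptions: "reed" for the present tense, "red" for the past.
PRESENT_IPA = "ɹid"
PAST_XSAMPA = "r\\Ed"  # X-SAMPA r\ followed by E and d

present = f"<speak>I {phoneme('read', PRESENT_IPA)} a book.</speak>"
past = f"<speak>I {phoneme('read', PAST_XSAMPA, 'x-sampa')} a book.</speak>"

for ssml in (present, past):
    tag = ET.fromstring(ssml).find("phoneme")
    print(tag.get("alphabet"), tag.get("ph"), tag.text)
```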

Using Lexicons in Polly

Lexicons: Alias (e.g. abbreviation expansion)

Follows the Pronunciation Lexicon Specification (PLS)

Ne   Neon
Na   Sodium
Mg   Magnesium
Al   Aluminum
Si   Silicon

Mg and Al are chemical elements
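The lexicon file itself is not preserved in the transcript. As a sketch of what a PLS 1.0 alias lexicon for these elements could look like, the Python below renders one `<lexeme>` per abbreviation and parses the result back to confirm it is well-formed. The function name `build_alias_lexicon` and the `en-US` default are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases: dict, lang: str = "en-US") -> str:
    """Render a PLS 1.0 lexicon with one <lexeme> per grapheme/alias pair."""
    lexemes = "\n".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}"\n'
        f'    alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        f"{lexemes}\n"
        "</lexicon>\n"
    )


elements = {"Ne": "Neon", "Na": "Sodium", "Mg": "Magnesium",
            "Al": "Aluminum", "Si": "Silicon"}
lexicon_xml = build_alias_lexicon(elements)

# Parse the result back to confirm it is well-formed PLS.
root = ET.fromstring(lexicon_xml.encode("utf-8"))
parsed = {
    lexeme.findtext(f"{{{PLS_NS}}}grapheme"): lexeme.findtext(f"{{{PLS_NS}}}alias")
    for lexeme in root.iter(f"{{{PLS_NS}}}lexeme")
}
print(lexicon_xml)
```

With boto3, content like this would typically be uploaded with `put_lexicon(Name=..., Content=lexicon_xml)` and then referenced through the `LexiconNames` parameter of `synthesize_speech`. Unlike the in-line tag, a lexicon applies to every occurrence of the grapheme in the input.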

Lexicons: Assign custom pronunciation (IPA or X-SAMPA alphabets)

Settling the 'gif' issue once and for all.

gif     "dZIf
David   "dA.%vid

I like this gif.

Here's my friend David.
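A lexicon entry can carry a `<phoneme>` instead of an `<alias>`. The document below is a reconstruction under assumptions: the two X-SAMPA strings are the ones on the slide, while the surrounding PLS boilerplate and the `en-US` language are filled in from the PLS 1.0 format. The `pronunciations` helper is hypothetical and only checks the file locally before it would be uploaded.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="x-sampa" xml:lang="en-US">
  <lexeme>
    <grapheme>gif</grapheme>
    <phoneme>"dZIf</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>David</grapheme>
    <phoneme>"dA.%vid</phoneme>
  </lexeme>
</lexicon>
"""


def pronunciations(lexicon_xml: str) -> dict:
    """Map each grapheme to its phoneme string, failing on incomplete lexemes."""
    root = ET.fromstring(lexicon_xml.encode("utf-8"))
    result = {}
    for lexeme in root.iter(f"{PLS_NS}lexeme"):
        grapheme = lexeme.findtext(f"{PLS_NS}grapheme")
        phoneme = lexeme.findtext(f"{PLS_NS}phoneme")
        if grapheme is None or phoneme is None:
            raise ValueError("each lexeme needs a <grapheme> and a <phoneme>")
        result[grapheme] = phoneme
    return result


print(pronunciations(LEXICON))
```

The `alphabet` attribute on `<lexicon>` tells the engine how to read the phoneme strings; here it is set to `x-sampa` to match the slide's notation.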

Handling foreign languages

The <lang> tag

Foreign words and phrases

Foreign phrases are rendered better if they are enclosed inside the <lang> tag, as in the following example.

French in English

J'adore chanter.

J'adore chanter.
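The tagged version of the French sentence was stripped during extraction. A minimal sketch of how it might be written, assuming the whole sentence is wrapped and the language code is `fr-FR` (the helper `lang` is hypothetical):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang(text: str, code: str) -> str:
    """Mark `text` as belonging to the language identified by `code`."""
    return f"<lang xml:lang={quoteattr(code)}>{escape(text)}</lang>"


french = "J'adore chanter."
ssml = f"<speak>{lang(french, 'fr-FR')}</speak>"

tag = ET.fromstring(ssml).find("lang")
print(ssml)
print(tag.get(XML_LANG))
```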

The <lang> tag

English in Italian

The pronunciation of English is like that of a non-bilingual Italian speaker.

Mi piace Bruce Springsteen.

Mi piace Bruce Springsteen.

The <lang> tag

Multiple languages

All languages supported by AWS Polly can be invoked by the <lang> tag.

EN, FR, IT, ES, PL

Onion, onion, cipolla, cebolla, cebula.

Onion, onion, cipolla, cebolla, cebula.
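To illustrate switching language several times within one request, the sketch below tags each word from the slide with its own language. The word list is copied from the extracted text as-is, and the specific locale codes (`en-US`, `fr-FR`, `it-IT`, `es-ES`, `pl-PL`) are assumptions chosen to match the EN, FR, IT, ES, PL labels. The function `multilingual` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def multilingual(words) -> str:
    """Join (language code, word) pairs, tagging each word with its language."""
    tagged = [
        f"<lang xml:lang={quoteattr(code)}>{escape(word)}</lang>"
        for code, word in words
    ]
    return "<speak>" + ", ".join(tagged) + ".</speak>"


words = [
    ("en-US", "Onion"),
    ("fr-FR", "onion"),
    ("it-IT", "cipolla"),
    ("es-ES", "cebolla"),
    ("pl-PL", "cebula"),
]
ssml = multilingual(words)

root = ET.fromstring(ssml)
codes = [el.get(XML_LANG) for el in root.iter("lang")]
print(ssml)
print(codes)
```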

Define a specific interpretation

The <say-as> tag

The TTS engine works well for most common and unambiguous text structures, such as dates and times. In ambiguous cases (phone numbers, addresses, etc.), it is possible to force an interpretation through the <say-as> tag.

Phone numbers (interpret-as="telephone")

(514) 888-5195
(514) 888-5195

(514) 888-5195x123
(514) 888-5195x123
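Only the `interpret-as="telephone"` attribute survives from the slide's markup. A small sketch of how the two numbers could be wrapped, assuming each number is tagged whole (the helper `say_as` is hypothetical):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str) -> str:
    """Ask the engine to read `text` as the given kind of content."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


numbers = ["(514) 888-5195", "(514) 888-5195x123"]
documents = [f"<speak>{say_as(n, 'telephone')}</speak>" for n in numbers]

for doc in documents:
    tag = next(ET.fromstring(doc).iter("say-as"))
    print(tag.get("interpret-as"), "->", tag.text)
```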

The <say-as> tag

Phone numbers (US vs. UK): different pronunciation styles.

US: Richard's number is (212) 224-1555

UK: Richard's number is (212) 224-1555

Bleeping undesirable content

Your next song is "Killing in the name of" by Rage Against the Machine.

Your next song is "Killing in the name of" by Rage Against the Machine.
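The transcript does not show which word was bleeped or how. One plausible approach, sketched here, is to wrap unwanted words in `<say-as interpret-as="expletive">`. The blocklist and the `bleep` function are hypothetical, and choosing "killing" as the bleeped word is an assumption made only for illustration.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def bleep(text: str, blocklist) -> str:
    """Wrap every whole-word match from `blocklist` in an expletive say-as tag."""
    if not blocklist:
        return f"<speak>{escape(text)}</speak>"
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in blocklist) + r")\b",
        re.IGNORECASE,
    )
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        parts.append(
            f'<say-as interpret-as="expletive">{escape(match.group(0))}</say-as>'
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<speak>" + "".join(parts) + "</speak>"


sentence = 'Your next song is "Killing in the name of" by Rage Against the Machine.'
ssml = bleep(sentence, ["killing"])  # hypothetical blocklist

root = ET.fromstring(ssml)
print(ssml)
print([el.text for el in root.iter("say-as")])
```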

Read character by character

And here is how you spell handkerchief: handkerchief.
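The second "handkerchief" was presumably tagged on the slide so that it is read letter by letter. A sketch under that assumption, using `interpret-as="characters"` (the function `spell_out_sentence` is hypothetical):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def spell_out_sentence(word: str) -> str:
    """Say the word normally once, then ask for it character by character."""
    safe = escape(word)
    return (
        f"<speak>And here is how you spell {safe}: "
        f'<say-as interpret-as="characters">{safe}</say-as>.</speak>'
    )


ssml = spell_out_sentence("handkerchief")
root = ET.fromstring(ssml)
spelled = next(root.iter("say-as"))
print(ssml)
print(spelled.get("interpret-as"), spelled.text)
```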

Modify speech delivery

The power of commas / periods

Adding punctuation helps produce better prosody.

He went to Harvard and when he decided to drop out it was not to find enlightenment with an Indian guru but to start a computer software company.

He went to Harvard, and when he decided to drop out, it was not to find enlightenment with an Indian guru, but to start a computer software company.

The <prosody> tag

The <prosody> tag allows some changes to how speech is delivered, through the following supported attributes:

volume
rate
pitch

The volume attribute

Modify the volume of speech

I can speak normally, or I can speak louder.

I can speak normally, or I can speak quieter.

The rate attribute

Change the speed of speech

When I wake up, I speak quite slowly.

When I am in a hurry, I speak very fast.

The pitch attribute

Modify the pitch of a word/phrase

When I get angry, my pitch goes way up

When I get sad, my pitch goes way down

The pitch attribute

Modify the pitch of a word/phrase

I can go normal, high, higher, low, and lower.
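The attribute values used on the slides are not preserved. The sketch below uses the named levels that SSML defines for `<prosody>` (for example `x-loud`, `x-fast`, `x-high`); which levels the presenters picked is an assumption, and the helper `prosody` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

ALLOWED = ("volume", "rate", "pitch")


def prosody(text: str, **attrs: str) -> str:
    """Wrap `text` in <prosody>, accepting only volume, rate and pitch."""
    unknown = sorted(set(attrs) - set(ALLOWED))
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {unknown}")
    rendered = "".join(
        f" {name}={quoteattr(attrs[name])}" for name in ALLOWED if name in attrs
    )
    return f"<prosody{rendered}>{escape(text)}</prosody>"


louder = ("<speak>I can speak normally, or "
          + prosody("I can speak louder.", volume="x-loud") + "</speak>")
hurry = ("<speak>When I am in a hurry, "
         + prosody("I speak very fast.", rate="x-fast") + "</speak>")

# Assumed mapping from the words on the pitch slide to named pitch levels.
steps = [("high", "high"), ("higher", "x-high"), ("low", "low"), ("lower", "x-low")]
tagged = [prosody(word, pitch=level) for word, level in steps]
pitches = ("<speak>I can go normal, " + ", ".join(tagged[:-1])
           + ", and " + tagged[-1] + ".</speak>")

for ssml in (louder, hurry, pitches):
    ET.fromstring(ssml)  # raises ParseError if the markup is malformed
    print(ssml)
```

Relative values such as percentages or decibel offsets are also part of SSML; the named levels are used here only because they are the easiest to read.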

Use pitch to improve intonation

Adding punctuation and modifying pitch helps produce better prosody.

Do you like this or that?

Do you like this, or that?

Punctuation and the <break> tag

Add a pause anywhere (time, strength attributes)

And the winner is Bob Dylan!

And the winner is Bob Dylan!
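Where the pause sat in the tagged sentence, and how long it was, is not recoverable from the transcript. The sketch below assumes a one-second pause before the name. The helper `pause` is hypothetical, and accepting exactly one of `time` or `strength` is a simplification of this sketch, not a rule of SSML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def pause(time: str = None, strength: str = None) -> str:
    """Build a <break/> from either a duration (e.g. "1s") or a named strength."""
    if (time is None) == (strength is None):
        raise ValueError("give exactly one of time or strength")
    if time is not None:
        return f"<break time={quoteattr(time)}/>"
    return f"<break strength={quoteattr(strength)}/>"


# Assumption: a dramatic one-second pause before the winner's name.
ssml = f"<speak>And the winner is {pause(time='1s')} Bob Dylan!</speak>"

root = ET.fromstring(ssml)
brk = root.find("break")
print(ssml)
print(brk.get("time"), repr(brk.tail))
```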

Fun with SSML

'Can you make your voices sound like an auctioneer?'

I'm at 500 and I want 550
550 bid on 550 I'm at 500 would you go 550 550 for the gentleman in the corner A big black bug bit a big black bear a big black bug bit a big black bear Do we get 600? A big black bug bit a big black bear
We got 600 for the whole herd
Sold for 600.

Fun with SSML

'It's good, but can you make her sound like she's from Boston???'

If your car's blinkers are broken, it may be the blinker relay. Fortunately, this car fix is easy to do.

If your car's blinkers are broken, it may be the blinker relay. Fortunately, this car fix is easy to do.

Contact us with any questions about this webinar or Polly in general: [email protected]

Documentation: http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html
Introducing Amazon Polly at re:Invent 2016: https://www.youtube.com/watch?v=zjMqimHis3U&t=2s
PLS 1.0 Specification: https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/

Next AWS Polly webinar (Apr 10th): "How to integrate Amazon Polly voices seamlessly into your application workflow"
