Speech Synthesis Markup Language (SSML) Reference
When the service for your skill returns a response to a user’s request, you provide text that the Alexa service converts to speech. Alexa automatically handles normal punctuation, such as pausing after a period, or speaking a sentence ending in a question mark as a question.
However, in some cases you may want additional control over how Alexa generates the speech from the text in your response. For example, you may want a longer pause within the speech, or you may want a string of digits read back as a standard telephone number. The Alexa Skills Kit provides this type of control with Speech Synthesis Markup Language (SSML) support.
SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech. The Alexa Skills Kit supports a subset of the tags defined in the SSML specification. The specific tags supported are listed in Supported SSML Tags.
- Using SSML in Your Response
- Supported SSML Tags
- amazon:effect
- audio
- break
- emphasis
- p
- phoneme
- prosody
- s
- say-as
- speak
- sub
- w
- Other SSML Reference Materials
Using SSML in Your Response
To use SSML, construct your output speech using the supported SSML tags. When sending back a response from your service, you must indicate that it is using SSML rather than plain text:
- When using the Java library, use the SsmlOutputSpeech class. Call the setSsml() method and pass in the output speech marked up with the tags.
- When not using the Java library, provide the marked-up text in the outputSpeech property, but set the type to SSML instead of PlainText. Use the ssml property instead of text for the marked-up text:
"outputSpeech": {
"type": "SSML",
"ssml": "<speak>This output speech uses SSML.</speak>"
}
- You can use SSML with both the normal output speech response and any re-prompt included in the response.
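The points above can be sketched in Python for a service that builds the response JSON by hand. This is a minimal sketch, not a definitive implementation: the function name build_ssml_response is hypothetical, and the surrounding envelope fields (version, response, reprompt, shouldEndSession) are assumed from the general Alexa custom skill response format rather than taken from this section.

```python
import json


def build_ssml_response(speech_ssml, reprompt_ssml=None, end_session=False):
    """Build a skill response body whose output speech is SSML.

    The envelope layout is an assumption based on the general Alexa
    response format; only the outputSpeech object comes from this page.
    """
    response = {
        # type is SSML (not PlainText) and the markup goes in "ssml" (not "text").
        "outputSpeech": {"type": "SSML", "ssml": speech_ssml},
        "shouldEndSession": end_session,
    }
    if reprompt_ssml is not None:
        # The re-prompt carries its own outputSpeech object, also typed as SSML.
        response["reprompt"] = {
            "outputSpeech": {"type": "SSML", "ssml": reprompt_ssml}
        }
    return {"version": "1.0", "response": response}


if __name__ == "__main__":
    body = build_ssml_response(
        "<speak>This output speech uses SSML.</speak>",
        reprompt_ssml="<speak>Are you still there?</speak>",
    )
    print(json.dumps(body, indent=2))
```

Keeping the main speech and the re-prompt as separate arguments makes it harder to forget that each one needs its own type and ssml properties.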
The SSML you provide must be wrapped within <speak> tags. For example:
<speak>
Here is a number <w role="amazon:VBD">read</w>
as a cardinal number:
<say-as interpret-as="cardinal">12345</say-as>.
Here is a word spelled out:
<say-as interpret-as="spell-out">hello</say-as>.
</speak>
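When the text to speak comes from data rather than a hand-written string, the wrapping can be done in code. The sketch below assumes Python; the helper name to_speak is hypothetical. It also escapes XML special characters, on the assumption that SSML, being XML-based markup, does not accept a bare & or < in the text.

```python
from xml.sax.saxutils import escape


def to_speak(text):
    """Escape XML special characters in plain text and wrap it in <speak> tags."""
    return "<speak>" + escape(text) + "</speak>"


print(to_speak("Tom & Jerry"))
# prints: <speak>Tom &amp; Jerry</speak>
```

Only pass plain text to a helper like this. Text that already contains SSML tags would have its angle brackets escaped as well.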
Supported SSML Tags
The Alexa Skills Kit supports the following SSML tags (listed in alphabetical order): amazon:effect, audio, break, emphasis, p, phoneme, prosody, s, say-as, speak, sub, and w.
The remaining sections describe each of these tags.
Note that the Alexa service strips out any unsupported SSML tags included in the text you provide.
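Because unsupported tags are removed silently, it can help to check the markup before sending it. The following Python sketch lists any tag names that are not among the supported tags named in this reference. The function name unsupported_tags is hypothetical, and the check is a simple regular-expression scan for illustration, not a full XML parser.

```python
import re

# Tag names taken from the list of supported tags in this reference.
SUPPORTED_TAGS = {
    "amazon:effect", "audio", "break", "emphasis", "p", "phoneme",
    "prosody", "s", "say-as", "speak", "sub", "w",
}

# Captures the name of every opening or self-closing tag (closing tags
# start with "</" and therefore do not match).
_TAG_NAME = re.compile(r"<([A-Za-z][\w:.-]*)")


def unsupported_tags(ssml):
    """Return the sorted names of tags in `ssml` that are not in SUPPORTED_TAGS."""
    return sorted(set(_TAG_NAME.findall(ssml)) - SUPPORTED_TAGS)


print(unsupported_tags('<speak>Hi <voice name="x">there</voice><break/></speak>'))
# prints: ['voice']
```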
amazon:effect
Applies Amazon-specific effects to the speech.
| Attribute | Possible Values |
|---|---|
| name | The name of the effect to apply to the speech. Available effects: whispered (applies a whispering effect to the speech). |
<speak>
I want to tell you a secret.
<amazon:effect name="whispered">I am not a real human.</amazon:effect>.
Can you believe it?
</speak>
audio
The audio tag lets you provide the URL for an MP3 file that the Alexa service can play while rendering a response. You can use this to embed short, pre-recorded audio within your service’s response. For example, you could include sound effects alongside your text-to-speech responses, or provide responses using a voice associated with your brand. For more information, see Including Short Pre-Recorded Audio in your Response.
| Attribute | Possible Values |
|---|---|
| src | Specifies the URL for the MP3 file. The file must be hosted on an HTTPS endpoint, must not contain customer-specific or sensitive information, and must use the required codec version (MPEG version 2) and bit rate (48 kbps). You may need to use converter software to convert your MP3 files to this format. |
Include the audio tag inside the speak tag of your text-to-speech response. Alexa plays the MP3 at the specified point within the text-to-speech. For example:
<speak>
Welcome to Car-Fu.
<audio src="https://carfu.com/audio/carfu-welcome.mp3" />
You can order a ride, or request a fare estimate.
Which will it be?
</speak>
When Alexa renders this response, it sounds like this:
Alexa: Welcome to Car-Fu.
(the specified carfu-welcome.mp3 audio file plays)
Alexa: You can order a ride, or request a fare estimate. Which will it be?
A single response sent by your service can include multiple audio tags according to the following limits:
- No more than five audio files can be used in a single response.
- The combined total time for all audio files in a single response cannot be more than ninety (90) seconds.
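These two limits can be checked before a response is sent. The Python sketch below is one possible approach under stated assumptions: the function name check_audio_limits is hypothetical, and the caller supplies the length of each file in seconds, since playing time cannot be read from the SSML text itself.

```python
import re

MAX_AUDIO_TAGS = 5       # no more than five audio files per response
MAX_TOTAL_SECONDS = 90   # combined playing time per response


def check_audio_limits(ssml, durations_seconds):
    """Return a list of problems found; an empty list means both limits are met.

    `durations_seconds` holds one caller-supplied entry per audio file.
    """
    problems = []
    tag_count = len(re.findall(r"<audio\b", ssml))
    if tag_count > MAX_AUDIO_TAGS:
        problems.append(f"{tag_count} audio tags (limit {MAX_AUDIO_TAGS})")
    total = sum(durations_seconds)
    if total > MAX_TOTAL_SECONDS:
        problems.append(f"{total} seconds of audio (limit {MAX_TOTAL_SECONDS})")
    return problems
```

For example, a response with six audio tags of 10 seconds each would be reported as exceeding the tag limit but not the time limit.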
Converting Audio Files to an Alexa-Friendly Format
You may need to use converter software to convert your MP3 files to the required codec version (MPEG version 2) and bit rate (48 kbps). One option for this is a command-line tool, FFmpeg. The following command converts the provided <input-file> to an MP3 file that works with the audio tag.
ffmpeg -i <input-file> -ac 2 -codec:a libmp3lame -b:a 48k -ar 16000 <output-file.mp3>
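If you convert files as part of a build script, the FFmpeg command above can be wrapped in Python. This is a sketch that assumes the ffmpeg executable is installed and on the PATH; the function names ffmpeg_command and convert are hypothetical. The options are the same ones shown in the command above.

```python
import subprocess


def ffmpeg_command(input_file, output_file):
    """Build the argument list for the FFmpeg conversion shown above."""
    return [
        "ffmpeg", "-i", input_file,
        "-ac", "2",                # two audio channels
        "-codec:a", "libmp3lame",  # encode with the LAME MP3 encoder
        "-b:a", "48k",             # 48 kbps bit rate
        "-ar", "16000",            # 16000 Hz sample rate
        output_file,
    ]


def convert(input_file, output_file):
    """Run the conversion; raises CalledProcessError if FFmpeg reports a failure."""
    subprocess.run(ffmpeg_command(input_file, output_file), check=True)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.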
Another option is Audacity:
- Open the file to convert.
- Set the Project Rate in the lower-left corner to 16000.
- Click File > Export Audio and change the Save as type to MP3 Files.
- Click Options, set the Quality to 48 kbps and the Bit Rate Mode to Constant.
This requires the Lame library, which can be found at: http://lame.buanzo.org/#lamewindl.
Hosting the Audio Files for Your Skill
The MP3 files you use to provide audio must be hosted on an endpoint that uses HTTPS. The endpoint must provide an SSL certificate signed by an Amazon-approved certificate authority. Many content hosting services provide this. For example, you could host your files at a service such as Amazon Simple Storage Service (Amazon S3) (an Amazon Web Services offering).
We don’t require that you authenticate the requests for the audio files. Therefore, you must not include any customer-specific or sensitive information in these audio files. For example, building a custom MP3 file in response to a user’s request, and including sensitive information within the audio, is not allowed.
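A quick check of the URL scheme can catch a plain HTTP link before it reaches an audio tag. The Python sketch below uses a hypothetical helper name, is_https_url. It only inspects the URL text and does not verify that the endpoint's certificate is signed by an approved certificate authority.

```python
from urllib.parse import urlparse


def is_https_url(url):
    """Return True if `url` uses the https scheme and names a host."""
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.netloc)


print(is_https_url("https://carfu.com/audio/carfu-welcome.mp3"))
print(is_https_url("http://carfu.com/audio/carfu-welcome.mp3"))
```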
break
Represents a pause in the speech. Set the length of the pause with the strength or time attributes.
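| Attribute | Possible Values |
|---|---|
| strength | The relative strength of the pause: none, x-weak, weak, medium, strong, or x-strong. |
| time | Duration of the pause; up to 10 seconds (10s) or 10000 milliseconds (10000ms). Include the unit with the time (s or ms). |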
The default is medium. This is used if you don’t specify any attributes, or if you provide any unsupported attribute values.
<speak>
There is a three second pause here <break time="3s"/>
then the speech continues.
</speak>
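When pause lengths are computed at run time, a small helper can enforce the 10-second maximum before the tag is generated. This Python sketch uses a hypothetical function name, break_tag, and writes the duration in milliseconds; the ms unit is assumed from the time format in the SSML specification, while the example above uses seconds.

```python
MAX_BREAK_MS = 10_000  # a pause can last up to 10 seconds


def break_tag(milliseconds):
    """Return a <break> tag for a pause of the given length in milliseconds."""
    if not 0 <= milliseconds <= MAX_BREAK_MS:
        raise ValueError(f"pause must be between 0 and {MAX_BREAK_MS} ms")
    return f'<break time="{int(milliseconds)}ms"/>'


print(break_tag(3000))
# prints: <break time="3000ms"/>
```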
emphasis
Emphasize the tagged words or phrases. Emphasis changes rate and volume of the speech. More emphasis is spoken louder and slower. Less emphasis is quieter and faster.
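| Attribute | Possible Values |
|---|---|
| level | The degree of emphasis: strong (louder and slower), moderate, or reduced (quieter and faster). |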
<speak>
I already told you I
<emphasis level="strong">really like</emphasis>
that person.
</speak>
p
Represents a paragraph. This tag provides extra-strong breaks before and after the tag. This is equivalent to specifying a pause with <break strength="x-strong"/>.
<speak>
<p>This is the first paragraph. There should be a pause after this text is spoken.</p>
<p>This is the second paragraph.</p>
</speak>
phoneme
Provides a phonemic/phonetic pronunciation for the contained text. For example, people may pronounce words like “pecan” differently.
| Attribute | Possible Values |
|---|---|
| alphabet | Set to the phonetic alphabet to use: ipa (the International Phonetic Alphabet) or x-sampa (X-SAMPA). |
| ph | The phonetic pronunciation to speak. See below for a list of supported symbols in each of the supported skill languages. |
When using this tag, Alexa uses the pronunciation provided in the ph attribute rather than the text contained within the tag. However, you should still provide human-readable text within the tags. In the following example, the word “pecan” shown within the tags is never spoken. Instead, Alexa speaks the text provided in the ph attribute:
<speak>
You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
</speak>
Additional examples of writing words with a phonetic alphabet:
| Word | IPA | X-SAMPA |
|---|---|---|
| bottle | ˈbɑ.təl | "bA.t@l |
| frozen | ˈfɹoʊ.zən | "fr\oU.z@n |
| blossom | ˈblɑ.səm | "blA.s@m |
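X-SAMPA transcriptions often begin with a double quote (the primary stress mark), which clashes with the quotes around the ph attribute. The Python sketch below shows one way to build the tag safely by escaping the quote as an XML entity. The helper name phoneme is hypothetical, and the x-sampa attribute value is an assumption, since the alphabet values are not legible in this section.

```python
from xml.sax.saxutils import escape


def phoneme(text, ph, alphabet="ipa"):
    """Return a <phoneme> tag; `text` is the human-readable word, `ph` the pronunciation."""
    # Escape &, <, > and also the double quote, so that X-SAMPA stress
    # marks do not end the attribute value early.
    ph_attr = escape(ph, {'"': "&quot;"})
    return f'<phoneme alphabet="{alphabet}" ph="{ph_attr}">{escape(text)}</phoneme>'


print(phoneme("pecan", "pɪˈkɑːn"))
print(phoneme("bottle", '"bA.t@l', alphabet="x-sampa"))
```

IPA transcriptions use the ˈ character for primary stress, so they normally pass through unchanged.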
Supported Symbols
The following tables list the supported symbols for use with the phoneme tag. The symbols are specific to the skill’s language.
These symbols provide full coverage for the sounds of English (US). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (US) skills is discouraged, as it may result in suboptimal speech synthesis.
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| b | b | voiced bilabial plosive | bed |
| d | d | voiced alveolar plosive | dig |
| d͡ʒ | dZ | voiced postalveolar affricate | jump |
| ð | D | voiced dental fricative | then |
| f | f | voiceless labiodental fricative | five |
| g | g | voiced velar plosive | game |
| h | h | voiceless glottal fricative | house |
| j | j | palatal approximant | yes |
| k | k | voiceless velar plosive | cat |
| l | l | alveolar lateral approximant | lay |
| m | m | bilabial nasal | mouse |
| n | n | alveolar nasal | nap |
| ŋ | N | velar nasal | thing |
| p | p | voiceless bilabial plosive | speak |
| ɹ | r\ | alveolar approximant | red |
| s | s | voiceless alveolar fricative | seem |
| ʃ | S | voiceless postalveolar fricative | ship |
| t | t | voiceless alveolar plosive | trap |
| t͡ʃ | tS | voiceless postalveolar affricate | chart |
| θ | T | voiceless dental fricative | thin |
| v | v | voiced labiodental fricative | vest |
| w | w | labial-velar approximant | west |
| z | z | voiced alveolar fricative | zero |
| ʒ | Z | voiced postalveolar fricative | vision |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ə | @ | mid central vowel | arena |
| ɚ | @` | mid central r-colored vowel | reader |
| æ | { | near-open front unrounded vowel | trap |
| aɪ | aI | diphthong | price |
| aʊ | aU | diphthong | mouth |
| ɑ | A | long open back unrounded vowel | father |
| eɪ | eI | diphthong | face |
| ɝ | 3` | open-mid central unrounded r-colored vowel | nurse |
| ɛ | E | open-mid front unrounded vowel | dress |
| i | i | long close front unrounded vowel | fleece |
| ɪ | I | near-close near-front unrounded vowel | kit |
| oʊ | oU | diphthong | goat |
| ɔ | O | long open-mid back rounded vowel | thought |
| ɔɪ | OI | diphthong | choice |
| u | u | long close back rounded vowel | goose |
| ʊ | U | near-close near-back rounded vowel | foot |
| ʌ | V | open-mid back unrounded vowel | strut |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ˈ | " | primary stress | Alabama |
| ˌ | % | secondary stress | Alabama |
| . | . | syllable boundary | A.la.ba.ma |
These symbols provide full coverage for the sounds of English (UK). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (UK) skills is discouraged, as it may result in suboptimal speech synthesis.
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| b | b | voiced bilabial plosive | bed |
| d | d | voiced alveolar plosive | dig |
| d͡ʒ | dZ | voiced postalveolar affricate | jump |
| ð | D | voiced dental fricative | then |
| f | f | voiceless labiodental fricative | five |
| g | g | voiced velar plosive | game |
| h | h | voiceless glottal fricative | house |
| j | j | palatal approximant | yes |
| k | k | voiceless velar plosive | cat |
| l | l | alveolar lateral approximant | lay |
| m | m | bilabial nasal | mouse |
| n | n | alveolar nasal | nap |
| ŋ | N | velar nasal | thing |
| p | p | voiceless bilabial plosive | speak |
| ɹ | r\ | alveolar approximant | red |
| s | s | voiceless alveolar fricative | seem |
| ʃ | S | voiceless postalveolar fricative | ship |
| t | t | voiceless alveolar plosive | trap |
| t͡ʃ | tS | voiceless postalveolar affricate | chart |
| θ | T | voiceless dental fricative | thin |
| v | v | voiced labiodental fricative | vest |
| w | w | labial-velar approximant | west |
| z | z | voiced alveolar fricative | zero |
| ʒ | Z | voiced postalveolar fricative | vision |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ə | @ | mid central vowel | arena |
| æ | { | near-open front unrounded vowel | trap |
| aɪ | aI | diphthong | price |
| aʊ | aU | diphthong | mouth |
| ɑ | A | long open back unrounded vowel | father |
| eɪ | eI | diphthong | face |
| ɜ | 3 | open-mid central unrounded vowel | nurse |
| ɛ | E | open-mid front unrounded vowel | dress |
| i | i | long close front unrounded vowel | fleece |
| ɪ | I | near-close near-front unrounded vowel | kit |
| əʊ | @U | diphthong | goat |
| ɔ | O | long open-mid back rounded vowel | thought |
| ɔɪ | OI | diphthong | choice |
| u | u | long close back rounded vowel | goose |
| ʊ | U | near-close near-back rounded vowel | foot |
| ʌ | V | open-mid back unrounded vowel | strut |
| ɒ | Q | open back rounded vowel | bother |
| ɛə | E@ | diphthong | bear |
| ɪə | I@ | diphthong | beer |
| ʊə | U@ | diphthong | tour |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ˈ | " | primary stress | Alabama |
| ˌ | % | secondary stress | Alabama |
| . | . | syllable boundary | A.la.ba.ma |
These symbols provide full coverage for the sounds of English (India). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (India) skills is discouraged, as it may result in suboptimal speech synthesis.
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| b | b | voiced bilabial plosive | bed |
| d | d | voiced alveolar plosive | dig |
| d͡ʒ | dZ | voiced postalveolar affricate | jump |
| ð | D | voiced dental fricative | then |
| f | f | voiceless labiodental fricative | five |
| g | g | voiced velar plosive | game |
| h | h | voiceless glottal fricative | house |
| j | j | palatal approximant | yes |
| k | k | voiceless velar plosive | cat |
| l | l | alveolar lateral approximant | lay |
| m | m | bilabial nasal | mouse |
| n | n | alveolar nasal | nap |
| ŋ | N | velar nasal | thing |
| p | p | voiceless bilabial plosive | speak |
| ɹ | r\ | alveolar approximant | red |
| s | s | voiceless alveolar fricative | seem |
| ʃ | S | voiceless postalveolar fricative | ship |
| t | t | voiceless alveolar plosive | trap |
| t͡ʃ | tS | voiceless postalveolar affricate | chart |
| θ | T | voiceless dental fricative | thin |
| v | v | voiced labiodental fricative | vest |
| w | w | labial-velar approximant | west |
| z | z | voiced alveolar fricative | zero |
| ʒ | Z | voiced postalveolar fricative | vision |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ə | @ | mid central vowel | arena |
| æ | { | near-open front unrounded vowel | trap |
| aɪ | aI | diphthong | price |
| aʊ | aU | diphthong | mouth |
| ɑ | A | long open back unrounded vowel | father |
| eɪ | eI | diphthong | face |
| ɜ | 3 | open-mid central unrounded vowel | nurse |
| ɛ | E | open-mid front unrounded vowel | dress |
| i | i | long close front unrounded vowel | fleece |
| ɪ | I | near-close near-front unrounded vowel | kit |
| əʊ | @U | diphthong | goat |
| ɔ | O | long open-mid back rounded vowel | thought |
| ɔɪ | OI | diphthong | choice |
| u | u | long close back rounded vowel | goose |
| ʊ | U | near-close near-back rounded vowel | foot |
| ʌ | V | open-mid back unrounded vowel | strut |
| ɒ | Q | open back rounded vowel | bother |
| ɛə | E@ | diphthong | bear |
| ɪə | I@ | diphthong | beer |
| ʊə | U@ | diphthong | tour |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ˈ | " | primary stress | Alabama |
| ˌ | % | secondary stress | Alabama |
| . | . | syllable boundary | A.la.ba.ma |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| pʰ | p_h | voiceless aspirated bilabial plosive | फूल (phool) |
| bʱ | b_h | voiced aspirated bilabial plosive | भारी (bhaari) |
| t̪ | t_d | voiceless dental plosive | तापमान (taapmaan) |
| t̪ʰ | t_d_h | voiceless aspirated dental plosive | थोड़ा (thoda) |
| d̪ | d_d | voiced dental plosive | दिल्ली (dilli) |
| d̪ʱ | d_d_h | voiced aspirated dental plosive | धोबी (dhobi) |
| ʈ | t` | voiceless retroflex plosive | कटोरा (katora) |
| ʈʰ | t`_h | voiceless aspirated retroflex plosive | ठंड (thand) |
| ɖ | d` | voiced retroflex plosive | डर (darr) |
| ɖʱ | d`_h | voiced aspirated retroflex plosive | ढाल (dhal) |
| tʃʰ | tS_h | voiceless aspirated palatal affricate | छाल (chaal) |
| dʒʱ | dZ_h | voiced aspirated palatal affricate | झाल (jhaal) |
| kʰ | k_h | voiceless aspirated velar plosive | खान (khan) |
| ɡʱ | g_h | voiced aspirated velar plosive | घान (ghaan) |
| ɳ | n` | retroflex nasal | क्षण (kshan) |
| ɾ | 4 | alveolar flap | राम (ram) |
| ɽ | r` | plain retroflex flap | बड़ा (bada) |
| ɽʱ | r`_h | voiced aspirated retroflex flap | बढ़ी (barhi) |
| ʋ | v\ | labiodental approximant | वसूल (wasool) |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ə | @_o | mid central vowel | अच्छा (achhaa) |
| ə̃ | @~ | nasalised mid central vowel | हँसना (hansnaa) |
| a | A_o | open front unrounded vowel | आग (aag) |
| ã | A~ | nasalised open front unrounded vowel | घड़ियाँ (ghariyaan) |
| ɪ | I_o | near-close near-front unrounded vowel | इक्कीस (ikkees) |
| ɪ̃ | I~ | nasalised near-close near front unrounded vowel | सिंचाई (sinchai) |
| i | i_o | close front unrounded vowel | बिल्ली (billee) |
| ĩ | i~ | nasalised close front unrounded vowel | नहीं (nahin) |
| ʊ | U_o | near-close near-back rounded vowel | उल्लू (ullu) |
| ʊ̃ | U~ | nasalised near-close near-back rounded vowel | मुँह (munh) |
| u | u_o | close back rounded vowel | फूल (phool) |
| ũ | u~ | nasalised close back rounded vowel | ऊँट (oont) |
| ɔ | O_o | open-mid back rounded vowel | कौन (kaun) |
| ɔ̃ | O~ | nasalised open-mid back rounded vowel | भौं (bhaun) |
| o | o | close-mid back rounded vowel | सोना (sona) |
| õ | o~ | nasalised close-mid back rounded vowel | क्यों (kyon) |
| ɛ | E_o | open-mid front unrounded vowel | पैसा (paisa) |
| ɛ̃ | E~ | nasalised open-mid front unrounded vowel | मैं (main) |
| e | e | close-mid front unrounded vowel | एक (ek) |
| ẽ | e~ | nasalised close-mid front unrounded vowel | किताबें (kitabein) |
These symbols provide full coverage for the sounds of German. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for German skills is discouraged, as it may result in suboptimal speech synthesis.
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| b | b | voiced bilabial plosive | Bier |
| d | d | voiced alveolar plosive | Dach |
| ç | C | voiceless palatal fricative | ich |
| d͡ʒ | dZ | voiced postalveolar affricate | Dschungel |
| f | f | voiceless labiodental fricative | Vogel |
| g | g | voiced velar plosive | Gabel |
| h | h | voiceless glottal fricative | Haus |
| j | j | palatal approximant | jemand |
| k | k | voiceless velar plosive | Kleid |
| l | l | alveolar lateral approximant | Loch |
| m | m | bilabial nasal | Milch |
| n | n | alveolar nasal | Natur |
| ŋ | N | velar nasal | klingen |
| p | p | voiceless bilabial plosive | Park |
| p͡f | pf | voiceless labiodental affricate | Apfel |
| ʀ | R | uvular trill | Regen |
| s | s | voiceless alveolar fricative | Messer |
| ʃ | S | voiceless postalveolar fricative | Fischer |
| t | t | voiceless alveolar plosive | Topf |
| t͡s | ts | voiceless alveolar affricate | Zahl |
| t͡ʃ | tS | voiceless postalveolar affricate | deutsch |
| v | v | voiced labiodental fricative | Wasser |
| x | x | voiceless velar fricative | kochen |
| z | z | voiced alveolar fricative | See |
| ʒ | Z | voiced postalveolar fricative | Orange |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| a | a | open front unrounded vowel | Salz |
| aː | a: | long open front unrounded vowel | Sahne |
| aʊ | aU | diphthong | Augen |
| ə | @ | mid central vowel | Rede |
| ɐ | 6 | near-open central vowel | besser |
| aɪ | aI | diphthong | nein |
| ɛ | E | open-mid front unrounded vowel | Kellner |
| eː | e: | long close-mid front unrounded vowel | Rede |
| øː | 2: | long close-mid front rounded vowel | böse |
| ɪ | I | near-close near-front unrounded vowel | bitte |
| iː | i: | long close front unrounded vowel | Lied |
| ɔ | O | open-mid back rounded vowel | Koffer |
| œ | 9 | open-mid front rounded vowel | können |
| oː | o: | long close-mid back rounded vowel | Kohl |
| ɔʏ | OY | diphthong | neu |
| ʊ | U | near-close near-back rounded vowel | Wunder |
| ʏ | Y | near-close near-front rounded vowel | Küche |
| uː | u: | long close back rounded vowel | Bruder |
| yː | y: | long close front rounded vowel | kühl |
| IPA | X-SAMPA | Examples |
|---|---|---|
| aɐ̯ | a6_^ | hart |
| aːɐ̯ | a:6_^ | Haar |
| ɛɐ̯ | E6_^ | Berg |
| eːɐ̯ | e:6_^ | schwer |
| øːɐ̯ | 2:6_^ | Nadelöhr |
| ɪɐ̯ | I6_^ | Wirtschaft |
| iːɐ̯ | i:6_^ | Tier |
| ɔɐ̯ | O6_^ | dort |
| œɐ̯ | 96_^ | Wörter |
| oːɐ̯ | o:6_^ | Ohr |
| ʊɐ̯ | U6_^ | Gurke |
| ʏɐ̯ | Y6_^ | Türkei |
| uːɐ̯ | u:6_^ | Kur |
| yːɐ̯ | y:6_^ | Tür |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ð | D | voiced dental fricative | brother |
| ɹ | r\ | alveolar approximant | ripe |
| θ | T | voiceless dental fricative | north |
| w | w | labial-velar approximant | well |
| ɔː | O: | long open-mid back rounded vowel | callcenter |
| eɪ | eI | diphthong | rating |
| oʊ | oU | diphthong | windows |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ãː | a~: | nasalized long open front unrounded vowel | Croissant |
| ɛ̃ː | E~: | nasalized long open-mid front unrounded vowel | Terrain |
| õː | o~: | nasalized long close-mid back rounded vowel | Annonce |
| IPA | X-SAMPA | Description | Examples |
|---|---|---|---|
| ˈ | " | primary stress | genau |
| . | . | syllable boundary | ver.stan.den |
prosody
Modifies the volume, pitch, and rate of the tagged speech.
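| Attribute | Possible Values |
|---|---|
| rate | Modify the rate of the speech. Named values: x-slow, slow, medium, fast, x-fast. |
| pitch | Raise or lower the tone (pitch) of the speech. Named values: x-low, low, medium, high, x-high. |
| volume | Change the volume for the speech. Named values: silent, x-soft, soft, medium, loud, x-loud. |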
<speak>
Normal volume for the first sentence.
<prosody volume="x-loud">Louder volume for the second sentence</prosody>.
When I wake up, <prosody rate="x-slow">I speak quite slowly</prosody>.
I can speak with my normal pitch,
<prosody pitch="x-high"> but also with a much higher pitch </prosody>,
and also <prosody pitch="low">with a lower pitch</prosody>.
</speak>
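When prosody settings vary at run time, generating the tag from keyword arguments keeps the attribute names consistent. The Python sketch below uses a hypothetical helper name, prosody. It accepts only the three attributes this tag modifies and does not validate the attribute values themselves.

```python
from xml.sax.saxutils import escape

# The three properties the prosody tag modifies, in the order they are emitted.
ALLOWED = ("rate", "pitch", "volume")


def prosody(text, **attrs):
    """Wrap `text` in a <prosody> tag with any of rate, pitch, and volume."""
    unknown = sorted(set(attrs) - set(ALLOWED))
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {unknown}")
    rendered = "".join(f' {name}="{attrs[name]}"' for name in ALLOWED if name in attrs)
    return f"<prosody{rendered}>{escape(text)}</prosody>"


print(prosody("I speak quite slowly", rate="x-slow"))
# prints: <prosody rate="x-slow">I speak quite slowly</prosody>
```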
s
Represents a sentence. This tag provides strong breaks before and after the tag.
This is equivalent to:
- Ending a sentence with a period (.).
- Specifying a pause with <break strength="strong"/>.
<speak>
<s>This is a sentence</s>
<s>There should be a short pause before this second sentence</s>
This sentence ends with a period and should have the same pause.
</speak>
say-as
Describes how the text should be interpreted. This lets you provide additional context to the text and eliminate any ambiguity on how Alexa should render the text. Indicate how Alexa should interpret the text with the interpret-as attribute.
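| Attribute | Possible Values |
|---|---|
| interpret-as | How to interpret the text. Values used in this reference include characters, spell-out, cardinal, digits, and telephone. |
| format | Only used when interpret-as is set to date. Alternatively, if you provide the date in YYYYMMDD format, the format attribute is ignored. |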
Note that the Alexa service attempts to interpret the provided text correctly based on the text’s formatting even without this tag. For example, if your output speech includes “202-555-1212”, Alexa speaks each individual digit, with a brief pause for each dash. You don’t need to use <say-as interpret-as="telephone"> in this case. However, if you provided the text “2025551212”, but you wanted Alexa to speak it as a phone number, you would need to use <say-as interpret-as="telephone">.
<speak>
Here is a number spoken as a cardinal number:
<say-as interpret-as="cardinal">12345</say-as>.
Here is the same number with each digit spoken separately:
<say-as interpret-as="digits">12345</say-as>.
Here is a word spelled out: <say-as interpret-as="spell-out">hello</say-as>
</speak>
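The unformatted phone number case described above is a common reason to generate say-as tags in code. The Python sketch below uses a hypothetical helper name, say_as. It simply wraps the text and does not check that the interpret-as value is one Alexa supports.

```python
from xml.sax.saxutils import escape


def say_as(text, interpret_as):
    """Wrap `text` in a <say-as> tag with the given interpret-as value."""
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


# A run of digits with no dashes needs the tag to be read as a phone number.
print("<speak>Call " + say_as("2025551212", "telephone") + ".</speak>")
# prints: <speak>Call <say-as interpret-as="telephone">2025551212</say-as>.</speak>
```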
Supported Speechcons
Speechcons are language specific. See the following pages for the available speechcons for each skill language:
speak
This is the root element of an SSML document. When using SSML with the Alexa Skills Kit, surround the text to be spoken with this tag.
<speak>
This is what Alexa sounds like without any SSML.
</speak>
sub
Pronounce the specified word or phrase as a different word or phrase. Specify the pronunciation to substitute with the alias attribute.
| Attribute | Possible Values |
|---|---|
| alias | The word or phrase to speak in place of the tagged text. |
This example replaces the abbreviated chemical elements with the full words:
<speak>
My favorite chemical element is <sub alias="aluminum">Al</sub>,
but Al prefers <sub alias="magnesium">Mg</sub>.
</speak>
w
Similar to <say-as>, this tag customizes the pronunciation of words by specifying the word’s part of speech.
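| Attribute | Possible Values |
|---|---|
| role | Set to one of the following: amazon:VB (interpret the word as a verb, present simple), amazon:VBD (interpret the word as a past participle), amazon:NN (interpret the word as a noun), or amazon:SENSE_1 (use the non-default sense of the word). |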
<speak>
The word <say-as interpret-as="characters">read</say-as> may be interpreted
as either the present simple form <w role="amazon:VB">read</w>,
or the past participle form <w role="amazon:VBD">read</w>.
</speak>
Note that these tags previously used the ivona namespace in the attribute names. The tags are backwards compatible, so existing SSML written with the ivona namespace continues to work.
Other SSML Reference Materials
All SSML tags:
- Speech Synthesis Markup Language (SSML) Reference (this document)
Speechcons (interjections):