Speech Synthesis Markup Language (SSML) Reference

When the service for your skill returns a response to a user’s request, you provide text that the Alexa service converts to speech. Alexa automatically handles normal punctuation, such as pausing after a period, or speaking a sentence ending in a question mark as a question.

However, in some cases you may want additional control over how Alexa generates the speech from the text in your response. For example, you may want a longer pause within the speech, or you may want a string of digits read back as a standard telephone number. The Alexa Skills Kit provides this type of control with Speech Synthesis Markup Language (SSML) support.

SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech. The Alexa Skills Kit supports a subset of the tags defined in the SSML specification. The specific tags supported are listed in Supported SSML Tags.

Using SSML in Your Response
Supported SSML Tags
amazon:effect
audio
- Converting Audio Files to an Alexa-Friendly Format
- Hosting the Audio Files for Your Skill
break
emphasis
p
phoneme
- Supported Symbols
prosody
s
say-as
- Supported Speechcons
speak
sub
w
Other SSML Reference Materials

Using SSML in Your Response

To use SSML, construct your output speech using the supported SSML tags. When sending back a response from your service, you must indicate that it is using SSML rather than plain text:

When using the Java library, use the SsmlOutputSpeech class. Call the setSsml() method and pass in the output speech marked up with the tags.
When not using the Java library, provide the marked-up text in the outputSpeech property, but set the type to SSML instead of PlainText. Use the ssml property instead of text for the marked-up text:

"outputSpeech": {
    "type": "SSML",
    "ssml": "<speak>This output speech uses SSML.</speak>"
}

You can use SSML with both the normal output speech response and any re-prompt included in the response.
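If your service builds the response JSON itself rather than using the Java library, the following Python sketch shows one way to put SSML into both the output speech and the re-prompt. The helper names are hypothetical. The envelope fields outside `outputSpeech` (`version`, `response`, `reprompt`, `shouldEndSession`) are assumptions about the surrounding response format, so check them against the response format reference before relying on them.

```python
import json


def ssml_speech(ssml):
    """Return an outputSpeech object of type SSML for an already-wrapped <speak> string."""
    if not (ssml.startswith("<speak>") and ssml.endswith("</speak>")):
        raise ValueError("SSML must be wrapped in <speak> tags")
    return {"type": "SSML", "ssml": ssml}


def build_ssml_response(speech_ssml, reprompt_ssml=None, end_session=False):
    """Hypothetical helper: assemble a response body that uses SSML for speech and re-prompt."""
    response = {
        "outputSpeech": ssml_speech(speech_ssml),
        "shouldEndSession": end_session,
    }
    if reprompt_ssml is not None:
        # The re-prompt carries its own outputSpeech object, which can also be SSML.
        response["reprompt"] = {"outputSpeech": ssml_speech(reprompt_ssml)}
    return {"version": "1.0", "response": response}


body = build_ssml_response(
    "<speak>This output speech uses SSML.</speak>",
    reprompt_ssml="<speak>Are you still there?</speak>",
)
print(json.dumps(body, indent=2))
```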

The SSML you provide must be wrapped within <speak> tags. For example:

<speak>
    Here is a number <w role="amazon:VBD">read</w> 
    as a cardinal number: 
    <say-as interpret-as="cardinal">12345</say-as>. 
    Here is a word spelled out: 
    <say-as interpret-as="spell-out">hello</say-as>. 
</speak>
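Because the content of `<speak>` is parsed as XML, a literal `&`, `<`, or `>` in text that comes from elsewhere (user input, a database) has to be escaped before it is embedded. A minimal Python sketch, assuming a hypothetical `speak` helper and a hypothetical `Ssml` marker type for fragments that are already valid markup:

```python
from xml.sax.saxutils import escape


class Ssml(str):
    """Marks a string as already-valid SSML so that it is not escaped again."""


def speak(*parts):
    """Wrap parts in <speak> tags, escaping plain strings but not Ssml fragments."""
    body = "".join(p if isinstance(p, Ssml) else escape(p) for p in parts)
    return f"<speak>{body}</speak>"


print(speak("Tom & Jerry. Here is a word spelled out: ",
            Ssml('<say-as interpret-as="spell-out">hello</say-as>'), "."))
# <speak>Tom &amp; Jerry. Here is a word spelled out: <say-as interpret-as="spell-out">hello</say-as>.</speak>
```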

Supported SSML Tags

The Alexa Skills Kit supports the following SSML tags (listed in alphabetic order):

amazon:effect
audio
break
emphasis
p
phoneme
prosody
s
say-as
speak
sub
w

The remaining sections describe each of these tags.

Note that the Alexa service strips out any unsupported SSML tags included in the text you provide.
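Because unsupported tags are dropped silently, it can help to check your markup before sending it. This Python sketch uses a regular expression, an approximation rather than a real XML parser, to list element names outside the supported set; `unsupported_tags` is a hypothetical name. In the example, `mark` is an element from the W3C SSML specification that does not appear in the list above.

```python
import re

SUPPORTED_TAGS = {
    "amazon:effect", "audio", "break", "emphasis", "p", "phoneme",
    "prosody", "s", "say-as", "speak", "sub", "w",
}

# Captures the element name in opening, closing, and self-closing tags.
TAG_NAME = re.compile(r"</?\s*([A-Za-z][\w:.-]*)")


def unsupported_tags(ssml):
    """Return the sorted list of tag names in ssml that are not in SUPPORTED_TAGS."""
    return sorted(set(TAG_NAME.findall(ssml)) - SUPPORTED_TAGS)


print(unsupported_tags('<speak>Hi <mark name="here"/>there<break time="1s"/></speak>'))
# ['mark']
```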

amazon:effect

Applies Amazon-specific effects to the speech.

Attribute	Possible Values
`name`	The name of the effect to apply to the speech. Available effects: `whispered`: Applies a whispering effect to the speech.

<speak>
    I want to tell you a secret. 
    <amazon:effect name="whispered">I am not a real human.</amazon:effect>
    Can you believe it?
</speak>
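When a skill whispers in several places, a small helper keeps the tag name and effect value consistent. A Python sketch with a hypothetical `whisper` function; it assumes the text passed in is already SSML-safe (escaped) and does not check it:

```python
def whisper(text):
    """Wrap SSML-safe text in the whispered amazon:effect tag."""
    return f'<amazon:effect name="whispered">{text}</amazon:effect>'


ssml = f"<speak>I want to tell you a secret. {whisper('I am not a real human.')} Can you believe it?</speak>"
print(ssml)
```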

audio

The audio tag lets you provide the URL for an MP3 file that the Alexa service can play while rendering a response. You can use this to embed short, pre-recorded audio within your service’s response. For example, you could include sound effects alongside your text-to-speech responses, or provide responses using a voice associated with your brand. For more information, see Including Short Pre-Recorded Audio in your Response.

Attribute	Possible Values
`src`	Specifies the URL for the MP3 file. Note the following requirements and limitations: The MP3 must be hosted at an Internet-accessible HTTPS endpoint. HTTPS is required, and the domain hosting the MP3 file must present a valid, trusted SSL certificate. Self-signed certificates cannot be used. The MP3 must not contain any customer-specific or other sensitive information. The MP3 must be a valid MP3 file (MPEG version 2). The audio file cannot be longer than ninety (90) seconds. The bit rate must be 48 kbps. Note that this bit rate gives a good result when used with spoken content, but is generally not a high enough quality for music. The sample rate must be 16000 Hz. You may need to use converter software to convert your MP3 files to the required codec version (MPEG version 2) and bit rate (48 kbps).

Place the audio tag inside the speak tag, at the point in the text-to-speech where you want the MP3 to play. For example:

<speak>
    Welcome to Car-Fu. 
    <audio src="https://carfu.com/audio/carfu-welcome.mp3" /> 
    You can order a ride, or request a fare estimate. 
    Which will it be?
</speak> 
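If the MP3 URL carries a query string, each `&` must appear as `&amp;` inside the `src` attribute or the SSML will not parse. This Python sketch uses the standard library's `quoteattr` to do that escaping and rejects non-HTTPS URLs; the `audio_tag` helper and the query string in the example are hypothetical.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import quoteattr


def audio_tag(src):
    """Build a self-closing <audio> tag, rejecting URLs that are not HTTPS."""
    if urlparse(src).scheme != "https":
        raise ValueError("audio src must use HTTPS")
    # quoteattr escapes &, <, > and wraps the value in quotes.
    return f"<audio src={quoteattr(src)} />"


print(audio_tag("https://carfu.com/audio/carfu-welcome.mp3?v=2&lang=en"))
# <audio src="https://carfu.com/audio/carfu-welcome.mp3?v=2&amp;lang=en" />
```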

When Alexa renders this response, it sounds like this:

Alexa: Welcome to Car-Fu.
(the specified carfu-welcome.mp3 audio file plays)
Alexa: You can order a ride, or request a fare estimate. Which will it be?

A single response sent by your service can include multiple audio tags according to the following limits:

No more than five audio files can be used in a single response.
The combined total time for all audio files in a single response cannot be more than ninety (90) seconds.
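A service that assembles responses from several clips could enforce these two limits before sending. This Python sketch assumes you already know each clip's duration in seconds (for example, recorded when the file was converted or uploaded); `check_audio_limits` is a hypothetical name.

```python
MAX_FILES = 5
MAX_TOTAL_SECONDS = 90.0


def check_audio_limits(durations):
    """Raise ValueError if clip durations (in seconds) break the per-response limits."""
    if len(durations) > MAX_FILES:
        raise ValueError(f"{len(durations)} audio files; at most {MAX_FILES} are allowed")
    total = sum(durations)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"{total:g} s of audio; at most {MAX_TOTAL_SECONDS:g} s are allowed")


check_audio_limits([12.5, 30.0, 40.0])  # 82.5 s across three files: within both limits
```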

Converting Audio Files to an Alexa-Friendly Format

You may need to use converter software to convert your MP3 files to the required codec version (MPEG version 2) and bit rate (48 kbps). One option for this is a command-line tool, FFmpeg. The following command converts the provided <input-file> to an MP3 file that works with the audio tag.

ffmpeg -i <input-file> -ac 2 -codec:a libmp3lame -b:a 48k -ar 16000 <output-file.mp3>
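To convert a batch of files, you could script the same command. This Python sketch assembles exactly the arguments shown above and runs them with `subprocess.run`; it assumes `ffmpeg` with the `libmp3lame` encoder is on your `PATH` and that the source files are `.wav` files in one folder. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def ffmpeg_args(input_file, output_file):
    """Return the argument list for the conversion command shown above."""
    return [
        "ffmpeg", "-i", str(input_file),
        "-ac", "2",                # two output channels, as in the command above
        "-codec:a", "libmp3lame",  # MP3 encoder
        "-b:a", "48k",             # 48 kbps bit rate
        "-ar", "16000",            # 16000 Hz sample rate
        str(output_file),
    ]


def convert_all(folder):
    """Convert every .wav file in folder to an MP3 with the same name alongside it."""
    for wav in sorted(Path(folder).glob("*.wav")):
        # check=True raises CalledProcessError if ffmpeg reports a failure.
        subprocess.run(ffmpeg_args(wav, wav.with_suffix(".mp3")), check=True)
```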

Another option is Audacity:

Open the file to convert.
Set the Project Rate in the lower-left corner to 16000.
Click File > Export Audio and change the Save as type to MP3 Files.
Click Options, set the Quality to 48 kbps and the Bit Rate Mode to Constant.

Exporting MP3 files from Audacity requires the LAME library, which is available at: http://lame.buanzo.org/#lamewindl.

Hosting the Audio Files for Your Skill

The MP3 files you use to provide audio must be hosted on an endpoint that uses HTTPS. The endpoint must provide an SSL certificate signed by an Amazon-approved certificate authority. Many content hosting services provide this. For example, you could host your files at a service such as Amazon Simple Storage Service (Amazon S3) (an Amazon Web Services offering).

We don’t require that you authenticate the requests for the audio files. Therefore, you must not include any customer-specific or sensitive information in these audio files. For example, building a custom MP3 file in response to a user’s request, and including sensitive information within the audio, is not allowed.

break

Represents a pause in the speech. Set the length of the pause with the strength or time attributes.

Attribute	Possible Values
`strength`	`none`: No pause is inserted. Use this to remove a pause that would normally occur (such as after a period). `x-weak`: No pause is inserted (same as `none`). `weak`: Treat adjacent words as if separated by a single comma (equivalent to `medium`). `medium`: Treat adjacent words as if separated by a single comma. `strong`: Make a sentence break (equivalent to using the `<s>` tag). `x-strong`: Make a paragraph break (equivalent to using the `<p>` tag).
`time`	Duration of the pause; up to 10 seconds (`10s`) or 10000 milliseconds (`10000ms`). Include the unit with the time (`s` or `ms`).

The default is medium. This is used if you don’t specify any attributes, or if you provide any unsupported attribute values.

<speak>
    There is a three second pause here <break time="3s"/> 
    then the speech continues.
</speak> 
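A helper can keep `time` values inside the documented 10-second limit and always attach a unit. This Python sketch (hypothetical `break_tag` function) takes either a `strength` keyword or a duration in milliseconds. It raises on unsupported values instead of letting Alexa fall back to `medium`, a design choice that surfaces typos during development.

```python
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
MAX_PAUSE_MS = 10_000


def break_tag(strength=None, ms=None):
    """Build a <break/> tag from either a strength keyword or a duration in milliseconds."""
    if strength is not None and ms is not None:
        raise ValueError("give either strength or ms, not both")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength {strength!r}")
        return f'<break strength="{strength}"/>'
    if ms is not None:
        if not 0 <= ms <= MAX_PAUSE_MS:
            raise ValueError(f"pause must be between 0 and {MAX_PAUSE_MS} ms")
        return f'<break time="{ms}ms"/>'
    return "<break/>"  # no attributes: Alexa applies the default, medium


print(f"<speak>There is a three second pause here {break_tag(ms=3000)} then the speech continues.</speak>")
```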

emphasis

Emphasizes the tagged words or phrases. Emphasis changes the rate and volume of the speech: more emphasis is spoken louder and slower, and less emphasis is quieter and faster.

Attribute	Possible Values
`level`	`strong`: Increase the volume and slow down the speaking rate so the speech is louder and slower. `moderate`: Increase the volume and slow down the speaking rate, but not as much as when set to `strong`. This is used as a default if `level` is not provided. `reduced`: Decrease the volume and speed up the speaking rate. The speech is softer and faster.

<speak>
    I already told you I 
    <emphasis level="strong">really like</emphasis> 
    that person.
</speak> 
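A Python sketch of a hypothetical `emphasis` helper that accepts only the three documented levels; it assumes the wrapped text is already SSML-safe:

```python
LEVELS = ("strong", "moderate", "reduced")


def emphasis(text, level="moderate"):
    """Wrap SSML-safe text in <emphasis>; moderate matches Alexa's default level."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    return f'<emphasis level="{level}">{text}</emphasis>'


print(f"<speak>I already told you I {emphasis('really like', 'strong')} that person.</speak>")
```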

p

Represents a paragraph. This tag provides extra-strong breaks before and after the tag. This is equivalent to specifying a pause with <break strength="x-strong"/>.

<speak>                                         
    <p>This is the first paragraph. There should be a pause after this text is spoken.</p>       
    <p>This is the second paragraph.</p> 
</speak>                                        
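When text arrives as a list of paragraphs, each one can be escaped and wrapped in `<p>` tags. A minimal Python sketch; `paragraphs` is a hypothetical helper:

```python
from xml.sax.saxutils import escape


def paragraphs(*texts):
    """Escape each plain-text paragraph, wrap it in <p>, and wrap the result in <speak>."""
    return "<speak>" + "".join(f"<p>{escape(t)}</p>" for t in texts) + "</speak>"


print(paragraphs("This is the first paragraph.", "This is the second paragraph."))
```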

phoneme

Provides a phonemic/phonetic pronunciation for the contained text. For example, people may pronounce words like “pecan” differently.

Attribute	Possible Values
`alphabet`	Set to the phonetic alphabet to use: `ipa`: The International Phonetic Alphabet (IPA). `x-sampa`: The Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA).
`ph`	The phonetic pronunciation to speak. See below for a list of supported symbols in each of the supported skill languages.

When using this tag, Alexa uses the pronunciation provided in the ph attribute rather than the text contained within the tag. However, you should still provide human-readable text within the tags. In the following example, the word “pecan” shown within the tags is never spoken. Instead, Alexa speaks the text provided in the ph attribute:

<speak>
    You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 
    I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
</speak> 

Additional examples of writing words with a phonetic alphabet:

Word	IPA	X-SAMPA
bottle	ˈbɑ.təl	"bA.t@l
frozen	ˈfɹoʊ.zən	"fr\oU.z@n
blossom	ˈblɑ.səm	"blA.s@m
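X-SAMPA marks primary stress with a double quote (`"`), which collides with a double-quoted `ph` attribute. This Python sketch builds the tag with the standard library's `quoteattr`, which switches to single quotes when the value contains a double quote; the `phoneme` helper is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

ALPHABETS = {"ipa", "x-sampa"}


def phoneme(text, ph, alphabet="ipa"):
    """Build a <phoneme> tag whose attributes survive X-SAMPA's '"' stress mark."""
    if alphabet not in ALPHABETS:
        raise ValueError(f"alphabet must be one of {sorted(ALPHABETS)}")
    return f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>{escape(text)}</phoneme>"


print(phoneme("bottle", '"bA.t@l', alphabet="x-sampa"))
# <phoneme alphabet="x-sampa" ph='"bA.t@l'>bottle</phoneme>
print(phoneme("pecan", "pɪˈkɑːn"))
# <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>
```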

Supported Symbols

The following tables list the supported symbols for use with the phoneme tag. The symbols are specific to the skill’s language.

These symbols provide full coverage for the sounds of English (US). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (US) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Examples
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

Vowels

IPA	X-SAMPA	Description	Examples
ə	@	mid central vowel	arena
ɚ	@`	mid central r-colored vowel	reader
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɝ	3`	open-mid central unrounded r-colored vowel	nurse
ɛ	E	open-mid front unrounded vowel	dress
i	i	long close front unrounded vowel	fleece
ɪ	I	near-close near-front unrounded vowel	kit
oʊ	oU	diphthong	goat
ɔ	O	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u	long close back rounded vowel	goose
ʊ	U	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut

Additional Symbols

IPA	X-SAMPA	Description	Examples
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma
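To catch unsupported symbols before deployment, you could split a `ph` value into symbols and compare each one against the tables. This Python sketch assumes the X-SAMPA column of the English (US) tables above is the complete inventory, and it splits by greedy longest match, which is a simplification (for example, `tS` is always read as one affricate, never as `t` plus `S`). `tokenize_xsampa` is a hypothetical name; swap in another inventory for other languages.

```python
# X-SAMPA symbols from the English (US) tables above.
EN_US_XSAMPA = frozenset({
    # consonants
    "b", "d", "dZ", "D", "f", "g", "h", "j", "k", "l", "m", "n", "N",
    "p", "r\\", "s", "S", "t", "tS", "T", "v", "w", "z", "Z",
    # vowels and diphthongs
    "@", "@`", "{", "aI", "aU", "A", "eI", "3`", "E", "i", "I",
    "oU", "O", "OI", "u", "U", "V",
    # stress marks and syllable boundary
    '"', "%", ".",
})


def tokenize_xsampa(ph, inventory=EN_US_XSAMPA):
    """Split ph into symbols by greedy longest match; raise on anything outside inventory."""
    longest = max(len(symbol) for symbol in inventory)
    tokens, i = [], 0
    while i < len(ph):
        for size in range(min(longest, len(ph) - i), 0, -1):
            piece = ph[i:i + size]
            if piece in inventory:
                tokens.append(piece)
                i += size
                break
        else:  # no symbol of any length matched at position i
            raise ValueError(f"unsupported symbol at position {i}: {ph[i:]!r}")
    return tokens


print(tokenize_xsampa('"fr\\oU.z@n'))
# ['"', 'f', 'r\\', 'oU', '.', 'z', '@', 'n']
```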

These symbols provide full coverage for the sounds of English (UK). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (UK) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Examples
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

Vowels

IPA	X-SAMPA	Description	Examples
ə	@	mid central vowel	arena
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɜ	3	open-mid central unrounded vowel	nurse
ɛ	E	open-mid front unrounded vowel	dress
i	i	long close front unrounded vowel	fleece
ɪ	I	near-close near-front unrounded vowel	kit
əʊ	@U	diphthong	goat
ɔ	O	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u	long close back rounded vowel	goose
ʊ	U	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut
ɒ	Q	open back rounded vowel	bother
ɛə	E@	diphthong	bear
ɪə	I@	diphthong	beer
ʊə	U@	diphthong	tour

Additional Symbols

IPA	X-SAMPA	Description	Examples
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma

These symbols provide full coverage for the sounds of English (India). Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for English (India) skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-SAMPA	Description	Examples
b	b	voiced bilabial plosive	bed
d	d	voiced alveolar plosive	dig
d͡ʒ	dZ	voiced postalveolar affricate	jump
ð	D	voiced dental fricative	then
f	f	voiceless labiodental fricative	five
g	g	voiced velar plosive	game
h	h	voiceless glottal fricative	house
j	j	palatal approximant	yes
k	k	voiceless velar plosive	cat
l	l	alveolar lateral approximant	lay
m	m	bilabial nasal	mouse
n	n	alveolar nasal	nap
ŋ	N	velar nasal	thing
p	p	voiceless bilabial plosive	speak
ɹ	r\	alveolar approximant	red
s	s	voiceless alveolar fricative	seem
ʃ	S	voiceless postalveolar fricative	ship
t	t	voiceless alveolar plosive	trap
t͡ʃ	tS	voiceless postalveolar affricate	chart
θ	T	voiceless dental fricative	thin
v	v	voiced labiodental fricative	vest
w	w	labial-velar approximant	west
z	z	voiced alveolar fricative	zero
ʒ	Z	voiced postalveolar fricative	vision

Vowels

IPA	X-SAMPA	Description	Examples
ə	@	mid central vowel	arena
æ	{	near-open front unrounded vowel	trap
aɪ	aI	diphthong	price
aʊ	aU	diphthong	mouth
ɑ	A	long open back unrounded vowel	father
eɪ	eI	diphthong	face
ɜ	3	open-mid central unrounded vowel	nurse
ɛ	E	open-mid front unrounded vowel	dress
i	i	long close front unrounded vowel	fleece
ɪ	I	near-close near-front unrounded vowel	kit
əʊ	@U	diphthong	goat
ɔ	O	long open-mid back rounded vowel	thought
ɔɪ	OI	diphthong	choice
u	u	long close back rounded vowel	goose
ʊ	U	near-close near-back rounded vowel	foot
ʌ	V	open-mid back unrounded vowel	strut
ɒ	Q	open back rounded vowel	bother
ɛə	E@	diphthong	bear
ɪə	I@	diphthong	beer
ʊə	U@	diphthong	tour

Additional Symbols

IPA	X-SAMPA	Description	Examples
ˈ	"	primary stress	Alabama
ˌ	%	secondary stress	Alabama
.	.	syllable boundary	A.la.ba.ma

Hindi Consonants

IPA	X-SAMPA	Description	Examples
pʰ	p_h	voiceless aspirated bilabial plosive	फूल (phool)
bʱ	b_h	voiced aspirated bilabial plosive	भारी (bhaari)
t̪	t_d	voiceless dental plosive	तापमान (taapmaan)
t̪ʰ	t_d_h	voiceless aspirated dental plosive	थोड़ा (thoda)
d̪	d_d	voiced dental plosive	दिल्ली (dilli)
d̪ʱ	d_d_h	voiced aspirated dental plosive	धोबी (dhobi)
ʈ	t`	voiceless retroflex plosive	कटोरा (katora)
ʈʰ	t`_h	voiceless aspirated retroflex plosive	ठंड (thand)
ɖ	d`	voiced retroflex plosive	डर (darr)
ɖʱ	d`_h	voiced aspirated retroflex plosive	ढाल (dhal)
tʃʰ	tS_h	voiceless aspirated palatal affricate	छाल (chaal)
dʒʱ	dZ_h	voiced aspirated palatal affricate	झाल (jhaal)
kʰ	k_h	voiceless aspirated velar plosive	खान (khan)
ɡʱ	g_h	voiced aspirated velar plosive	घान (ghaan)
ɳ	n`	retroflex nasal	क्षण (kshan)
ɾ	4	alveolar flap	राम (ram)
ɽ	r`	plain retroflex flap	बड़ा (bada)
ɽʱ	r`_h	voiced aspirated retroflex flap	बढ़ी (barhi)
ʋ	v\	labiodental approximant	वसूल (wasool)

Hindi Vowels

IPA	X-SAMPA	Description	Examples
ə	@_o	mid central vowel	अच्छा (achhaa)
ə̃	@~	nasalised mid central vowel	हँसना (hansnaa)
a	A_o	open front unrounded vowel	आग (aag)
ã	A~	nasalised open front unrounded vowel	घड़ियाँ (ghariyaan)
ɪ	I_o	near-close near-front unrounded vowel	इक्कीस (ikkees)
ɪ̃	I~	nasalised near-close near-front unrounded vowel	सिंचाई (sinchai)
i	i_o	close front unrounded vowel	बिल्ली (billee)
ĩ	i~	nasalised close front unrounded vowel	नहीं (nahin)
ʊ	U_o	near-close near-back rounded vowel	उल्लू (ullu)
ʊ̃	U~	nasalised near-close near-back rounded vowel	मुँह (munh)
u	u_o	close back rounded vowel	फूल (phool)
ũ	u~	nasalised close back rounded vowel	ऊँट (oont)
ɔ	O_o	open-mid back rounded vowel	कौन (kaun)
ɔ̃	O~	nasalised open-mid back rounded vowel	भौं (bhaun)
o	o	close-mid back rounded vowel	सोना (sona)
õ	o~	nasalised close-mid back rounded vowel	क्यों (kyon)
ɛ	E_o	open-mid front unrounded vowel	पैसा (paisa)
ɛ̃	E~	nasalised open-mid front unrounded vowel	मैं (main)
e	e	close-mid front unrounded vowel	एक (ek)
ẽ	e~	nasalised close-mid front unrounded vowel	किताबें (kitabein)

These symbols provide full coverage for the sounds of German. Other languages require the use of symbols not included in this list, which are not supported. Using symbols not included in this list for German skills is discouraged, as it may result in suboptimal speech synthesis.

Consonants

IPA	X-Sampa	Description	Examples
b	b	voiced bilabial plosive	Bier
d	d	voiced alveolar plosive	Dach
ç	C	voiceless palatal fricative	ich
d͡ʒ	dZ	voiced postalveolar affricate	Dschungel
f	f	voiceless labiodental fricative	Vogel
g	g	voiced velar plosive	Gabel
h	h	voiceless glottal fricative	Haus
j	j	palatal approximant	jemand
k	k	voiceless velar plosive	Kleid
l	l	alveolar lateral approximant	Loch
m	m	bilabial nasal	Milch
n	n	alveolar nasal	Natur
ŋ	N	velar nasal	klingen
p	p	voiceless bilabial plosive	Park
p͡f	pf	voiceless labiodental affricate	Apfel
ʀ	R	uvular trill	Regen
s	s	voiceless alveolar fricative	Messer
ʃ	S	voiceless postalveolar fricative	Fischer
t	t	voiceless alveolar plosive	Topf
t͡s	ts	voiceless alveolar affricate	Zahl
t͡ʃ	tS	voiceless postalveolar affricate	deutsch
v	v	voiced labiodental fricative	Wasser
x	x	voiceless velar fricative	kochen
z	z	voiced alveolar fricative	See
ʒ	Z	voiced postalveolar fricative	Orange

Vowels

IPA	X-Sampa	Description	Examples
a	a	open front unrounded vowel	Salz
aː	a:	long open front unrounded vowel	Sahne
aʊ	aU	diphthong	Augen
ə	@	mid central vowel	Rede
ɐ	6	near-open central vowel	besser
aɪ	aI	diphthong	nein
ɛ	E	open-mid front unrounded vowel	Kellner
eː	e:	long close-mid front unrounded vowel	Rede
øː	2:	long close-mid front rounded vowel	böse
ɪ	I	near-close near-front unrounded vowel	bitte
iː	i:	long close front unrounded vowel	Lied
ɔ	O	open-mid back rounded vowel	Koffer
œ	9	open-mid front rounded vowel	können
oː	o:	long close-mid back rounded vowel	Kohl
ɔʏ	OY	diphthong	neu
ʊ	U	near-close near-back rounded vowel	Wunder
ʏ	Y	near-close near-front rounded vowel	Küche
uː	u:	long close back rounded vowel	Bruder
yː	y:	long close front rounded vowel	kühl

Centralised Diphthongs

IPA	X-Sampa	Examples
aɐ̯	a6_^	hart
aːɐ̯	a:6_^	Haar
ɛɐ̯	E6_^	Berg
eːɐ̯	e:6_^	schwer
øːɐ̯	2:6_^	Nadelöhr
ɪɐ̯	I6_^	Wirtschaft
iːɐ̯	i:6_^	Tier
ɔɐ̯	O6_^	dort
œɐ̯	96_^	Wörter
oːɐ̯	o:6_^	Ohr
ʊɐ̯	U6_^	Gurke
ʏɐ̯	Y6_^	Türkei
uːɐ̯	u:6_^	Kur
yːɐ̯	y:6_^	Tür

English Phonemes

IPA	X-Sampa	Description	Examples
ð	D	voiced dental fricative	brother
ɹ	r\	alveolar approximant	ripe
θ	T	voiceless dental fricative	north
w	w	labial-velar approximant	well
ɔː	O:	long open-mid back rounded vowel	callcenter
eɪ	eI	diphthong	rating
oʊ	oU	diphthong	windows

French Phonemes

IPA	X-Sampa	Description	Examples
ãː	a~:	nasalized long open front unrounded vowel	Croissant
ɛ̃ː	E~:	nasalized long open-mid front unrounded vowel	Terrain
õː	o~:	nasalized long close-mid back rounded vowel	Annonce

Additional Symbols

IPA	X-Sampa	Description	Examples
ˈ	"	primary stress	genau
.	.	syllable boundary	ver.stan.den

prosody

Modifies the volume, pitch, and rate of the tagged speech.

Attribute	Possible Values
`rate`	Modify the rate of the speech: `x-slow`, `slow`, `medium`, `fast`, `x-fast`: Set the rate to a predefined value. `n%`: Specify a percentage to increase or decrease the speed of the speech: `100%` indicates no change from the normal rate. Percentages greater than `100%` increase the rate. Percentages below `100%` decrease the rate. The minimum value you can provide is `20%`.
`pitch`	Raise or lower the tone (pitch) of the speech: `x-low`, `low`, `medium`, `high`, `x-high`: Set the pitch to a predefined value. `+n%`: Increase the pitch by the specified percentage. For example: `+10%`, `+5%`. The maximum value allowed is `+50%`. A value higher than `+50%` is rendered as `+50%`. `-n%`: Decrease the pitch by the specified percentage. For example: `-10%`, `-20%`. The smallest value allowed is `-33.3%`. A value lower than `-33.3%` is rendered as `-33.3%`.
`volume`	Change the volume for the speech: `silent`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`: Set volume to a predefined value for current voice. `+ndB`: Increase volume relative to the current volume level. For example, `+0dB` means no change of volume. `+6dB` is approximately twice the current amplitude. The maximum positive value is about `+4.08dB`. `-ndB`: Decrease the volume relative to the current volume level. For example, `-6dB` means approximately half the current amplitude.

rate

Modify the rate of the speech:

x-slow, slow, medium, fast, x-fast: Set the rate to a predefined value.
n%: Specify a percentage to increase or decrease the speed of the speech:
- 100% indicates no change from the normal rate.
- Percentages greater than 100% increase the rate.
- Percentages below 100% decrease the rate.
- The minimum value you can provide is 20%.

pitch

Raise or lower the tone (pitch) of the speech:

x-low, low, medium, high, x-high: Set the pitch to a predefined value.
+n%: Increase the pitch by the specified percentage. For example: +10%, +5%. The maximum value allowed is +50%. A value higher than +50% is rendered as +50%.
-n%: Decrease the pitch by the specified percentage. For example: -10%, -20%. The smallest value allowed is -33.3%. A value lower than -33.3% is rendered as -33.3%.

volume

Change the volume for the speech:

silent, x-soft, soft, medium, loud, x-loud: Set volume to a predefined value for the current voice.
+ndB: Increase volume relative to the current volume level. For example, +0dB means no change of volume. For reference, +6dB corresponds to approximately twice the current amplitude; however, the maximum positive value is about +4.08dB.
-ndB: Decrease the volume relative to the current volume level. For example, -6dB means approximately half the current amplitude.
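
The dB figures above follow the usual relation between a decibel change and a linear amplitude ratio: ratio = 10^(dB/20). Here is a quick Python check of those figures. The function name is hypothetical.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a decibel change to a linear amplitude ratio: 10 ** (dB / 20)."""
    return 10 ** (db / 20)


for db in (6.0, -6.0, 4.08):
    print(f"{db:+.2f}dB -> x{db_to_amplitude_ratio(db):.2f}")
# +6.00dB -> x2.00
# -6.00dB -> x0.50
# +4.08dB -> x1.60
```

By the same formula, the +4.08dB ceiling corresponds to roughly 1.6 times the current amplitude.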

<speak>
    Normal volume for the first sentence.
    <prosody volume="x-loud">Louder volume for the second sentence</prosody>.
    When I wake up, <prosody rate="x-slow">I speak quite slowly</prosody>.
    I can speak with my normal pitch, 
    <prosody pitch="x-high"> but also with a much higher pitch </prosody>, 
    and also <prosody pitch="low">with a lower pitch</prosody>.
</speak>
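
If your skill computes prosody values at run time, you may want to keep them inside the documented ranges before sending them. The following minimal Python sketch assumes you pass rate and pitch as plain percentages. The helper name is hypothetical. Clamping on the client side, rather than relying on Alexa's own rendering limits, is also our own choice.

```python
from typing import Optional
from xml.sax.saxutils import escape

# Limits quoted in this reference. Clamping here is our own design choice.
RATE_MIN_PCT = 20.0
PITCH_MIN_PCT, PITCH_MAX_PCT = -33.3, 50.0


def prosody(text: str, rate_pct: Optional[float] = None,
            pitch_pct: Optional[float] = None) -> str:
    """Build a <prosody> tag from percentage rate/pitch values (hypothetical helper)."""
    attrs = []
    if rate_pct is not None:
        # Raise anything below the documented 20% minimum up to 20%.
        attrs.append(f'rate="{max(rate_pct, RATE_MIN_PCT):g}%"')
    if pitch_pct is not None:
        # Hold pitch within the documented -33.3%..+50% range.
        clamped = min(max(pitch_pct, PITCH_MIN_PCT), PITCH_MAX_PCT)
        attrs.append(f'pitch="{clamped:+g}%"')
    if not attrs:
        raise ValueError("prosody needs at least one of rate_pct or pitch_pct")
    return f"<prosody {' '.join(attrs)}>{escape(text)}</prosody>"


print(prosody("I speak quite slowly", rate_pct=10, pitch_pct=-40))
# <prosody rate="20%" pitch="-33.3%">I speak quite slowly</prosody>
print(prosody("much higher", pitch_pct=80))
# <prosody pitch="+50%">much higher</prosody>
```

With these bounds, a rate below 20% becomes 20%, and pitch never leaves the -33.3% to +50% range that Alexa renders.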

s

Represents a sentence. This tag provides strong breaks before and after the tag.

This is equivalent to:

Ending a sentence with a period (.).
Specifying a pause with <break strength="strong"/>.

<speak>
    <s>This is a sentence</s>
    <s>There should be a short pause before this second sentence</s> 
    This sentence ends with a period and should have the same pause.
</speak>
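
When sentences come from dynamic data, wrapping each one in <s> gives the strong break described above even if the data lacks a closing period. The following is a minimal Python sketch with a hypothetical function name. The escaping step matters because SSML is XML: characters such as & and < in the data would otherwise make the markup invalid.

```python
from typing import List
from xml.sax.saxutils import escape


def sentences_to_ssml(sentences: List[str]) -> str:
    """Wrap each non-empty sentence in <s> inside a single <speak> root."""
    # escape() converts &, < and > so the result stays well-formed XML.
    body = "".join(f"<s>{escape(s.strip())}</s>" for s in sentences if s.strip())
    return f"<speak>{body}</speak>"


print(sentences_to_ssml(["Salt & pepper", "  ", "Ready in < 5 minutes"]))
# <speak><s>Salt &amp; pepper</s><s>Ready in &lt; 5 minutes</s></speak>
```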

say-as

Describes how the text should be interpreted. This lets you provide additional context to the text and eliminate any ambiguity about how Alexa should render it. Indicate how Alexa should interpret the text with the interpret-as attribute.

Attribute Possible Values

interpret-as

characters, spell-out: Spell out each letter.
cardinal, number: Interpret the value as a cardinal number.
ordinal: Interpret the value as an ordinal number.
digits: Spell each digit separately.
fraction: Interpret the value as a fraction. This works for both common fractions (such as 3/20) and mixed fractions (such as 1+1/2).
unit: Interpret a value as a measurement. The value should be either a number or fraction followed by a unit (with no space in between) or just a unit.
date: Interpret the value as a date. Specify the format with the format attribute.
time: Interpret a value such as 1'21" as a duration in minutes and seconds.
telephone: Interpret a value as a 7-digit or 10-digit telephone number. This can also handle extensions (for example, 2025551212x345).
address: Interpret a value as part of a street address.
interjection: Interpret the value as an interjection. Alexa speaks the text in a more expressive voice. For optimal results, only use the supported interjections and surround each one with a pause. For example: <say-as interpret-as="interjection">Wow.</say-as>. Speechcons are supported for the languages listed below.
expletive: “Bleep” out the content inside the tag.
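
A skill can catch typos in interpret-as early by checking the value against the list above before building the tag. The following is a Python sketch. The helper name say_as() is hypothetical. The allowed set simply mirrors this reference, so update it if the supported values change.

```python
from xml.sax.saxutils import escape, quoteattr

# interpret-as values listed in this reference.
INTERPRET_AS = frozenset({
    "characters", "spell-out", "cardinal", "number", "ordinal", "digits",
    "fraction", "unit", "date", "time", "telephone", "address",
    "interjection", "expletive",
})


def say_as(text: str, interpret_as: str) -> str:
    """Build a <say-as> tag, rejecting interpret-as values not listed here."""
    if interpret_as not in INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as!r}")
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


print(say_as("1+1/2", "fraction"))
# <say-as interpret-as="fraction">1+1/2</say-as>
print(say_as("2025551212x345", "telephone"))
# <say-as interpret-as="telephone">2025551212x345</say-as>
```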

format

Only used when interpret-as is set to date. Set to one of the following to indicate the format of the date:

mdy
dmy
ymd
md
dm
ym
my
d
m
y

Alternatively, if you provide the date in YYYYMMDD format, the format attribute is ignored. You can include question marks (?) for portions of the date to leave out. For instance, Alexa would speak <say-as interpret-as="date">????0922</say-as> as “September 22nd”.
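
The YYYYMMDD form with question-mark placeholders is convenient when a date is assembled from optional parts, because it makes the format attribute unnecessary. Here is a Python sketch with a hypothetical helper name. It reproduces the ????0922 example above.

```python
from typing import Optional


def date_say_as(year: Optional[int] = None, month: Optional[int] = None,
                day: Optional[int] = None) -> str:
    """Build a YYYYMMDD <say-as interpret-as="date"> value (hypothetical helper).

    Missing parts become question marks, so no format attribute is needed.
    """
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if day is not None and not 1 <= day <= 31:
        raise ValueError(f"day out of range: {day}")
    y = f"{year:04d}" if year is not None else "????"
    m = f"{month:02d}" if month is not None else "??"
    d = f"{day:02d}" if day is not None else "??"
    return f'<say-as interpret-as="date">{y}{m}{d}</say-as>'


print(date_say_as(month=9, day=22))
# <say-as interpret-as="date">????0922</say-as>
```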

Note that the Alexa service attempts to interpret the provided text correctly based on the text’s formatting even without this tag. For example, if your output speech includes “202-555-1212”, Alexa speaks each individual digit, with a brief pause for each dash. You don’t need to use <say-as interpret-as="telephone"> in this case. However, if you provide the text “2025551212” and want Alexa to speak it as a phone number, you need to use <say-as interpret-as="telephone">.

<speak>
    Here is a number spoken as a cardinal number: 
    <say-as interpret-as="cardinal">12345</say-as>.
    Here is the same number with each digit spoken separately:
    <say-as interpret-as="digits">12345</say-as>.
    Here is a word spelled out: <say-as interpret-as="spell-out">hello</say-as>
</speak>
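
Following the telephone note above, a skill that reads phone numbers from its own data might add <say-as interpret-as="telephone"> only to undelimited numbers. This Python sketch shows one way to apply that rule to a value already known to be a phone number. The helper name and the accepted patterns are assumptions. Treating 555-1212 the same way as 202-555-1212 goes beyond what this reference states.

```python
import re

# Undelimited 7- or 10-digit number with an optional extension (2025551212x345).
_UNDELIMITED = re.compile(r"\d{7}(?:\d{3})?(?:x\d+)?")
# Dashed form such as 202-555-1212, which this reference says Alexa already
# reads digit by digit; also accepting 555-1212 is our own assumption.
_DASHED = re.compile(r"(?:\d{3}-)?\d{3}-\d{4}")


def phone_ssml(raw: str) -> str:
    """Return speech text for a value known to be a phone number (hypothetical helper)."""
    value = raw.strip()
    if _DASHED.fullmatch(value):
        return value  # no <say-as> needed
    if _UNDELIMITED.fullmatch(value):
        return f'<say-as interpret-as="telephone">{value}</say-as>'
    raise ValueError(f"unrecognized phone number format: {raw!r}")


print(phone_ssml("2025551212"))
# <say-as interpret-as="telephone">2025551212</say-as>
print(phone_ssml("202-555-1212"))
# 202-555-1212
```

Applying this only to fields known to hold phone numbers avoids reading an unrelated seven-digit quantity as a phone number.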

Supported Speechcons

Speechcons are language specific. See the following pages for the available speechcons for each skill language:

speak

This is the root element of an SSML document. When using SSML with the Alexa Skills Kit, surround the text to be spoken with this tag.

<speak>
    This is what Alexa sounds like without any SSML.
</speak>
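
Every SSML response needs exactly one <speak> root. A small helper can add it and build the outputSpeech object described in Using SSML in Your Response. The following Python sketch uses a hypothetical function name and does not validate the inner markup.

```python
import json


def ssml_output_speech(inner_ssml: str) -> dict:
    """Build an outputSpeech object whose SSML has exactly one <speak> root."""
    text = inner_ssml.strip()
    if not text.startswith("<speak>"):
        text = f"<speak>{text}</speak>"
    return {"type": "SSML", "ssml": text}


speech = ssml_output_speech("This is what Alexa sounds like without any SSML.")
print(json.dumps({"outputSpeech": speech}, indent=4))
```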

sub

Pronounce the specified word or phrase as a different word or phrase. Specify the pronunciation to substitute with the alias attribute.

Attribute Possible Values

alias

The word or phrase to speak in place of the tagged text.

This example replaces the abbreviated chemical element symbols with the full element names:

<speak>
    My favorite chemical element is <sub alias="aluminum">Al</sub>,
    but Al prefers <sub alias="magnesium">Mg</sub>. 
</speak> 
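
The example tags only the symbols, not the name Al, so a global find-and-replace on "Al" would be the wrong tool here. The following Python sketch marks each substitution explicitly and reproduces the example. The function and type names are hypothetical.

```python
from typing import List, Tuple, Union
from xml.sax.saxutils import escape, quoteattr

# A segment is either plain text or a (written form, spoken alias) pair.
Segment = Union[str, Tuple[str, str]]


def render_with_subs(segments: List[Segment]) -> str:
    """Render segments, wrapping only the explicitly paired ones in <sub>."""
    parts = []
    for seg in segments:
        if isinstance(seg, tuple):
            written, alias = seg
            parts.append(f"<sub alias={quoteattr(alias)}>{escape(written)}</sub>")
        else:
            parts.append(escape(seg))
    return f"<speak>{''.join(parts)}</speak>"


result = render_with_subs([
    "My favorite chemical element is ", ("Al", "aluminum"),
    ", but Al prefers ", ("Mg", "magnesium"), ".",
])
print(result)
```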

w

Similar to <say-as>, this tag customizes the pronunciation of words by specifying the word’s part of speech.

Attribute Possible Values

role

Set to one of the following:

amazon:VB: Interpret the word as a verb (present simple).
amazon:VBD: Interpret the word as a past participle.
amazon:NN: Interpret the word as a noun.
amazon:SENSE_1: Use the non-default sense of the word. For example, the noun “bass” is pronounced differently depending on meaning. The “default” meaning is the lowest part of the musical range. The alternate sense (which is still a noun) is a freshwater fish. Specifying <speak><w role="amazon:SENSE_1">bass</w></speak> renders the non-default pronunciation (freshwater fish).

<speak>
    The word <say-as interpret-as="characters">read</say-as> may be interpreted 
    as either the present simple form <w role="amazon:VB">read</w>, 
    or the past participle form <w role="amazon:VBD">read</w>.
</speak> 

Note that these tags previously used the ivona namespace in the attribute names. The tags are backwards compatible, so existing SSML written with the ivona namespace continues to work.
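
A skill can validate role values before emitting a <w> tag. In this Python sketch the helper name is hypothetical, and the allowed set mirrors the amazon: values listed above. It deliberately rejects the older ivona-namespaced values, so relax the check if you still maintain legacy SSML.

```python
from xml.sax.saxutils import escape, quoteattr

# role values listed in this reference.
W_ROLES = {
    "amazon:VB": "verb (present simple)",
    "amazon:VBD": "past participle",
    "amazon:NN": "noun",
    "amazon:SENSE_1": "non-default sense of the word",
}


def w(word: str, role: str) -> str:
    """Build a <w> tag, rejecting role values not listed in this reference."""
    if role not in W_ROLES:
        raise ValueError(f"unsupported role {role!r}; expected one of {sorted(W_ROLES)}")
    return f"<w role={quoteattr(role)}>{escape(word)}</w>"


print(w("read", "amazon:VBD"))
# <w role="amazon:VBD">read</w>
print(f"<speak>{w('bass', 'amazon:SENSE_1')}</speak>")
# <speak><w role="amazon:SENSE_1">bass</w></speak>
```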

Other SSML Reference Materials

All SSML tags:

Speech Synthesis Markup Language (SSML) Reference (this document)

Speechcons (interjections):