Voice Interface and User Experience Testing for a Custom Skill
Voice interface and user experience testing focuses on:
- Testing the user experience to ensure that the skill is aligned with several key features of Alexa that help create a great experience for customers.
- Reviewing the intent schema, the set of sample utterances, and the list of values for any custom slot types you have defined to ensure that they are correct, complete, and adhere to voice design best practices.
These components are defined on the Interaction Model page for your skill in the developer portal.
These tests address the following goals:
- Increase the number of different ways end users can phrase requests to your skill.
- Evaluate the ease of speech recognition when using your skill (was Alexa able to recognize the right words?).
- Improve language understanding (when Alexa recognizes the right words, did she understand what to do?).
- Ensure that users can speak to Alexa naturally and spontaneously.
- Ensure that Alexa understands most requests you make, within the context of a skill’s functionality.
- Ensure that Alexa responds to users’ requests in an appropriate way, by either fulfilling them or explaining why she can’t.
Many of these tests verify that your skill adheres to the design guidelines described in the Alexa Voice Design Guide. You may want to review those guidelines while working through this section. For recommendations on sample utterances, see Best Practices for Sample Utterances and Custom Slot Type Values.
Note that many of these tests require that you have a device for voice testing. If you do not have a device with Alexa, you can use third-party Alexa-enabled services, such as Echosim.io, to test your Alexa skill.
This document is oriented towards skills that do not include a screen or touch component.
To return to the high-level testing checklist, see Certification Requirements for Custom Skills.
- 4.1. Session Management
- 4.2. Intent and Slot Combinations
- 4.3. Intent Response (Design)
- 4.4. Supportive Prompting
- 4.5. Invocation Name
- 4.6. One-Shot Phrasing for Sample Utterances
- 4.7. Variety of Sample Utterances
- 4.8. Intents and Slot Types
- 4.9. Custom Slot Type Values
- 4.10. Writing Conventions for Sample Utterances
- 4.11. Error Handling
- 4.12. Providing Help
- 4.13. Stopping and Canceling
- Appendix: Deprecated Test for Sample Utterances (Slot Type Values)
- Next Steps
4.1. Session Management
Every response sent from your skill to the Alexa service includes a flag indicating whether the conversation with the user (the session) should end or continue. If the flag is set to continue, Alexa then listens and waits for the user’s response. For Amazon devices such as Amazon Echo that have a blue light ring, the device lights up to give the user a visual cue that Alexa is listening for the user’s response. On Echo Show, the top of the screen flashes blue.
This test verifies that the text-to-speech provided by your skill and the session flag work together for a good user experience. Responses that ask questions leave the session open for a reply, while responses that fulfill the user’s request close the session.
| | Test | Expected Results |
|---|---|---|
| 1. | Invoke the skill without specifying an intent. Respond to the prompt provided by the skill and verify that you get a correct response. | After every response that asks the user a question, the session remains open and the device waits for your response. After every response that completes the user’s request, the interaction ends. |
| 2. | Test a variety of intents, both those that ask questions and those that complete the user’s request. | After every response that asks the user a question, the session remains open and the device waits for your response. After every response that completes the user’s request, the interaction ends. |
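The pairing of speech text and session flag described above can be sketched as follows. This is a minimal illustration, assuming the standard JSON response body in which `shouldEndSession` is the session flag; the helper name `build_response` and the example texts are hypothetical.

```python
def build_response(speech_text, reprompt_text=None):
    """Build a response body; the session stays open only when a reprompt is given."""
    response = {
        "outputSpeech": {"type": "PlainText", "text": speech_text},
        # Questions pass a reprompt and keep the session open;
        # answers that complete the request close it.
        "shouldEndSession": reprompt_text is None,
    }
    if reprompt_text is not None:
        response["reprompt"] = {
            "outputSpeech": {"type": "PlainText", "text": reprompt_text}
        }
    return {"version": "1.0", "response": response}


# A response that asks a question (session stays open).
question = build_response(
    "Which city would you like tide information for?",
    reprompt_text="Please tell me the name of a city.",
)

# A response that fulfills the request (session ends).
answer = build_response("High tide is at three p.m.")
```

Tying the flag to the presence of a reprompt is one design choice among several; it simply makes it hard to ask a question and close the session by accident.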
4.2. Intent and Slot Combinations
A skill may have several intents and slots. This test verifies that each intent returns the expected response with different combinations of slots.
| | Test | Expected Results |
|---|---|---|
| 1. | Test the skill’s intent responses using different combinations of slot values. You can use one of the one-shot phrases for starting the skill. Be sure to invoke every intent, not just those that are typically used in a one-shot manner. Evaluate the response for each intent. | The response is appropriate for the context of the request. For example, if the request includes a slot value, the response is relevant to that information. If a request to that same intent does not include the slot, the response uses a default or asks the user for clarification. |
You may want to use a table of intent and slot values to track this test and ensure that you test every intent and slot combination. For example:
| Intent | Slot Combination | Sample Utterance to Test |
|---|---|---|
| IntentName | SlotOne | This is an utterance to test this intent and slot one |
| IntentName | SlotTwo | This is an utterance to test this intent and slot two |
| IntentName | SlotOne, SlotTwo | This is an utterance to test this intent with both slot one and slot two |
| Each additional valid intent and slot combination | | - |
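If the skill has more than a few slots, the tracking table can be generated rather than written by hand. The sketch below enumerates every subset of each intent's slots; the `intents` mapping is a hypothetical summary of an intent schema, not a format the developer portal produces.

```python
from itertools import combinations


def slot_combinations(slots):
    """Yield every subset of the slot names, from no slots up to all of them."""
    for size in range(len(slots) + 1):
        for combo in combinations(slots, size):
            yield combo


def build_test_matrix(intents):
    """Return (intent, slot combination) rows to track during testing."""
    rows = []
    for intent, slots in intents.items():
        for combo in slot_combinations(slots):
            rows.append((intent, combo))
    return rows


# Hypothetical schema summary: intent name -> slot names it accepts.
intents = {"IntentName": ["SlotOne", "SlotTwo"]}

matrix = build_test_matrix(intents)
for intent, combo in matrix:
    print(intent, "+".join(combo) or "(no slots)")
```

Not every generated combination is necessarily valid for a given skill, so treat the output as a starting checklist and strike the rows that do not apply.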
4.3. Intent Response (Design)
A good user experience for a skill depends on the skill having well-designed text-to-speech responses. Alexa Voice Design Guide: What Alexa Says provides recommendations for designing your skill’s responses. This test verifies that your skill’s responses meet these recommendations.
You can use the same set of intent and slot combinations used for the Intent and Slot Combinations test.
| Test | Expected Results | |
|---|---|---|
1. |
Test the skill’s intent responses using different combinations of slot values. You can use one of the one-shot phrases for starting the skill, for example:
Be sure to invoke every intent, not just those that are typically used in a one-shot manner. Try a variety of sample utterances for each intent. If the skill vocalizes any examples for users to try, use those examples exactly as instructed by the skill. Evaluate the response for each intent |
The response meets each of the following requirements:
For a better user experience, the response should also meet these recommendations:
|
4.4. Supportive Prompting
A user can begin an interaction with your skill without providing enough information for the skill to know what they want to do. This might be either a no intent request (the user invokes the skill but does not specify any intent at all) or a partial intent request (the user specifies the intent but does not provide the slot values necessary to fulfill the request).
In these cases, the skill must provide supportive prompts asking the user what they want to do. This test verifies that your skill provides useful prompts for these scenarios.
| Test | Expected Results | |
|---|---|---|
1. |
Invoke the skill with no intent. You can do this by using a phrase that sends a
Verify that you get a prompt, then respond to the prompt and verify that you get a correct response. |
|
2. |
Invoke the skill with a partial intent. You can do this by using a phrase that invokes the intent without including all the required slot data. For example:
Verify that you get a prompt, then respond to the prompt and verify that you get a correct response. If the skill does not define any slots, you can skip this test, as it is not possible to send a partial intent. |
|
LaunchRequest with no intent) with a fact about space, then ends the session. For these skills, do the first test and verify that you get a complete response.
See What Alexa Says for recommendations for designing prompts.
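The two prompting scenarios in this section can be sketched as a single request handler. This is an illustrative outline only: it assumes the standard JSON request body (a `LaunchRequest` for no intent, an `IntentRequest` whose slot has no `value` for a partial intent), and the intent name `GetTideIntent`, the slot name `City`, and all speech text are hypothetical.

```python
def speak(text, end_session, reprompt=None):
    """Build a response body with optional reprompt."""
    response = {
        "outputSpeech": {"type": "PlainText", "text": text},
        "shouldEndSession": end_session,
    }
    if reprompt:
        response["reprompt"] = {"outputSpeech": {"type": "PlainText", "text": reprompt}}
    return {"version": "1.0", "response": response}


def handle(event):
    request = event["request"]

    # No intent: the user opened the skill without saying what they want.
    if request["type"] == "LaunchRequest":
        return speak(
            "Welcome. Which city would you like tide information for?",
            end_session=False,
            reprompt="Please tell me the name of a city.",
        )

    # Partial intent: the intent is known but the required slot is empty.
    if request["type"] == "IntentRequest" and request["intent"]["name"] == "GetTideIntent":
        slots = request["intent"].get("slots", {})
        city = slots.get("City", {}).get("value")
        if not city:
            return speak(
                "Which city would you like tide information for?",
                end_session=False,
                reprompt="Please tell me the name of a city.",
            )
        return speak(f"Looking up tide information for {city}.", end_session=True)

    return speak(
        "Sorry, I did not understand. What would you like to do?",
        end_session=False,
        reprompt="What would you like to do?",
    )
```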
4.5. Invocation Name
Users say the invocation name for a skill to begin an interaction. Inspect the skill’s invocation name and verify that it meets the invocation name requirements described in Choosing the Invocation Name for a Custom Skill.
4.6. One-Shot Phrasing for Sample Utterances
Most skills provide quick, simple, “one-shot” interactions in which the user asks a question or gives a command, the skill responds with an answer or confirmation, and the interaction is complete. In these interactions, the user invokes your skill and states their intent all in a single phrase.
The ask and tell phrases are the most natural phrases for starting these types of interactions. Therefore, it is critical that you write sample utterances that work well with these phrases and are easy and natural to say.
In these tests, you review the sample utterances you’ve written for the skill, then test them by voice to ensure that they work as expected.
| Test | Expected Results | |
|---|---|---|
1. |
Inspect the skill’s sample utterances to ensure that they contain the right phrasing to match the different phrases for invoking a skill with a specific intent. Noun phrases: phrases that can follow
Questions, in both interrogative and inverted forms: phrases that can follow “ask <invocation name> …”
Commands: phrases that can follow “tell <invocation name> to…” or “ask <invocation name> to…”
(In the examples above, the italic phrase is the sample utterance). |
|
2. |
Launch the skill using each of the following common “ask” patterns (ideally do multiple variations for each pattern):
|
|
3. |
Launch the skill with the generic “ask” pattern (recommended test if this is a natural phrase for your skill):
Test with questions starting with different question words (who, what, how, and so on). The specific question words that sound natural with your skill may vary. For example, these types of questions do not flow well with “Space Geek.” A user is unlikely to say something like “Ask Space Geek what is a space fact?” |
|
4. |
Launch the skill using the following common “tell” pattern:
|
|
5. |
Review the “Invoking a Skill with a Specific Request (Intent)” section in Understanding How Users Invoke Custom Skills and test as many of the phrases as apply to your skill. Note that not all of the phrases apply to all skills. For example, the “Ask…whether…” phrasing would probably not make sense for a skill asking about weather or tide information, so the skill would still pass this test even without this phrase. |
|
4.7. Variety of Sample Utterances
Given the flexibility and variation of spoken language in the real world, there will often be many different ways to express the same request. Therefore, your sample utterances must include multiple ways to phrase the same intent.
In this test, inspect the sample utterances for all intents, not just the “one shot” intents described in One-Shot Phrasing for Sample Utterances.
| | Test | Expected Results |
|---|---|---|
| 1. | Inspect the skill’s intent schema and sample utterances. | The five most common synonyms for phrase patterns are present. For example, if the skill contains “get me <some value>”, then the utterances include synonyms such as “give me <some value>”, “tell me <some value>”, and so on. Each sample utterance must be unique. There cannot be any duplicate sample utterances mapped to different intents. Each slot is used only once within a sample utterance. |
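Two of the expected results above (no utterance mapped to more than one intent, and each slot used only once per utterance) are mechanical enough to check with a script. The sketch below assumes sample utterances are stored one per line as `IntentName utterance text with {Slot}`; the function name and sample data are hypothetical, and synonym coverage still needs manual review.

```python
import re
from collections import defaultdict


def lint_utterances(lines):
    """Return a list of problems found in 'IntentName utterance' lines."""
    problems = []
    intents_by_text = defaultdict(set)  # normalized utterance -> intents using it

    for line in lines:
        intent, _, text = line.strip().partition(" ")

        # Rule: each slot appears at most once within a sample utterance.
        slots = re.findall(r"\{(\w+)\}", text)
        repeated = sorted({s for s in slots if slots.count(s) > 1})
        if repeated:
            problems.append(f"{intent}: slot used more than once: {', '.join(repeated)}")

        # Normalize case and whitespace before comparing utterances.
        intents_by_text[" ".join(text.lower().split())].add(intent)

    # Rule: the same utterance must not be mapped to different intents.
    for text, intents in intents_by_text.items():
        if len(intents) > 1:
            problems.append(
                f"duplicate utterance '{text}' mapped to: {', '.join(sorted(intents))}"
            )
    return problems


sample = [
    "GetFactIntent get me a fact",
    "GetFactIntent give me a fact",
    "GetTriviaIntent get me a fact",
    "GetTideIntent tides from {City} to {City}",
]
problems = lint_utterances(sample)
for problem in problems:
    print(problem)
```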
4.8. Intents and Slot Types
Slots are defined with different types. Built-in types such as AMAZON.DATE convert the user’s spoken text into a different format (such as converting the spoken text “march fifth” into the date format “2017-03-05”). Custom slot types are used for items that are not covered by Amazon Alexa’s built-in types.
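To illustrate what the converted value looks like in skill code, here is a small sketch that parses a full calendar date such as "2017-03-05" from an AMAZON.DATE slot. It assumes that less specific utterances (a week or a month, for example) arrive in other formats such as "2015-W49" or "2015-12", which this helper deliberately rejects; the function name is hypothetical.

```python
from datetime import datetime


def parse_full_date(slot_value):
    """Return a date for 'YYYY-MM-DD' slot values, or None for anything else."""
    try:
        return datetime.strptime(slot_value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        # Missing value, or a non-day format such as a week or month.
        return None


parsed = parse_full_date("2017-03-05")
print(parsed.isoformat())
print(parse_full_date("2015-W49"))
```

Returning `None` rather than raising lets the intent handler fall back to a clarifying prompt when the user gives a date the skill cannot use.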
For this test, review the intent schema and ensure that the correct slot types are used for the type of data the slot is intended to collect.
Note that this test assumes you have migrated to the updated slot types as described in Migrating to the Improved Built-in and Custom Slot Types. If you are still using the previous version (for instance, DATE instead of AMAZON.DATE), then you need to also perform the Sample Utterances (Slot Type Values) test.
The AMAZON.LITERAL slot type is not being removed as previously described. You can continue to submit new and updated English (US) skills with AMAZON.LITERAL. However, in many cases, custom slot types provide better accuracy than AMAZON.LITERAL, so we recommend that you consider migrating to custom slot types if possible. Note that AMAZON.LITERAL is not supported for any language other than English (US).
| Test | Expected Results | |
|---|---|---|
1. |
Inspect the skill’s intent schema to identify all slot types. Verify that the types match the type of data to be collected. |
|
Slot Types:
| Slot Type | Use for slots that collect... |
|---|---|
|
|
Integer numbers. |
|
|
Relative and absolute dates (“this weekend” and “august twenty sixth twenty fifteen”). |
|
|
The time of day (“three thirty p. m.”). |
|
|
A period of time (“five minutes”). |
|
Custom Slot Types |
A value from a list (horoscope signs, all NFL football teams, supported cities, recipe ingredients, and so on). See Custom Slot Types (Values) for additional testing for your custom slot types. |
|
|
Not recommended, consider replacing If your schema does include
|
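Part of this review can be automated. The sketch below scans an intent schema for slot types that lack the AMAZON namespace or that use AMAZON.LITERAL. It assumes the schema is a JSON object with an `intents` list whose entries carry `intent` and `slots` keys, and it assumes the pre-migration type names were `NUMBER`, `DATE`, `TIME`, `DURATION`, and `LITERAL`; adjust both assumptions to match your own schema.

```python
# Assumed pre-migration names (the text above confirms only DATE).
LEGACY_TYPES = {"NUMBER", "DATE", "TIME", "DURATION", "LITERAL"}


def find_slot_type_issues(schema):
    """Return (intent, slot, message) tuples for slot types worth reviewing."""
    issues = []
    for intent in schema.get("intents", []):
        for slot in intent.get("slots", []):
            slot_type = slot["type"]
            if slot_type in LEGACY_TYPES:
                issues.append(
                    (intent["intent"], slot["name"],
                     f"legacy type {slot_type} without the AMAZON namespace")
                )
            elif slot_type == "AMAZON.LITERAL":
                issues.append(
                    (intent["intent"], slot["name"],
                     "AMAZON.LITERAL; consider a custom slot type")
                )
    return issues


# Hypothetical schema used only to exercise the check.
schema = {
    "intents": [
        {"intent": "GetHoroscope", "slots": [
            {"name": "Sign", "type": "LIST_OF_SIGNS"},
            {"name": "Date", "type": "DATE"},
        ]},
        {"intent": "TakeNote", "slots": [{"name": "Text", "type": "AMAZON.LITERAL"}]},
        {"intent": "AMAZON.HelpIntent"},
    ]
}

issues = find_slot_type_issues(schema)
for intent_name, slot_name, message in issues:
    print(f"{intent_name}.{slot_name}: {message}")
```

The script only flags candidates; whether a type actually matches the data a slot collects remains a judgment call for the reviewer.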
4.9. Custom Slot Type Values
The custom slot type is used for items that are not covered by Amazon’s built-in types and is recommended for most use cases where a slot value is one of a set of possible values.
| | Test | Expected Results |
|---|---|---|
| 1. | Inspect the skill’s intent schema to identify all slots that use custom slot types. For each custom slot type, review the set of values you provided for the type. | For guidelines for defining custom slot type values, see Recommendations for Custom Slot Type Values. |
4.10. Writing Conventions for Sample Utterances
Sample utterances must be written according to defined rules in order to successfully build a speech model for your skill.
| Test | Expected Results | |
|---|---|---|
1. |
Review the text of all sample utterances. |
All sample utterances adhere to the following
For more information about syntax rules for sample utterances, see the Custom Interaction Model Reference. |
4.11. Error Handling
Unlike a visual interface, where the user can only interact with the objects presented on the screen, there is no way to limit what users can say in a speech interaction. Your skill needs to handle a variety of errors in an intelligent and user-friendly way. This test verifies your skill’s ability to handle common errors.
For more information on validating user input, please see Handling Possible Input Errors.
| Test | Expected Results | |
|---|---|---|
1. |
Invoke the skill without specifying an intent, for example:
When prompted to respond, say nothing. |
Note that in this scenario, the prompt you hear is the re-prompt included in the previous response. |
2. |
Invoke the skill using the following phrase:
When prompted to respond, say something that matches one of your skill’s intents, but with invalid slot data. For instance, if the intent expects an Repeat this test for each slot. |
Note that in this scenario, the prompt is not the re-prompt included in the previous response. This prompt must come from error handling within the code that handles the intent. |
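The second test exercises error handling inside the intent's own code. A minimal sketch of that kind of check follows, assuming a custom slot whose value must be one of a known set; the slot values, function names, and speech text are all hypothetical.

```python
# Hypothetical (and deliberately short) list of values the skill supports.
SUPPORTED_SIGNS = {"aries", "taurus", "gemini"}


def respond(text, end_session, reprompt=None):
    response = {
        "outputSpeech": {"type": "PlainText", "text": text},
        "shouldEndSession": end_session,
    }
    if reprompt:
        response["reprompt"] = {"outputSpeech": {"type": "PlainText", "text": reprompt}}
    return {"version": "1.0", "response": response}


def handle_horoscope(sign_value):
    """Validate the slot value before using it; prompt again if it is invalid."""
    sign = (sign_value or "").strip().lower()
    if sign not in SUPPORTED_SIGNS:
        # This prompt comes from the intent's error handling,
        # not from the re-prompt of the previous response.
        return respond(
            "Sorry, I didn't recognize that sign. Which sign would you like?",
            end_session=False,
            reprompt="Which sign would you like a horoscope for?",
        )
    return respond(f"Here is the horoscope for {sign}.", end_session=True)
```

Keeping the session open after an invalid value gives the user a chance to correct the request without starting the skill again.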
4.12. Providing Help
A skill must have a help intent that can provide additional instructions for navigating and using the skill. Implement the AMAZON.HelpIntent to provide this. You do not need to provide your own sample utterances for this intent, but you do need to implement it in the code for your skill. For details, see Implementing the Built-in Intents.
This test verifies that this intent exists and provides useful information.
| Test | Expected Results | |
|---|---|---|
1. |
Invoke the skill without specifying an intent, for example:
When prompted to respond, say “help”. For a simple skill that gives a complete response even with no specific intent (such as the Space Geek sample), invoke the help intent directly:
|
The help response:
|
For more about designing help for your skill, see What Alexa Says.
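A help handler might look like the sketch below. It assumes that a useful help response explains what the user can say and then leaves the session open with a question; the function name and the help text are hypothetical and would need to describe your own skill.

```python
def handle_intent(intent_name):
    """Return a response for AMAZON.HelpIntent, or None for other intents."""
    if intent_name != "AMAZON.HelpIntent":
        return None
    help_text = (
        "You can ask for tide information for a city, "
        "for example, when is high tide in Seattle. "
        "Which city would you like?"
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": help_text},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": "Which city would you like?"}
            },
            # Help ends with a question, so keep the session open.
            "shouldEndSession": False,
        },
    }


help_response = handle_intent("AMAZON.HelpIntent")
```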
4.13. Stopping and Canceling
Your skill must respond appropriately to common utterances for stopping and canceling actions (such as “stop,” “cancel,” “never mind,” and others). The built-in AMAZON.StopIntent and AMAZON.CancelIntent intents provide these utterances. In most cases, these intents should just exit the skill, but you can map them to alternate functionality if it makes sense for your particular skill. See Implementing the Built-in Intents.
| Test | Expected Results | |
|---|---|---|
1. |
Start the skill and invoke an intent that prompts the user for a response. After hearing the prompt, say “stop.” |
One of the following occurs:
If the skill responds to all requests with a complete response and never provides a prompt, skip this test. |
2. |
Invoke an intent that responds with lengthy text-to-speech. As soon as Alexa begins speaking the response, say “Alexa, stop” to interrupt the response. |
After the wake word interrupts Alexa, one of the following occurs.
If all of the skill’s responses are too short to reasonably interrupt, skip this test. |
3. |
Start the skill and invoke an intent that prompts the user for a response. After hearing the prompt, say “cancel.” |
One of the following occurs:
If the skill responds to all requests with a complete response and never provides a prompt, skip this test. |
4. |
Invoke an intent that responds with lengthy text-to-speech. As soon as Alexa begins speaking the response, say “Alexa, cancel” to interrupt the response. |
After the wake word interrupts Alexa, one of the following occurs.
If all of the skill’s responses are too short to reasonably interrupt, skip this test. |
5. |
Invoke any intent that starts the skill session. While the session is open, say “Exit.” This ends the session and sends your skill a |
The skill closes without returning an error response. |
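The stop, cancel, and exit cases can be sketched together. This outline assumes the common case in which both built-in intents simply exit the skill, and it assumes that saying "exit" reaches the skill as a `SessionEndedRequest`, a request type from the Alexa request format that the skill should accept without speaking or raising an error. The function name and the goodbye text are hypothetical.

```python
EXIT_INTENTS = {"AMAZON.StopIntent", "AMAZON.CancelIntent"}


def handle_exit(request):
    """Handle stop, cancel, and session-ended requests; return None otherwise."""
    if request["type"] == "SessionEndedRequest":
        # Clean up any session state here. No speech is returned.
        return {"version": "1.0", "response": {}}

    if request["type"] == "IntentRequest" and request["intent"]["name"] in EXIT_INTENTS:
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": "Goodbye."},
                "shouldEndSession": True,
            },
        }
    return None


stop = handle_exit({"type": "IntentRequest", "intent": {"name": "AMAZON.StopIntent"}})
ended = handle_exit({"type": "SessionEndedRequest", "reason": "USER_INITIATED"})
```

If cancel should undo an action rather than exit, split the two intents into separate branches instead of sharing `EXIT_INTENTS`.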
Appendix: Deprecated Test for Sample Utterances (Slot Type Values)
If all of your slots use the newer slot types with the AMAZON namespace (such as AMAZON.DATE), you do not need to do this test.
In previous versions of the Alexa Skills Kit, it was necessary to include slot values showing different ways of phrasing the slot data in your sample utterances. For example, sample utterances for a DATE slot were written like this:
OneshotTideIntent when is high tide on {january first|Date}
OneshotTideIntent when is high tide {tomorrow|Date}
OneshotTideIntent when is high tide {saturday|Date}
...(many more utterances showing different ways to say the date)
If your skill still uses this syntax for the built-in slot types, you need to review the sample slot values in your sample utterances. We strongly recommend migrating to the updated slot types that no longer require the sample values.
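For the migration itself, a short script can rewrite the old `{sample value|SlotName}` syntax to the plain `{SlotName}` form and drop the duplicates this creates. This is a rough sketch: the regular expression and function names are illustrative, and it should not be run over AMAZON.LITERAL slots, which still rely on sample values.

```python
import re

# Matches "{any sample text|SlotName}" and captures the slot name.
LEGACY_SLOT = re.compile(r"\{[^{}|]*\|(\w+)\}")


def modernize(utterance):
    """Replace each '{sample|Slot}' with '{Slot}'."""
    return LEGACY_SLOT.sub(r"{\1}", utterance)


def modernize_all(utterances):
    """Convert every line and remove duplicates while preserving order."""
    return list(dict.fromkeys(modernize(u) for u in utterances))


legacy = [
    "OneshotTideIntent when is high tide on {january first|Date}",
    "OneshotTideIntent when is high tide {tomorrow|Date}",
    "OneshotTideIntent when is high tide {saturday|Date}",
]

converted = modernize_all(legacy)
for line in converted:
    print(line)
```

The three legacy lines collapse to two, because the utterances that differed only in their sample date become identical once the sample values are removed.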
| Test | Expected Results | |
|---|---|---|
1. |
Inspect the skill’s intent schema to identify all slot types, then inspect the slot type values found in the sample utterances. Verify that the slot type values provide sufficient variety for good recognition. |
|