| Rank | STDEV | MEAN | Requirement |
|------|-------|------|-------------|
| 1 | 2.89 | 4.00 | R29. Web application may only listen in response to user action. |
| 2 | 2.88 | 4.67 | R31. End users, not web application authors, should be the ones to select speech recognition resources. |
| 3 | 2.51 | 5.57 | R33. User agents need a way to enable end users to grant permission to an application to listen to them. |
| 4 | 2.51 | 4.50 | R1. Web author needs full control over specification of speech resources.* |
| 5 | 2.43 | 4.25 | R18. User-perceived latency of synthesis must be minimized.* |
| 6 | 2.38 | 5.00 | R17. User-perceived latency of recognition must be minimized.* |
| 7 | 2.27 | 4.14 | R30. End users should not be forced to store anything about their speech recognition environment in the cloud. |
| 8 | 2.27 | 3.14 | R28. Web application must not be allowed access to raw audio. |
| 9 | 2.19 | 4.75 | R16. Web application authors must not be excluded from running their own speech service.* |
| 10 | 2.15 | 3.57 | R26. There should exist a high-quality default speech recognition visual user interface.* |
| 11 | 2.14 | 5.71 | R23. Speech as an input on any application should be able to be optional.* |
| 12 | 2.14 | 4.71 | R15. Web application authors must not need to run their own speech service.* |
| 13 | 2.04 | 4.14 | R22. Web application author wants to provide a consistent user experience across all modalities.* |
| 14 | 1.99 | 4.57 | R24. End user should be able to use speech in a hands-free mode.* |
| 15 | 1.89 | 4.71 | R2. Application change from directed input to free-form input.* |
| 16 | 1.81 | 4.88 | R10. Web application authors need to be able to use full SSML features.* |
| 17 | 1.81 | 4.13 | R11. Web application author must integrate input from multiple modalities.* |
| 18 | 1.80 | 3.71 | R13. Web application author should have the ability to customize the speech recognition graphical user interface.* |
| 19 | 1.77 | 4.14 | R9. Web application author provided synthesis feedback.* |
| 20 | 1.70 | 2.29 | R19. End user extensions should be available both on desktop and in cloud. |
| 21 | 1.68 | 5.86 | R32. End users need a clear indication whenever the microphone is listening to the user. |
| 22 | 1.62 | 5.57 | R34. A trust relation is needed between the end user and whatever is doing recognition. |
| 23 | 1.60 | 2.83 | R12. Web application author must be able to specify a domain-specific statistical language model.* |
| 24 | 1.60 | 3.38 | R20. Web author selected TTS service should be available both on device and in the cloud.* |
| 25 | 1.55 | 5.00 | R25. It should be easy to extend the standard without affecting existing speech applications.* |
| 26 | 1.51 | 5.50 | R14. Web application authors need a way to specify and effectively create barge-in (interrupt audio and synthesis).* |
| 27 | 1.13 | 6.43 | R5. Web application must be notified when speech recognition errors and other non-matches occur.* |
| 28 | 1.11 | 6.29 | R7. Web application must be able to specify domain-specific custom grammars.* |
| 29 | 1.07 | 6.14 | R8. Web application must be able to specify the language of recognition.* |
| 30 | 1.00 | 6.00 | R6. Web application must be provided with full context of recognition.* |
| 31 | 0.98 | 1.57 | R21. Any public interface for creating extensions should be speakable. |
| 32 | 0.98 | 6.43 | R4. Web application must be notified when recognition occurs.* |
| 33 | 0.95 | 6.29 | R3. Ability to bind results to specific input fields.* |
| 34 | 0.74 | 6.63 | R27. Grammars, TTS, media composition, and recognition results should all use standard formats.* |