| Rank | STDEV | MEAN | Requirement |
|------|-------|------|-------------|
| 1 | 2.89 | 4.00 | R29. Web application may only listen in response to user action. |
| 2 | 2.88 | 4.67 | R31. End users, not web application authors, should be the ones to select speech recognition resources. |
| 3 | 2.51 | 5.57 | R33. User agents need a way to enable end users to grant permission to an application to listen to them. |
| 4 | 2.51 | 4.50 | R1. Web author needs full control over specification of speech resources.* |
| 5 | 2.43 | 4.25 | R18. User perceived latency of synthesis must be minimized.* |
| 6 | 2.38 | 5.00 | R17. User perceived latency of recognition must be minimized.* |
| 7 | 2.27 | 4.14 | R30. End users should not be forced to store anything about their speech recognition environment in the cloud. |
| 8 | 2.27 | 3.14 | R28. Web application must not be allowed access to raw audio. |
| 9 | 2.19 | 4.75 | R16. Web application authors must not be excluded from running their own speech service.* |
| 10 | 2.15 | 3.57 | R26. There should exist a high quality default speech recognition visual user interface.* |
| 11 | 2.14 | 5.71 | R23. Speech as an input on any application should be able to be optional.* |
| 12 | 2.14 | 4.71 | R15. Web application authors must not need to run their own speech service.* |
| 13 | 2.04 | 4.14 | R22. Web application author wants to provide a consistent user experience across all modalities.* |
| 14 | 1.99 | 4.57 | R24. End user should be able to use speech in a hands-free mode.* |
| 15 | 1.89 | 4.71 | R2. Application change from directed input to free form input.* |
| 16 | 1.81 | 4.88 | R10. Web application authors need to be able to use full SSML features.* |
| 17 | 1.81 | 4.13 | R11. Web application author must integrate input from multiple modalities.* |
| 18 | 1.80 | 3.71 | R13. Web application author should have ability to customize speech recognition graphical user interface.* |
| 19 | 1.77 | 4.14 | R9. Web application author provided synthesis feedback.* |
| 20 | 1.70 | 2.29 | R19. End user extensions should be available both on desktop and in cloud. |
| 21 | 1.68 | 5.86 | R32. End users need a clear indication whenever microphone is listening to the user. |
| 22 | 1.62 | 5.57 | R34. A trust relation is needed between end user and whatever is doing recognition. |
| 23 | 1.60 | 2.83 | R12. Web application author must be able to specify a domain specific statistical language model.* |
| 24 | 1.60 | 3.38 | R20. Web author selected TTS service should be available both on device and in the cloud.* |
| 25 | 1.55 | 5.00 | R25. It should be easy to extend the standard without affecting existing speech applications.* |
| 26 | 1.51 | 5.50 | R14. Web application authors need a way to specify and effectively create barge-in (interrupt audio and synthesis).* |
| 27 | 1.13 | 6.43 | R5. Web application must be notified when speech recognition errors and other non-matches occur.* |
| 28 | 1.11 | 6.29 | R7. Web application must be able to specify domain specific custom grammars.* |
| 29 | 1.07 | 6.14 | R8. Web application must be able to specify language of recognition.* |
| 30 | 1.00 | 6.00 | R6. Web application must be provided with full context of recognition.* |
| 31 | 0.98 | 1.57 | R21. Any public interface for creating extensions should be speakable. |
| 32 | 0.98 | 6.43 | R4. Web application must be notified when recognition occurs.* |
| 33 | 0.95 | 6.29 | R3. Ability to bind results to specific input fields.* |
| 34 | 0.74 | 6.63 | R27. Grammars, TTS, media composition, and recognition results should all use standard formats.* |