| Rank | STDEV | MEAN | Requirement |
|------|-------|------|-------------|
| 1 | 2.89 | 4.00 | R29. Web application may only listen in response to user action. |
| 2 | 2.88 | 4.67 | R31. End users, not web application authors, should be the ones to select speech recognition resources. |
| 3 | 2.51 | 5.57 | R33. User agents need a way to enable end users to grant permission to an application to listen to them. |
| 4 | 2.51 | 4.50 | R1. Web author needs full control over specification of speech resources.* |
| 5 | 2.43 | 4.25 | R18. User perceived latency of synthesis must be minimized.* |
| 6 | 2.38 | 5.00 | R17. User perceived latency of recognition must be minimized.* |
| 7 | 2.27 | 4.14 | R30. End users should not be forced to store anything about their speech recognition environment in the cloud. |
| 8 | 2.27 | 3.14 | R28. Web application must not be allowed access to raw audio. |
| 9 | 2.19 | 4.75 | R16. Web application authors must not be excluded from running their own speech service.* |
| 10 | 2.15 | 3.57 | R26. There should exist a high quality default speech recognition visual user interface.* |
| 11 | 2.14 | 5.71 | R23. Speech as an input on any application should be able to be optional.* |
| 12 | 2.14 | 4.71 | R15. Web application authors must not need to run their own speech service.* |
| 13 | 2.04 | 4.14 | R22. Web application author wants to provide a consistent user experience across all modalities.* |
| 14 | 1.99 | 4.57 | R24. End user should be able to use speech in a hands-free mode.* |
| 15 | 1.89 | 4.71 | R2. Application change from directed input to free form input.* |
| 16 | 1.81 | 4.88 | R10. Web application authors need to be able to use full SSML features.* |
| 17 | 1.81 | 4.13 | R11. Web application author must integrate input from multiple modalities.* |
| 18 | 1.80 | 3.71 | R13. Web application author should have ability to customize speech recognition graphical user interface.* |
| 19 | 1.77 | 4.14 | R9. Web application author provided synthesis feedback.* |
| 20 | 1.70 | 2.29 | R19. End user extensions should be available both on desktop and in cloud. |
| 21 | 1.68 | 5.86 | R32. End users need a clear indication whenever microphone is listening to the user. |
| 22 | 1.62 | 5.57 | R34. A trust relation is needed between end user and whatever is doing recognition. |
| 23 | 1.60 | 2.83 | R12. Web application author must be able to specify a domain specific statistical language model.* |
| 24 | 1.60 | 3.38 | R20. Web author selected TTS service should be available both on device and in the cloud.* |
| 25 | 1.55 | 5.00 | R25. It should be easy to extend the standard without affecting existing speech applications.* |
| 26 | 1.51 | 5.50 | R14. Web application authors need a way to specify and effectively create barge-in (interrupt audio and synthesis).* |
| 27 | 1.13 | 6.43 | R5. Web application must be notified when speech recognition errors and other non-matches occur.* |
| 28 | 1.11 | 6.29 | R7. Web application must be able to specify domain specific custom grammars.* |
| 29 | 1.07 | 6.14 | R8. Web application must be able to specify language of recognition.* |
| 30 | 1.00 | 6.00 | R6. Web application must be provided with full context of recognition.* |
| 31 | 0.98 | 1.57 | R21. Any public interface for creating extensions should be speakable. |
| 32 | 0.98 | 6.43 | R4. Web application must be notified when recognition occurs.* |
| 33 | 0.95 | 6.29 | R3. Ability to bind results to specific input fields.* |
| 34 | 0.74 | 6.63 | R27. Grammars, TTS, media composition, and recognition results should all use standard formats.* |