Requirements discussion

From: Dan Burnett <dburnett@voxeo.com>
Date: Thu, 21 Oct 2010 20:05:26 -0400
Message-Id: <DE9B92A4-345A-4D5A-B18B-CDF29ED21496@voxeo.com>
To: public-xg-htmlspeech@w3.org

Group,

I've spent some time looking over the results from the survey. There are several different options for discussion order that we could use. For the purposes of determining consensus, the most useful ordering is one that reflects the likely degree of consensus on each requirement.

The requirements are ordered below by the standard deviation of their scores, highest first. The means are also given for reference. These values include the inputs from Olli.


Rank	STDEV	MEAN	Requirement
1	2.89	4.00	R29. Web application may only listen in response to user action.
2	2.88	4.67	R31. End users, not web application authors, should be the ones to select speech recognition resources.
3	2.51	5.57	R33. User agents need a way to enable end users to grant permission to an application to listen to them.
4	2.51	4.50	R1. Web author needs full control over specification of speech resources.*
5	2.43	4.25	R18. User perceived latency of synthesis must be minimized.*
6	2.38	5.00	R17. User perceived latency of recognition must be minimized.*
7	2.27	4.14	R30. End users should not be forced to store anything about their speech recognition environment in the cloud.
8	2.27	3.14	R28. Web application must not be allowed access to raw audio.
9	2.19	4.75	R16. Web application authors must not be excluded from running their own speech service.*
10	2.15	3.57	R26. There should exist a high quality default speech recognition visual user interface.*
11	2.14	5.71	R23. Speech as an input on any application should be able to be optional.*
12	2.14	4.71	R15. Web application authors must not need to run their own speech service.*
13	2.04	4.14	R22. Web application author wants to provide a consistent user experience across all modalities.*
14	1.99	4.57	R24. End user should be able to use speech in a hands-free mode.*
15	1.89	4.71	R2. Application change from directed input to free form input.*
16	1.81	4.88	R10. Web application authors need to be able to use full SSML features.*
17	1.81	4.13	R11. Web application author must integrate input from multiple modalities.*
18	1.80	3.71	R13. Web application author should have ability to customize speech recognition graphical user interface.*
19	1.77	4.14	R9. Web application author provided synthesis feedback.*
20	1.70	2.29	R19. End user extensions should be available both on desktop and in cloud.
21	1.68	5.86	R32. End users need a clear indication whenever microphone is listening to the user.
22	1.62	5.57	R34. A trust relation is needed between end user and whatever is doing recognition.
23	1.60	2.83	R12. Web application author must be able to specify a domain specific statistical language model.*
24	1.60	3.38	R20. Web author selected TTS service should be available both on device and in the cloud.*
25	1.55	5.00	R25. It should be easy to extend the standard without affecting existing speech applications.*
26	1.51	5.50	R14. Web application authors need a way to specify and effectively create barge-in (interrupt audio and synthesis).*
27	1.13	6.43	R5. Web application must be notified when speech recognition errors and other non-matches occur.*
28	1.11	6.29	R7. Web application must be able to specify domain specific custom grammars.*
29	1.07	6.14	R8. Web application must be able to specify language of recognition.*
30	1.00	6.00	R6. Web application must be provided with full context of recognition.*
31	0.98	1.57	R21. Any public interface for creating extensions should be speakable.
32	0.98	6.43	R4. Web application must be notified when recognition occurs.*
33	0.95	6.29	R3. Ability to bind results to specific input fields.*
34	0.74	6.63	R27. Grammars, TTS, media composition, and recognition results should all use standard formats.*
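For anyone who wants to reproduce the ordering from the raw survey data, it is just a sort by sample standard deviation, highest first. A minimal Python sketch of the method, using hypothetical per-member scores (the actual survey responses are not included in this message):

```python
from statistics import mean, stdev

# Hypothetical 1-7 survey scores per requirement; only the method
# (rank by sample standard deviation, descending) comes from this email.
scores = {
    "R29": [1, 7, 1, 7, 2, 6, 4],     # widely spread -> high STDEV
    "R34": [6, 5, 7, 4, 6, 5],
    "R27": [7, 6, 7, 7, 6, 7, 6, 7],  # tightly clustered -> low STDEV
}

# Sort requirements by standard deviation, least consensus first.
ranked = sorted(scores, key=lambda r: stdev(scores[r]), reverse=True)

for rank, req in enumerate(ranked, start=1):
    vals = scores[req]
    print(f"{rank}\t{stdev(vals):.2f}\t{mean(vals):.2f}\t{req}")
```

With these made-up scores, the widely spread R29 sorts to the top and the tightly clustered R27 to the bottom, mirroring the shape of the table above.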


I recommend that we address these simultaneously from the top and the bottom. The top ones reflect scores that were not concentrated around one value (potentially reflecting strong disagreement about importance), while the requirements at the bottom are those with scores that were concentrated around one value (potentially reflecting general agreement about importance).

With any luck the ones at the bottom will be quick to address and prioritize, while the ones at the top are likely the ones that will need the most discussion.

I will shortly send follow-on emails for the first two requirements using this approach: R27 (from the bottom) and R29 (from the top). Please reply to each email with any questions or comments you have and any points you would like to make. After a week we'll see how close we are to consensus on each. The ideal outcome is one of the following two priorities: 1) Should Address, 2) For Future Consideration. Of course, if we need to revise, split, or otherwise modify a requirement, we'll do so.

-- dan
Received on Friday, 22 October 2010 00:06:00 GMT
