questionnaire results and my recommendations (v2)

From: Dan Burnett <dburnett@voxeo.com>
Date: Tue, 25 Jan 2011 08:38:29 -0500
Message-Id: <278AAD82-444D-41A8-A077-622D14D3FCE6@voxeo.com>
To: public-xg-htmlspeech@w3.org
Group,

The questionnaire is now closed.  Looking at the results [1] and  
sorting by number of votes, I see the following counts (where "N votes  
for M requirements" means M requirements received N votes each):

10 votes for 10 requirements
9 votes for 20 requirements
8 votes for 11 requirements
7 votes for 6 requirements
6 votes for 5 requirements
5 votes for 3 requirements
4 votes for 1 requirement
3 votes for 3 requirements
2 votes for 2 requirements

Based on natural breakpoints and the fact that 5 votes is the halfway  
point out of 10, I would suggest 8-10 votes represent "strong  
interest" in the requirement, 5-7 votes represent "moderate interest"  
in the requirement, and 0-4 votes represent "mild interest" in the  
requirement.  The requirements are listed in these categories at the  
end of this email.
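
For anyone who wants to check the arithmetic, here is a minimal Python
sketch of the bucketing described above.  The names (TALLY,
interest_level, requirements_per_level) are hypothetical and exist only
for illustration; the thresholds and the tally are the ones given in
this email.

```python
from collections import Counter

# Vote tallies as reported in this email: votes received -> number of
# requirements that received that many votes.
TALLY = {10: 10, 9: 20, 8: 11, 7: 6, 6: 5, 5: 3, 4: 1, 3: 3, 2: 2}


def interest_level(votes: int) -> str:
    """Map a vote count (0-10) to the suggested interest category."""
    if not 0 <= votes <= 10:
        raise ValueError(f"vote count out of range: {votes}")
    if votes >= 8:
        return "strong"
    if votes >= 5:
        return "moderate"
    return "mild"


def requirements_per_level(tally: dict) -> dict:
    """Sum the number of requirements that fall into each category."""
    totals = Counter()
    for votes, count in tally.items():
        totals[interest_level(votes)] += count
    return dict(totals)


if __name__ == "__main__":
    # Prints {'strong': 41, 'moderate': 14, 'mild': 6}
    print(requirements_per_level(TALLY))
```

These totals (41, 14, and 6, for 61 requirements overall) match the
lengths of the three lists at the end of this email.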


We can discuss and debate names for these different levels the next  
time we have a call, but it seems to me that we can at least conclude  
the following:

1) Proposals that fail to support the "strong interest" requirements  
are unlikely to gain consensus.  Practically, then, as a group we are  
likely to require any proposal to support these requirements.

2) Proposals that support more of the "moderate interest" requirements  
are more likely to gain consensus.  Thus, it would be wise for  
proposals to support as many of these as possible.

3) Gaining consensus to support any of the "mild interest"  
requirements will be difficult at best.


I recommend that we add a section to the requirements document that  
references the questionnaire results and lists the requirements  
grouped into these three different categories.  If there is  
disagreement on the names of the categories we can discuss that on a  
call.

At this point I believe we are ready to consider proposals.  If you  
disagree, please send email to the list and we can discuss.


"Strong Interest" Requirements
- FPR40. Web applications must be able to use barge-in (interrupting  
audio and TTS output when the user starts speaking).
- FPR4. It should be possible for the web application to get the  
recognition results in a standard format such as EMMA.
- FPR24. The web app should be notified when recognition results are  
available.
- FPR50. Web applications must not be prevented from integrating input  
from multiple modalities.
- FPR59. While capture is happening, there must be a way for the web  
application to abort the capture and recognition process.
- FPR52. The web app should be notified when TTS playback finishes.
- FPR60. Web application must be able to programmatically abort TTS  
output.
- FPR38. Web application must be able to specify language of  
recognition.
- FPR45. Applications should be able to specify the grammars (or lack  
thereof) separately for each recognition.
- FPR1. Web applications must not capture audio without the user's  
consent.
- FPR19. User-initiated speech input should be possible.
- FPR21. The web app should be notified that capture starts.
- FPR22. The web app should be notified that speech is considered to  
have started for the purposes of recognition.
- FPR23. The web app should be notified that speech is considered to  
have ended for the purposes of recognition.
- FPR25. Implementations should be allowed to start processing  
captured audio before the capture completes.
- FPR26. The API to do recognition should not introduce unneeded  
latency.
- FPR34. Web application must be able to specify domain specific  
custom grammars.
- FPR35. Web application must be notified when speech recognition  
errors or non-matches occur.
- FPR42. It should be possible for user agents to allow hands-free  
speech input.
- FPR48. Web application author must be able to specify a domain  
specific statistical language model.
- FPR54. Web apps should be able to customize all aspects of the user  
interface for speech recognition, except where such customizations  
conflict with security and privacy requirements in this document, or  
where they cause other security or privacy problems.
- FPR51. The web app should be notified when TTS playback starts.
- FPR53. The web app should be notified when the audio corresponding  
to a TTS <mark> element is played back.
- FPR5. It should be easy for the web apps to get access to the most  
common pieces of recognition results such as utterance, confidence,  
and nbests.
- FPR39. Web application must be able to be notified when the selected  
language is not available.
- FPR13. It should be easy to assign recognition results to a single  
input field.
- FPR14. It should not be required to fill an input field every time  
there is a recognition result.
- FPR15. It should be possible to use recognition results to fill  
multiple input fields.
- FPR16. User consent should be informed consent.
- FPR18. It must be possible for the user to revoke consent.
- FPR11. If the web apps specify speech services, it should be  
possible to specify parameters.
- FPR12. Speech services that can be specified by web apps must  
include network speech services.
- FPR2. Implementations must support the XML format of SRGS and must  
support SISR.
- FPR27. Speech recognition implementations should be allowed to add  
implementation specific information to speech recognition results.
- FPR3. Implementation must support SSML.
- FPR46. Web apps should be able to specify which voice is used for TTS.
- FPR7. Web apps should be able to request speech service different  
from default.
- FPR9. If browser refuses to use the web application requested speech  
service, it must inform the web app.
- FPR17. While capture is happening, there must be an obvious way for  
the user to abort the capture and recognition process.
- FPR37. Web application should be given captured audio access only  
after explicit consent from the user.
- FPR49. End users need a clear indication whenever the microphone is  
listening to the user.

"Moderate Interest" Requirements
- FPR33. There should be at least one mandatory-to-support codec that  
isn't encumbered with IP issues and has sufficient fidelity & low  
bandwidth requirements.
- FPR28. Speech recognition implementations should be allowed to fire  
implementation specific events.
- FPR41. It should be easy to extend the standard without affecting  
existing speech applications.
- FPR36. User agents must provide a default interface to control  
speech recognition.
- FPR44. Recognition without specifying a grammar should be possible.
- FPR61. Aborting the TTS output should be efficient.
- FPR32. Speech services that can be specified by web apps must  
include local speech services.
- FPR47. When speech input is used to provide input to a web app, it  
should be possible for the user to select alternative input methods.
- FPR56. Web applications must be able to request NL interpretation  
based only on text input (no audio sent).
- FPR30. Web applications must be allowed at least one form of  
communication with a particular speech service that is supported in  
all UAs.
- FPR55. Web application must be able to encrypt communications to  
remote speech service.
- FPR58. Web application and speech services must have a means of  
binding session information to communications.
- FPR6. Browser must provide default speech resource.
- FPR20. The spec should not unnecessarily restrict the UA's choice in  
privacy policy.

"Mild Interest" Requirements
- FPR29. Speech synthesis implementations should be allowed to fire  
implementation specific events.
- FPR31. User agents and speech services may agree to use alternate  
protocols for communication.
- FPR43. User agents should not be required to allow hands-free speech  
input.
- FPR10. If browser uses speech services other than the default one,  
it must inform the user which one(s) it is using.
- FPR8. User agent (browser) can refuse to use requested speech service.
- FPR57. Web applications must be able to request recognition based on  
previously sent audio.



[1] http://www.w3.org/2002/09/wbs/45260/ReqPri02/results
Received on Tuesday, 25 January 2011 13:39:09 GMT