RE: Speech API: first editor's draft posted from Adam Sobieski on 2012-04-20 (public-speech-api@w3.org from April 2012)

From: Adam Sobieski <adamsobieski@hotmail.com>
Date: Fri, 20 Apr 2012 18:41:07 +0000
To: Glen Shires <gshires@google.com>
CC: <public-speech-api@w3.org>, <public-speech-api-contrib@w3.org>
Message-ID: <SNT138-W49CD30E41436C658AF6C3CC5220@phx.gbl>

Glen Shires,

I started some discussion threads on the Voice Browser Working Group mailing list and forwarded the note about the use of SpeechRecognitionEvent for multimodal input scenarios in a message describing the hypothetical event-based grammar elements.

With regard to the JavaScript Speech API, an argument for either SpeechRecognition/SpeechSynthesis or SpeechRecognizer/SpeechSynthesizer is that speech synthesis input can scale beyond text strings to XML strings and to document object model nodes, and that speech recognition can output object-based and XML-based semantic interpretations. If the Speech API accepted document object model nodes as speech synthesis input, the specification could describe interoperability with CSS3, the CSS3 Speech module, HTML5, MathML3, SMIL, and SSML.

Regarding sections 5.1.9 (http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-speechgrammar) and 5.1.10 (http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-speechgrammarlist), a grammar object model, or SRGSOM, is possible.

Kind regards,
Adam

From: gshires@google.com
Date: Tue, 17 Apr 2012 12:45:00 -0700
To: adamsobieski@hotmail.com
CC: milan.young@nuance.com; hwennborg@google.com; public-speech-api@w3.org; public-speech-api-contrib@w3.org
Subject: Re: Speech API: first editor's draft posted

The goal and scope of this Community Group is to produce a JavaScript Speech API. [1]  Extensions to SRGS, SISR and SSML should be considered beyond the scope of this CG, and I agree with Milan that such suggestions should be routed through the Voice Browser Working Group.

Also, note that SpeechRecognitionEvent can return interim results ("hypothesis" events), which could be used for multimodal interactions.
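
To illustrate how interim results might be consumed for multimodal use, here is a minimal sketch. It assumes the event shape of the later-published Web Speech API (resultIndex, results[i].isFinal, results[i][0].transcript), which may differ from the April 2012 editor's draft. The helper name collectTranscripts is hypothetical and not part of any specification.

```javascript
// Sketch only: collectTranscripts is a hypothetical helper. The event shape
// (resultIndex, results[i].isFinal, results[i][0].transcript) is assumed from
// the later Web Speech API and may not match the April 2012 draft.
function collectTranscripts(event) {
  let interimText = "";
  let finalText = "";
  // Only results at or after resultIndex changed in this event.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0].transcript; // first alternative in the result
    if (result.isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }
  return { interimText, finalText };
}

// Browser wiring, commented out so the sketch also runs outside a browser:
// const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
// const recognition = new Recognition();
// recognition.interimResults = true;
// recognition.onresult = (e) => console.log(collectTranscripts(e));
// recognition.start();
```

A multimodal page could act on interimText immediately, for example to highlight a likely target while the user is still speaking, and commit only when finalText arrives.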

Finally, some options on the naming that we might consider:

- SpeechRecognition / SpeechSynthesis
- SpeechRecognizer / SpeechSynthesizer
- SpeechToText / TextToSpeech

These obviously vary in length and ease-of-spelling. They may also vary in intuitive clarity to those less immersed in speech technology than many of us are (and also to those for whom English is not their native language).

/Glen Shires
[1] http://www.w3.org/community/speech-api/

Received on Friday, 20 April 2012 18:41:39 UTC