- From: Satish Sampath <satish@google.com>
- Date: Tue, 7 Dec 2010 12:31:41 +0000
- To: Marc Schroeder <marc.schroeder@dfki.de>
- Cc: Robert Brown <Robert.Brown@microsoft.com>, "Young, Milan" <Milan.Young@nuance.com>, Dave Burke <daveburke@google.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
- Message-ID: <AANLkTi=Ladh-NgYrTu1u+7Yb07GnZbxXCSBskzSSoYDU@mail.gmail.com>
I've been thinking about the various speech input use cases brought up in the recent requirements discussion, in particular the website-specified speech service and the UA <> Speech Service protocol. From what I can see, the Device API spec <http://www.whatwg.org/specs/web-apps/current-work/#devices> by WHATWG addresses them nicely and we should be making use of their work.

Here is an example of how a plain 'click-button-to-speak' use case can be implemented using the Device API:

    <device type="media" onchange="startRecording(this.data)">
    <script>
    function startRecording(stream) {
      var recorder = stream.record();
      // Record for 5 seconds. Ideally this will be replaced with an end-pointer.
      setTimeout(function() {
        // stop() returns the recorded audio as a File.
        var audioData = recorder.stop();
        var xhr = new XMLHttpRequest();
        xhr.open("POST", "http://path-to-your-speech-server", true);
        // Attach the handler before sending so no state change is missed.
        xhr.onreadystatechange = function () {
          if (xhr.readyState != 4) return;
          window.alert("You spoke: " + xhr.responseText);
        };
        xhr.send(audioData);
      }, 5000);
    }
    </script>

Some salient points:

1. With the Device API, you can start or stop capturing audio at any time from JavaScript.

2. The audio data is sent to the speech service using the standard XMLHttpRequest object in JavaScript.
   - This allows vendor-specific parameters to be sent as part of the POST data or as custom headers with the request.
   - No need to define a new protocol here for the request.

3. The server response comes back via the standard XMLHttpRequest object as well.
   - Vendors are free to implement their protocol on top of HTTP.
   - Vendors can provide a JS library which encapsulates all of this for their speech service.
   - There is enough precedent in this area with the various data APIs.

4. For streaming out audio while recording, there is a ConnectionPeer <http://www.whatwg.org/specs/web-apps/current-work/#connectionpeer> proposal.
   - This is specifically aimed at real-time use cases such as video chat and video record/upload. Speech input will fit in here well.
   - Audio, text and images can be sent via the same channel in real time to a server or another peer.
   - Responses can be received in real time as well, making it easy to implement continuous speech recognition.

5. The code above records for 5 seconds, but ideally there would be an end-pointer here. This can either be:
   - Implemented as part of the Device API (i.e. we should propose it to the WHATWG), or
   - Implemented in JavaScript with raw audio samples. The Audio XG <http://www.w3.org/2005/Incubator/audio/> is defining an API for that. I think Olli Pettay is active in that XG as well, and Mozilla has a related Audio Data API in the works.

6. The Device and Audio APIs are work in progress, so we could suggest requirements to them for enabling our use cases.
   - For example, we can suggest "type=audio" for the Device API. There is a team at Google working on implementing the Device API for Chrome/WebKit.

--
Cheers
Satish
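
To make the JavaScript option in point 5 a little more concrete, here is a minimal sketch of an energy-based end-pointer. It assumes the raw audio is available as an array of float samples in the range [-1, 1], which neither the Device API nor the Audio XG work guarantees today. The function name findEndOfSpeech and its option names (frameMs, threshold, silenceMs) are made up for illustration, and a real end-pointer would likely need something more robust than a fixed RMS threshold.

```javascript
// Hypothetical helper (not part of any spec): energy-based end-pointer.
// samples: array-like of floats in [-1, 1]; sampleRate: samples per second.
// Returns the index of the first sample of the trailing silence that ends
// the utterance, or -1 if no end of speech has been seen yet.
function findEndOfSpeech(samples, sampleRate, options) {
  var opts = options || {};
  var frameMs = opts.frameMs || 20;        // analysis frame length
  var threshold = opts.threshold || 0.02;  // RMS level treated as speech
  var silenceMs = opts.silenceMs || 500;   // silence needed to end the utterance

  var frameSize = Math.max(1, Math.floor(sampleRate * frameMs / 1000));
  var silentFramesNeeded = Math.ceil(silenceMs / frameMs);
  var heardSpeech = false;
  var silentRun = 0;

  // Walk over whole frames only; a trailing partial frame is ignored.
  for (var start = 0; start + frameSize <= samples.length; start += frameSize) {
    var sumSquares = 0;
    for (var i = start; i < start + frameSize; i++) {
      sumSquares += samples[i] * samples[i];
    }
    var rms = Math.sqrt(sumSquares / frameSize);

    if (rms >= threshold) {
      heardSpeech = true;
      silentRun = 0;
    } else if (heardSpeech) {
      silentRun++;
      if (silentRun >= silentFramesNeeded) {
        // Step back to the first frame of the current silent run.
        return start - (silentRun - 1) * frameSize;
      }
    }
  }
  return -1;
}
```

The idea would be to call this on the samples buffered so far each time new audio arrives, and to call recorder.stop() once it returns a non-negative index, instead of relying on the fixed 5 second timer.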
Received on Tuesday, 7 December 2010 12:32:13 UTC