Re: UA <=> SS Protocol from Olli Pettay on 2010-12-09 (public-xg-htmlspeech@w3.org from December 2010)

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Thu, 09 Dec 2010 14:16:21 +0200
To: Satish Sampath <satish@google.com>
CC: Marc Schroeder <marc.schroeder@dfki.de>, Robert Brown <Robert.Brown@microsoft.com>, "Young, Milan" <Milan.Young@nuance.com>, Dave Burke <daveburke@google.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <4D00C895.7080406@helsinki.fi>

On 12/07/2010 02:31 PM, Satish Sampath wrote:
> I've been thinking about the various speech input use cases brought up
> in the recent requirements discussion, in particular the
> website-specified speech service and the UA <> Speech Service protocol.
> From what I can see, the Device API spec
> <http://www.whatwg.org/specs/web-apps/current-work/#devices> by WHATWG
> addresses them nicely and we should be making use of their work.

AFAIK, Device API is still very unstable, and apparently the
draft doesn't really handle security/privacy at all.


>
> Here is an example of how a plain 'click-button-to-speak' use case can
> be implemented using the Device API:
>
>     <device type="media" onchange="startRecording(this.data)">

Device API is still very unclear here. Should "media" record video or 
audio or both?
>
>     <script>
>     function startRecording(stream) {
>        var recorder = stream.record();
>        // Record for 5 seconds. Ideally this will be replaced with an
>        // end-pointer.
>        setTimeout(function() {
>          var audioData = recorder.stop();  // a File holding the recording
>          var xhr = new XMLHttpRequest();
>          xhr.open("POST", "http://path-to-your-speech-server", true);
>          // Register the handler before sending the request.
>          xhr.onreadystatechange = function () {
>            if (xhr.readyState != 4) return;
>            window.alert("You spoke: " + xhr.responseText);
>          };
>          xhr.send(audioData);
>        }, 5000);
>     }
>     </script>
>
>
> Some salient points:
>
>    1. With the Device API, you can start or stop capturing audio at any
>       time from JavaScript.

That is possible only because the draft doesn't handle security/privacy
yet. That is a major problem in the API.


>    2. The audio data is sent to the speech service using the standard
>       XMLHttpRequest object in JavaScript.
>           * This allows vendor specific parameters to be sent as part of
>             the POST data or custom headers with the request.
>           * No need to define a new protocol here for the request.
>    3. The server response comes back via the standard XMLHttpRequest
>       object as well.
>           * Vendors are free to implement their protocol on top of HTTP.
>           * Vendors can provide a JS library which encapsulates all of
>             this for their speech service.
>           * There is enough precedent in this area with the various
>             data APIs.
>    4. For streaming out audio while recording, there is a ConnectionPeer
>       <http://www.whatwg.org/specs/web-apps/current-work/#connectionpeer> proposal.
>           * This is specifically aimed at real-time use cases such as
>             video chat, video record/upload. Speech input will fit in
>             here well.
>           * Audio, text and images can be sent via the same channel in
>             real time to a server or another peer.
>           * Responses can be received in real time as well, making it
>             easy to implement continuous speech recognition.
>    5. The code above records for 5 seconds but ideally there would be an
>       end-pointer here. This can either be:
>           * Implemented as part of the Device API (i.e. we should
>             propose it to the WHATWG) or
>           * Implemented in JavaScript with raw audio samples. The Audio
>             XG <http://www.w3.org/2005/Incubator/audio/> is defining an
>             API for that.
>             I think Olli Pettay is active in that XG as well and Mozilla
>             has a related Audio Data API in the works.

There are just two different proposals for the Audio API, and those
are for audio output only.
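
To illustrate the JavaScript option in point 5: assuming raw input
samples were somehow available to script (which neither Audio API
proposal offers), a simple energy-based end-pointer might look roughly
like the sketch below. The function name, the option names and the
default values are made up for illustration and don't come from any
draft.

```javascript
// Hypothetical helper, not part of any draft: energy-based end-pointing.
// samples: Float32Array of mono PCM in [-1, 1]; sampleRate in Hz.
// Returns the sample index where speech is judged to have ended,
// or -1 if no end of speech has been seen yet.
function findEndOfSpeech(samples, sampleRate, options) {
  var opts = options || {};
  var frameMs = opts.frameMs || 20;        // analysis frame length
  var silenceMs = opts.silenceMs || 500;   // trailing silence that ends an utterance
  var threshold = opts.threshold || 0.01;  // RMS level treated as silence (assumed value)

  var frameLen = Math.max(1, Math.floor(sampleRate * frameMs / 1000));
  var framesNeeded = Math.ceil(silenceMs / frameMs);
  var heardSpeech = false;
  var silentFrames = 0;

  for (var start = 0; start + frameLen <= samples.length; start += frameLen) {
    // Root-mean-square energy of the current frame.
    var sumSquares = 0;
    for (var i = start; i < start + frameLen; i++) {
      sumSquares += samples[i] * samples[i];
    }
    var rms = Math.sqrt(sumSquares / frameLen);

    if (rms >= threshold) {
      heardSpeech = true;
      silentFrames = 0;
    } else if (heardSpeech) {
      silentFrames++;
      if (silentFrames >= framesNeeded) {
        // Speech ended where the run of silent frames began.
        return start + frameLen - silentFrames * frameLen;
      }
    }
  }
  return -1;
}
```

A page would call this repeatedly on the audio captured so far and stop
the recorder once it returns a non-negative index, instead of relying
on the fixed 5 second timeout. A real end-pointer would need something
more robust than a fixed energy threshold.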



But anyway, if the Device API becomes good enough, then it could be used
for speech services. At the moment it is not very well defined, so it is
perhaps a good time to bring up new requirements for it.


-Olli



>    6. Device and Audio APIs are work-in-progress, so we could suggest
>       requirements to them for enabling our use cases.
>           * For example, we can suggest "type=audio" for the Device API.
>
> There is a team at Google working on implementing the Device API for
> Chrome/WebKit.
>
> --
> Cheers
> Satish

Received on Thursday, 9 December 2010 12:17:00 UTC