Re: UA <=> SS Protocol

I think Satish raises a very good point. Given this, I don't actually
see the need for any standard network speech service protocol at all.
The protocol idea that we have discussed is only needed for browsers
to talk to a given network recognizer. Using general APIs such as XHR,
Device and Audio means that the web app talks directly to the speech
service, and the browser plays no intelligent part. Thus there is no
need to specify a standard protocol.

The work that would remain for the HTML Speech XG is to:

1. Specify a speech-specific API for web apps to use speech services
provided by the browser. This is still needed for all web developers
who do not have the ability to run their own speech services. (A rough
sketch of what such an API could look like follows below.)

2. Collect requirements and change requests for the Device and Audio
APIs to allow them to be used for network speech services.
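
As a rough illustration of point 1 (all names below are hypothetical
placeholders, not a concrete proposal), such a browser-provided API might
look something like this to a web app:

  <script>
  // Hypothetical sketch only: SpeechRecognizer and its members are
  // placeholder names, not from any existing spec.
  var reco = new SpeechRecognizer();  // uses the browser's own speech service
  reco.lang = "en-US";
  reco.onresult = function(event) {
    // event.results would carry recognized alternatives and confidences
    document.getElementById("q").value = event.results[0].text;
  };
  document.getElementById("mic").onclick = function() {
    reco.start();  // the browser captures audio and talks to its service
  };
  </script>

The point is that the web app never touches the audio or the network; the
browser and its speech service handle all of that however they like.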

/Bjorn

On Tue, Dec 7, 2010 at 3:47 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> Hello Satish,
>
> I’m not familiar with Device APIs, but from what you’ve written below, this
> may be a good fit. It’s good to keep this in mind as we talk about protocol
> requirements, because realistically the ability to implement plays into
> what we are willing to require.
>
> But before we proceed, we still need to hear from Google on two points:
>
> ·  Should our group define the requirements of the protocol?
>
> ·  Should our group include a concrete protocol definition in our
>    recommendation?
>
> (Note that whether we factor the protocol definition out into another
> specification or contain it in our spec is still TBD. In my opinion, it
> should probably remain TBD until we complete the requirements phase.)
>
> Can you or some other Google representative please comment?
>
> Thanks
>
> ________________________________
>
> From: Satish Sampath [mailto:satish@google.com]
> Sent: Tuesday, December 07, 2010 4:32 AM
> To: Marc Schroeder
> Cc: Robert Brown; Young, Milan; Dave Burke; public-xg-htmlspeech@w3.org
>
> Subject: Re: UA <=> SS Protocol
>
> I've been thinking about the various speech input use cases brought up in
> the recent requirements discussion, in particular the website-specified
> speech service and the UA <=> Speech Service protocol. From what I can see,
> the Device API spec by the WHATWG addresses them nicely and we should be
> making use of their work.
>
> Here is an example of how a plain 'click-button-to-speak' use case can be
> implemented using the Device API:
>
> <device type="media" onchange="startRecording(this.data)">
>
> <script>
> function startRecording(stream) {
>   var recorder = stream.record();
>   // Record for 5 seconds. Ideally this will be replaced with an end-pointer.
>   setTimeout(function() {
>     var audioData = recorder.stop();
>     var xhr = new XMLHttpRequest();
>     xhr.onreadystatechange = function() {
>       if (xhr.readyState != 4) return;
>       window.alert("You spoke: " + xhr.responseText);
>     };
>     xhr.open("POST", "http://path-to-your-speech-server", true);
>     // Vendor-specific parameters can go in custom headers or the POST body,
>     // e.g. xhr.setRequestHeader("X-Vendor-Param", "...") -- header name
>     // illustrative only.
>     xhr.send(audioData);
>   }, 5000);
> }
> </script>
>
> Some salient points:
>
> 1. With the Device API, you can start or stop capturing audio at any time
>    from JavaScript.
>
> 2. The audio data is sent to the speech service using the standard
>    XMLHttpRequest object in JavaScript.
>
>    o This allows vendor-specific parameters to be sent as part of the POST
>      data or as custom headers with the request.
>
>    o There is no need to define a new protocol here for the request.
>
> 3. The server response comes back via the standard XMLHttpRequest object
>    as well.
>
>    o Vendors are free to implement their protocol on top of HTTP.
>
>    o Vendors can provide a JS library which encapsulates all of this for
>      their speech service.
>
>    o There is enough precedent in this area with the various data APIs.
>
> 4. For streaming out audio while recording, there is a ConnectionPeer
>    proposal.
>
>    o This is specifically aimed at real-time use cases such as video chat
>      and video record/upload. Speech input will fit in well here.
>
>    o Audio, text and images can be sent via the same channel in real time
>      to a server or another peer.
>
>    o Responses can be received in real time as well, making it easy to
>      implement continuous speech recognition.
>
> 5. The code above records for 5 seconds, but ideally there would be an
>    end-pointer here (a rough sketch follows below). This can either be:
>
>    o Implemented as part of the Device API (i.e. we should propose it to
>      the WHATWG), or
>
>    o Implemented in JavaScript with raw audio samples. The Audio XG is
>      defining an API for that. I think Olli Pettay is active in that XG as
>      well, and Mozilla has a related Audio Data API in the works.
>
> 6. The Device and Audio APIs are works in progress, so we could suggest
>    requirements to them for enabling our use cases.
>
>    o For example, we could suggest "type=audio" for the Device API.
>
> There is a team at Google working on implementing the Device API for
> Chrome/WebKit.
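>
> To make the end-pointer idea in point 5 concrete, here is a rough sketch of
> an energy-based end-pointer over raw samples. The 20 ms frame size, RMS
> threshold and ~800 ms silence window are illustrative only, and the way raw
> samples get delivered depends on what the Audio API work settles on:
>
> function makeEndPointer(sampleRate) {
>   var frameSamples = Math.round(sampleRate * 0.02); // 20 ms frames
>   var silenceNeeded = 40;   // ~800 ms of trailing silence (40 x 20 ms)
>   var threshold = 0.01;     // RMS energy floor, illustrative only
>   var silentFrames = 0;
>   var heardSpeech = false;
>   // Feed chunks of raw samples (e.g. a Float32Array) as they arrive.
>   // Returns true once speech has been followed by enough silence.
>   return function(samples) {
>     for (var off = 0; off + frameSamples <= samples.length; off += frameSamples) {
>       var energy = 0;
>       for (var i = off; i < off + frameSamples; i++) {
>         energy += samples[i] * samples[i];
>       }
>       if (Math.sqrt(energy / frameSamples) > threshold) {
>         heardSpeech = true;
>         silentFrames = 0;
>       } else if (heardSpeech && ++silentFrames >= silenceNeeded) {
>         return true;
>       }
>     }
>     return false;
>   };
> }
>
> With something like this, the setTimeout(..., 5000) in the example above
> could be replaced by calling recorder.stop() as soon as the end-pointer
> fires.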
>
> --
>
> Cheers
>
> Satish



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902

Received on Tuesday, 7 December 2010 16:08:39 UTC