Re: UA <=> SS Protocol

I think that we might be talking about different things here. The
things that I hope the XG will deliver are:

1. A draft spec of a web app API for using a speech recognizer
provided by the browser, with implementations in several browsers (a
purely hypothetical sketch of what such an API might look like follows
this list).

2. A draft spec of a web app API for using a speech synthesizer
provided by the browser, with implementations in several browsers.

3. Requirements and change requests to other working groups or
incubator groups to make sure that APIs such as Device, Audio and
XMLHttpRequest work for network speech services. This is completely
independent of 1 and 2. To ensure that the requested features are
sufficient, there should be several demo systems using those APIs for
speech.
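
To make point 1 concrete, here is a purely hypothetical sketch of what
such an API might look like. Nothing here is specified anywhere yet;
every name (SpeechRecognizer, onresult, and so on) is made up for
illustration:

// Hypothetical API only -- no such interface exists yet.
var reco = new SpeechRecognizer();                // browser-provided recognizer
reco.grammar = "http://example.com/pizza.grxml";  // optional app grammar
reco.onresult = function(event) {
  // The browser hands the recognition hypothesis back to the web app.
  document.getElementById("query").value = event.result;
};
reco.start();  // the browser handles capture, endpointing and recognition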

I don't really see where a requirement that browsers must support
XMLHttpRequest would fit, except in the XMLHttpRequest spec itself,
which would be a bit tautological. And since there is no standard
network speech service protocol in this model, there is no concept of
vendor extensions. The high-level protocol between the web app and the
speech service is a matter of bilateral agreement, and the low-level
parts are specified by the Device API and friends.
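
To illustrate what such a bilateral agreement might look like in
practice, a web app could talk to its own recognizer over plain XHR.
This is only a sketch: the URL, header name and parameter below are
made up for the example, handleResult and audioBlob are likewise
placeholders, and the response format would be equally vendor-defined:

// All names below are illustrative; the interface is vendor-defined.
var xhr = new XMLHttpRequest();
xhr.open("POST", "http://speech.example.com/recognize?lang=en-US", true);
xhr.setRequestHeader("X-Vendor-Grammar", "dictation");  // vendor extension
xhr.onreadystatechange = function() {
  if (xhr.readyState != 4) return;
  handleResult(xhr.responseText);  // parse the vendor-defined response
};
xhr.send(audioBlob);  // audio previously captured via the Device API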

/Bjorn

On Tue, Dec 7, 2010 at 5:34 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> For the sake of discussion, let's say that we were to come to agreement on Satish's proposal...
>
> Our recommendation would still need to state that all Speech Services and UAs must support XMLHttpRequest as a means of communication.  And to encourage interoperability, the recommendation should also outline the lifecycle of such communication, noting where vendor extensions would or would not be appropriate.
>
> Agreed?
>
>
> -----Original Message-----
> From: Bjorn Bringert [mailto:bringert@google.com]
> Sent: Tuesday, December 07, 2010 8:08 AM
> To: Young, Milan
> Cc: Satish Sampath; Marc Schroeder; Robert Brown; Dave Burke; public-xg-htmlspeech@w3.org
> Subject: Re: UA <=> SS Protocol
>
> I think Satish raises a very good point. Given this, I don't actually
> see the need for any standard network speech service protocol at all.
> The protocol idea that we have discussed is only needed for browsers
> to talk to a given network recognizer. Using general APIs such as XHR,
> Device and Audio means that the web app talks directly to the speech
> service, and the browser plays no intelligent part. Thus there is no
> need to specify a standard protocol.
>
> The work that would remain for the HTML Speech XG is to:
>
> 1. Specify a speech-specific API for web apps to use speech services
> provided by the browser. This is still needed for all web developers
> who do not have the ability to run their own speech services.
>
> 2. Collect requirements and change requests for the Device and Audio
> APIs to allow them to be used for network speech services.
>
> /Bjorn
>
> On Tue, Dec 7, 2010 at 3:47 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>> Hello Satish,
>>
>>
>>
>> I'm not familiar with the Device API, but from what you've written below,
>> it may be a good fit.  It is worth keeping this in mind as we discuss
>> protocol requirements because, realistically, the ability to implement
>> plays into what we are willing to require.
>>
>>
>>
>> But before we proceed, we still need to hear from Google on two points:
>>
>> - Should our group define the requirements of the protocol?
>>
>> - Should our group include a concrete protocol definition in our
>> recommendation?
>>
>>
>>
>> (Note that whether we factor the protocol definition out into another
>> specification or contain it in our spec is still TBD.  In my opinion, it
>> should probably remain TBD until we complete the requirements phase.)
>>
>>
>>
>> Can you or some other Google representative please comment?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> ________________________________
>>
>> From: Satish Sampath [mailto:satish@google.com]
>> Sent: Tuesday, December 07, 2010 4:32 AM
>> To: Marc Schroeder
>> Cc: Robert Brown; Young, Milan; Dave Burke; public-xg-htmlspeech@w3.org
>>
>> Subject: Re: UA <=> SS Protocol
>>
>>
>>
>> I've been thinking about the various speech input use cases brought up in
>> the recent requirements discussion, in particular the website-specified
>> speech service and the UA <=> Speech Service protocol. From what I can see,
>> the Device API spec by the WHATWG addresses them nicely and we should be
>> making use of their work.
>>
>>
>>
>> Here is an example of how a plain 'click-button-to-speak' use case can be
>> implemented using the Device API:
>>
>>
>>
>> <device type="media" onchange="startRecording(this.data)">
>>
>> <script>
>> function startRecording(stream) {
>>   var recorder = stream.record();
>>   // Record for 5 seconds. Ideally this will be replaced with an
>>   // end-pointer.
>>   setTimeout(function() {
>>     var audioData = recorder.stop();  // a File containing the recording
>>     var xhr = new XMLHttpRequest();
>>     xhr.open("POST", "http://path-to-your-speech-server", true);
>>     xhr.onreadystatechange = function() {
>>       if (xhr.readyState != 4) return;
>>       window.alert("You spoke: " + xhr.responseText);
>>     };
>>     xhr.send(audioData);
>>   }, 5000);
>> }
>> </script>
>>
>>
>>
>> Some salient points:
>>
>> 1. With the Device API, you can start or stop capturing audio at any
>> time from JavaScript.
>>
>> 2. The audio data is sent to the speech service using the standard
>> XMLHttpRequest object in JavaScript.
>>
>>   - This allows vendor-specific parameters to be sent as part of the
>>     POST data or as custom headers with the request.
>>   - No need to define a new protocol here for the request.
>>
>> 3. The server response comes back via the standard XMLHttpRequest
>> object as well.
>>
>>   - Vendors are free to implement their protocol on top of HTTP.
>>   - Vendors can provide a JS library which encapsulates all of this
>>     for their speech service.
>>   - There is enough precedent in this area with the various data APIs.
>>
>> 4. For streaming audio out while recording, there is a ConnectionPeer
>> proposal.
>>
>>   - This is specifically aimed at real-time use cases such as video
>>     chat and video record/upload. Speech input will fit in well here.
>>   - Audio, text and images can be sent via the same channel in real
>>     time to a server or another peer.
>>   - Responses can be received in real time as well, making it easy to
>>     implement continuous speech recognition.
>>
>> 5. The code above records for 5 seconds, but ideally there would be
>> an end-pointer here. This can be either:
>>
>>   - implemented as part of the Device API (i.e. we should propose it
>>     to the WHATWG), or
>>   - implemented in JavaScript with raw audio samples; the Audio XG is
>>     defining an API for that (a rough sketch of such an end-pointer
>>     follows at the end of this message).
>>
>> I think Olli Pettay is active in that XG as well, and Mozilla has a
>> related Audio Data API in the works.
>>
>> 6. The Device and Audio APIs are works in progress, so we could
>> suggest requirements to them for enabling our use cases.
>>
>>   - For example, we could suggest "type=audio" for the Device API.
>>
>> There is a team at Google working on implementing the Device API for
>> Chrome/WebKit.
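>>
>> And here is the rough sketch of the JavaScript end-pointer option from
>> point 5, assuming raw samples arrive as Float32Array frames. The exact
>> callback and sample format will depend on what the Audio API ends up
>> exposing; onAudioFrame and endOfSpeechDetected are hypothetical hooks:
>>
>> var SILENCE_THRESHOLD = 0.01;  // RMS energy below this counts as silence
>> var MAX_SILENT_FRAMES = 25;    // ~0.5s of silence at 50 frames/second
>> var silentFrames = 0;
>>
>> function onAudioFrame(samples) {  // samples: one Float32Array frame
>>   var energy = 0;
>>   for (var i = 0; i < samples.length; i++)
>>     energy += samples[i] * samples[i];
>>   var rms = Math.sqrt(energy / samples.length);
>>   silentFrames = (rms < SILENCE_THRESHOLD) ? silentFrames + 1 : 0;
>>   if (silentFrames >= MAX_SILENT_FRAMES)
>>     endOfSpeechDetected();  // stop recording and POST the audio
>> }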
>>
>>
>>
>> --
>>
>> Cheers
>>
>> Satish
>
>
>
> --
> Bjorn Bringert
> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> Palace Road, London, SW1W 9TQ
> Registered in England Number: 3977902
>



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902

Received on Tuesday, 7 December 2010 20:11:41 UTC