Re: An early draft of a speech API

> In my mind, the methodology used for a default or local speech service
> should be similar to the network case.  We've heard from at least a couple
> others on this list that feel the same.

I initially argued for a single API, but I no longer think that's a good
idea. These are two different use cases, for two different sets of web
developers.

- The API for default speech services needs to be a lowest common
denominator. An easy-to-use API is important, since this is the API
for the millions of web developers who want to build simple speech
apps and can't run their own speech services. This API should be
speech-specific to make it easy to design, implement and use.

- The API for app-selected speech services needs to be flexible enough
to allow for all kinds of service-specific extra features. This API is
for sophisticated developers who have the expertise and resources to
tune language models and run or contract their own speech services.
For maximum flexibility, this API should consist of general components
that can be composed in different ways, e.g. a microphone capture API
and an audio streaming API, and ideally not be speech-specific at all.
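To make the contrast concrete, here is a rough sketch of what the two styles could look like. This is purely illustrative: none of these interfaces exist in any spec or browser, and every name below is invented. The "general components" are stand-ins for whatever microphone capture and audio streaming APIs eventually emerge.

```javascript
// Hypothetical sketch only -- all names here are invented for illustration.

// --- Style 1: lowest common denominator, speech-specific ---
// One object, one method, one callback. Easy to design, implement, and use.
class SpeechRequest {
  constructor() { this.onresult = null; }
  start() {
    // A real implementation would capture audio and send it to the
    // browser's default recognizer; here we fake a result synchronously.
    if (this.onresult) this.onresult({ transcript: "hello world" });
  }
}

// --- Style 2: general components, composed by the app ---
function captureMicrophone() {
  // Stand-in for a microphone capture API: yields raw audio chunks.
  return ["chunk1", "chunk2"];
}

function streamToService(chunks, url) {
  // Stand-in for an audio streaming API: the app controls the endpoint,
  // the protocol, and any service-specific parameters.
  return { url: url, bytesSent: chunks.length, transcript: "hello world" };
}

// Style 1 usage: trivial, for the millions of simple speech apps.
const req = new SpeechRequest();
req.onresult = (r) => console.log(r.transcript);
req.start();

// Style 2 usage: more verbose, but every step is under app control,
// and nothing in it is speech-specific.
const result = streamToService(captureMicrophone(), "wss://example.com/asr");
console.log(result.transcript);
```

The point of the sketch is the shape, not the names: style 1 bakes speech into the API surface, while style 2 is just capture plus streaming, with recognition living entirely in the app-selected service.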

A single API would complicate the simple API, while restricting the
flexibility of the API for app-selected speech services.

Having two different APIs means two simpler APIs that can be developed
in parallel.

The only argument that I can see for having a single API is that it
makes it easier to switch between default and app-selected speech
services. But the point of app-selected speech services is that they
are for apps that need tight control over the implementation, which
makes them unlikely to work with a default speech service. Assembly
language and JavaScript are different, and merging them into a single
language because you want to write both high-level and low-level code
seems like a bad idea.

/Bjorn

On Wed, Mar 16, 2011 at 10:35 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> In my mind, the methodology used for a default or local speech service should be similar to the network case.  We've heard from at least a couple others on this list that feel the same.
>
> Separating these efforts into v1/v2 or across WGs will almost surely result in divergent/competing APIs.  I'd rather invest in the ounce of prevention.
>
>
>
> -----Original Message-----
> From: Robert Brown [mailto:Robert.Brown@microsoft.com]
> Sent: Wednesday, March 16, 2011 3:16 PM
> To: Satish Sampath; Olli@pettay.fi
> Cc: Olli Pettay; Young, Milan; public-xg-htmlspeech@w3.org; Bjorn Bringert
> Subject: RE: An early draft of a speech API
>
>>> Since all 3 proposals address the default recognizer case without any external dependencies, I think it would be ideal to finalise a concrete recommendation for that without getting blocked on remote recognizers.
>
> Sorry Satish, I disagree.  Unfortunately, we all agree on the least valuable part.  There's no point recommending that.  We should either commit to solving the valuable problems, or defer the work for another year while the microphone and network specs iron out.
>
> -----Original Message-----
> From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish Sampath
> Sent: Wednesday, March 16, 2011 3:06 PM
> To: Olli@pettay.fi
> Cc: Olli Pettay; Young, Milan; Robert Brown; public-xg-htmlspeech@w3.org; Bjorn Bringert
> Subject: Re: An early draft of a speech API
>
> There is a good momentum behind the recent WHATWG proposal update for real time communication at http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#video-conferencing-and-peer-to-peer-communication.
> The previous <device> tag version of this proposal was already being prototyped and implemented by various vendors in the browser space.
> Notably, Opera released a prototype recently at http://my.opera.com/core/blog/2011/03/14/web-meet-device and Ericsson Labs showed a prototype in webkit at https://labs.ericsson.com/developer-community/blog/beyond-html5-implementing-device-and-stream-management-webkit.
>
> The fact that browser vendors are getting involved in this spec proposal should encourage our XG to build upon this spec for the remote recognizer use cases. I think this would be better than the DAP device API which browser vendors have not picked up. However this proposal is still a moving target and will likely evolve quickly.
>
> Since all 3 proposals address the default recognizer case without any external dependencies, I think it would be ideal to finalise a concrete recommendation for that without getting blocked on remote recognizers. That will allow browser vendors to implement the default recognizers without having to wait for implementations to pick up the DAP or WHATWG proposal for the audio capture part. We should of course work on the remote recognizer proposal in parallel, but I don't see why it should be a reason to gate a proposal for the simpler use case with the default recognizer.
>
> --
> Cheers
> Satish
>
>
>



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902

Received on Thursday, 17 March 2011 08:39:05 UTC