Re: An early draft of a speech API from Olli Pettay on 2011-03-14 (public-xg-htmlspeech@w3.org from March 2011)

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Mon, 14 Mar 2011 21:35:58 +0200
To: Robert Brown <Robert.Brown@microsoft.com>
CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <4D7E6E1E.6000607@helsinki.fi>
On 03/14/2011 08:43 PM, Robert Brown wrote:
> (I want to separate this feedback from the API design feedback, since
> they feel like different topics).
>
> You already know I disagree with your "v2" notion.  I'll try to
> describe why I feel so strongly about this.  Here are the
> implications I see to a "v1" of everything you've marked as "only in
> v2":
>
> 1.	Each browser manufacturer would have its own proprietary
> interaction with particular speech engines.  I'm predicting this
> would mean: i) Chrome uses Google's cloud service; ii) IE uses
> Microsoft's on-device and cloud service; iii) Firefox and Opera both
> use... I don't know... whatever's already on the device?... a special
> licensing deal they cut with a speech vendor?
>
> 2.	Many speech innovators who have large and successful customer
> bases will be left out in the cold.  Nuance (for example) has a
> thriving business and great brand based on the fact that they have
> world-class technology that their customers buy because it suits
> their application needs better than any other vendor's.  But in the
> scheme proposed here, Nuance is excluded from developing HTML apps,
> and so are their customers.  This damages a lot of users, not to
> mention excludes a lot of world class speech innovation from the web.
> How does Nuance get back into this game?  Build their own browser?
> Sign deals with all the major browser manufacturers?  Whatever the
> answer, it's not good.  Replace the word "Nuance" with any other
> speech vendor, some of whom are also participating in the XG, and
> it's the same story.  If that's not sad enough, imagine a research
> institution or startup.  What are they supposed to do?

How is co-operating with a browser vendor a bad thing?
IBM used to provide speech engines for Opera.

>
> 3.	Take a look at all the popular speech apps on smart phones these
> days.  None of these could be built.  For example: Google & Microsoft
> have search apps that deliver great results using huge proprietary
> SLMs, too big to download, and with too much secret sauce to want to
> make public.  For example: Siri's virtual assistant (now owned by
> Apple) is very cool, and is powered by Nuance SR using specifically
> modeled SLMs (which I'm pretty sure have enough IP that they don't
> want leaving their secure servers).  There are plenty of other
> examples.  But the point is that none of today's popular mobile
> speech apps can be built with your "v1" feature set.  So what does
> that leave?

This is the reason for v2. As I've said, we could develop v1 and v2
simultaneously, but since v1 would hopefully be simpler, it could
be implemented first (somewhat similarly to XMLHttpRequest v1 and v2).
In this case one reason for a v1 API would be to get feedback from
web developers asap.
Also, we need to be able to support non-network engines. Otherwise
offline webapps couldn't use any ASR/TTS.


I could imagine speech vendors also implementing the browser side of
the API. That way even v1 could use their engines - maybe just some
basic version. And then with network speech engines (v2) they could sell
their services to web sites which want to use higher quality engines.
If several speech vendors want to provide default engines, they
could design some API which browsers could use internally, and then the
user or the browser could pick the engine which happens to work best
for them. Default speech engine selection would become similar to
selecting a default search engine.
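
A rough sketch of that idea, only to make it concrete. Everything here
is hypothetical: the names SpeechEngine, EngineRegistry, requiresNetwork
and pick() are made up for illustration and are not part of the draft
API or of any browser. It assumes engines register themselves with the
browser, the user can choose a default, and the browser falls back to a
non-network engine when offline.

```typescript
// Hypothetical internal interface a speech vendor would implement.
interface SpeechEngine {
  name: string;
  requiresNetwork: boolean; // true for a remote (v2-style) engine
  recognize(audio: Uint8Array): string;
}

// Hypothetical browser-side registry, similar to a search engine list.
class EngineRegistry {
  private engines = new Map<string, SpeechEngine>();
  private defaultName: string | null = null;

  register(engine: SpeechEngine): void {
    this.engines.set(engine.name, engine);
    // The first registered engine is the default until someone changes it.
    if (this.defaultName === null) {
      this.defaultName = engine.name;
    }
  }

  setDefault(name: string): void {
    if (!this.engines.has(name)) {
      throw new Error(`Unknown engine: ${name}`);
    }
    this.defaultName = name;
  }

  // Return the default engine if it is usable, otherwise the first
  // registered engine that works in the current network state.
  pick(online: boolean): SpeechEngine | undefined {
    const preferred =
      this.defaultName !== null ? this.engines.get(this.defaultName) : undefined;
    if (preferred && (online || !preferred.requiresNetwork)) {
      return preferred;
    }
    for (const engine of this.engines.values()) {
      if (online || !engine.requiresNetwork) {
        return engine;
      }
    }
    return undefined;
  }
}

// Usage: one basic on-device engine and one network engine.
const registry = new EngineRegistry();
registry.register({
  name: "local-basic",
  requiresNetwork: false,
  recognize: () => "local result",
});
registry.register({
  name: "cloud-hq",
  requiresNetwork: true,
  recognize: () => "cloud result",
});
registry.setDefault("cloud-hq");
```

With this setup an offline webapp would still get the on-device engine,
even though the user's default is the network one.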

And just to clarify - I'm not against network speech engines.
I just want something to be implemented rather soon to get feedback on and
experience with the API. And also to give time for the possible
protocol design (although it is possible that the protocol will be just
something on top of XHR or WebSockets).




>
> This just doesn't feel like an "open" standard to me.


I don't know what is not "open".


-Olli



>
> -----Original Message----- From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Olli
> Pettay Sent: Monday, February 28, 2011 12:38 PM To:
> public-xg-htmlspeech@w3.org Subject: An early draft of a speech API
>
> Hi all,
>
> here is what I had in mind for the speech API. The text is still missing
> lots of definitions, but I hope it is still somewhat clear how it should
> work. (Getting Firefox 4 done has taken most of my time.)
>
> The main difference from Google's API is that this isn't based on
> elements, but on request objects.
>
> For TTS we could probably use something close to what Björn just
> proposed.
>
>
>
> -Olli
>
Received on Monday, 14 March 2011 19:36:35 UTC