RE: An early draft of a speech API

From: Robert Brown <Robert.Brown@microsoft.com>
Date: Mon, 14 Mar 2011 18:43:56 +0000
To: "Olli@pettay.fi" <Olli@pettay.fi>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <113BCF28740AF44989BE7D3F84AE18DD19875207@TK5EX14MBXC118.redmond.corp.microsoft.com>
(I want to separate this feedback from the API design feedback, since they feel like different topics).

You already know I disagree with your "v2" notion.  I'll try to describe why I feel so strongly about this.  Here are the implications I see of a "v1" that omits everything you've marked as "only in v2":

1.	Each browser manufacturer would have its own proprietary interaction with particular speech engines.  I'm predicting this would mean: i) Chrome uses Google's cloud service; ii) IE uses Microsoft's on-device and cloud service; iii) Firefox and Opera both use... I don't know... whatever's already on the device?  A special licensing deal they cut with a speech vendor?

2.	Many speech innovators who have large and successful customer bases will be left out in the cold.  Nuance (for example) has a thriving business and great brand based on the fact that they have world-class technology that their customers buy because it suits their application needs better than any other vendor's.  But in the scheme proposed here, Nuance is excluded from developing HTML apps, and so are their customers.  This damages a lot of users, not to mention excludes a lot of world class speech innovation from the web.  How does Nuance get back into this game?  Build their own browser?  Sign deals with all the major browser manufacturers?  Whatever the answer, it's not good.  Replace the word "Nuance" with any other speech vendor, some of whom are also participating in the XG, and it's the same story.  If that's not sad enough, imagine a research institution or startup.  What are they supposed to do?

3.	Take a look at all the popular speech apps on smart phones these days.  None of these could be built.  For example: Google & Microsoft have search apps that deliver great results using huge proprietary SLMs, too big to download, and with too much secret sauce to want to make public.  For example: Siri's virtual assistant (now owned by Apple) is very cool, and is powered by Nuance SR using specifically modeled SLMs (which I'm pretty sure have enough IP that they don't want leaving their secure servers).  There are plenty of other examples.  But the point is that none of today's popular mobile speech apps can be built with your "v1" feature set.  So what does that leave?

This just doesn't feel like an "open" standard to me.

-----Original Message-----
From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Olli Pettay
Sent: Monday, February 28, 2011 12:38 PM
To: public-xg-htmlspeech@w3.org
Subject: An early draft of a speech API

Hi all,

here is what I had in mind for speech API.
The text is still missing lots of definitions, but I hope it is still somewhat clear how it should work.
(Getting Firefox 4 done has taken most of my time.)

The main difference from Google's API is that this isn't based on elements, but on request objects.
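To illustrate the distinction: an element-based design (roughly the shape of Google's <input ... speech> proposal) declares speech input in markup, while a request-object design drives recognition from script.  The sketch below is purely hypothetical — the names SpeechRequest, grammarURI, onresult, and the canned result are my own illustrative assumptions, not taken from the actual draft.

```javascript
// Hypothetical sketch of a request-object style speech API.
// None of these names come from the draft; they only illustrate the
// script-driven shape (construct a request, attach a handler, start it),
// as opposed to declaring speech input on an element in markup.
class SpeechRequest {
  constructor(options) {
    // Assumption: the grammar is referenced by URI, as in SRGS-based systems.
    this.grammarURI = options.grammarURI;
    this.onresult = null;
  }
  start() {
    // A real implementation would capture audio and run recognition;
    // here we deliver a canned result just to show the event flow.
    if (this.onresult) {
      this.onresult({ utterance: "hello world", confidence: 0.9 });
    }
  }
}

const req = new SpeechRequest({ grammarURI: "commands.grxml" });
req.onresult = (result) => {
  console.log(result.utterance, result.confidence);
};
req.start(); // logs: hello world 0.9
```

The appeal of this shape is that recognition isn't tied to a particular form control: any script can create a request, point it at a grammar, and consume results as events.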

For TTS we could probably use something close to what Björn just proposed.

Received on Monday, 14 March 2011 18:44:32 UTC
