
Re: An early draft of a speech API

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Wed, 16 Mar 2011 15:37:42 +0200
Message-ID: <4D80BD26.8090004@helsinki.fi>
To: "Eric S. Johansson" <esj@harvee.org>
CC: public-xg-htmlspeech@w3.org
On 03/16/2011 03:16 AM, Eric S. Johansson wrote:
> On 3/15/2011 5:11 PM, Olli Pettay wrote:
>> On 03/15/2011 09:57 PM, Young, Milan wrote:
>>> I agree with Robert that the Mozilla proposal doesn't feel very
>>> "open". I'd further suggest that the Google speech proposal has
>>> similar properties.
>>>
>>> In both cases, there is a tight coupling between the browser and
>>> speech service that is outside of W3C and IETF turf. This closed
>>> model has all of the usual implications such as:
>>> * A cross-product of integrations across UA and SS
>> If Nuance has a public web-based speech service and exposes
>> an API for it, browsers could use it as a default speech engine
>> when the device is online.
>> Or browsers could use some other engine.
>
> We need the same API for both local and remote speech recognition
> engines.
I was talking about the API between the browser and a speech engine,
not the API which web developers would use.


> If you want to see the kind of things people are doing today with
> speech recognition APIs, take a look at Vocola and Dragonfly:
>
> http://vocola.net/
> http://code.google.com/p/dragonfly/
>
> These are two toolkits in very heavy use within the technically capable
> speech recognition community.

The web platform already provides much of what Dragonfly, for
example, seems to handle, such as its Action and Window packages.
The upcoming API will handle the Grammar package (largely just W3C
SRGS and SISR) and the Engine package. It is then up to the web
application to do whatever it wants with the recognition result.
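For reference, here is what SRGS + SISR look like concretely: a
minimal grammar (abbreviated; the phrases are purely illustrative)
that yields an {action, target} semantic result for the application
to act on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" mode="voice" root="command"
         xml:lang="en-US" tag-format="semantics/1.0">
  <rule id="command" scope="public">
    <!-- the action word sets out.action via a SISR tag -->
    <one-of>
      <item>open <tag>out.action="open";</tag></item>
      <item>close <tag>out.action="close";</tag></item>
    </one-of>
    <item>the</item>
    <ruleref uri="#target"/>
    <!-- copy the referenced rule's result into this rule's result -->
    <tag>out.target=rules.target;</tag>
  </rule>
  <rule id="target">
    <one-of>
      <item>inbox <tag>out="inbox";</tag></item>
      <item>calendar <tag>out="calendar";</tag></item>
    </one-of>
  </rule>
</grammar>
```

Saying "open the calendar" would produce the semantic result
{action: "open", target: "calendar"}.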

Vocola is a language. It seems to be largely an alternative to
W3C SRGS and SISR, plus it has some functionality which the web
platform already provides. It also has things like ShellExecute,
which of course won't be provided in an API usable from web pages.


It looks like the target audience of Dragonfly and Vocola is quite
different from the one we have for HTML Speech.
"Vocola is a voice command language—a language for creating commands to 
control a computer by voice."
We're not trying to create an API to control the computer by voice,
only web apps, and only if the web app itself uses the API.
And the web platform already has rich APIs which can and should be
utilized.



> The Nuance Visual Basic toolkit has a
> vanishingly small pickup because it can't do the kind of things we need.
> I find it ironic that you can't write Visual Basic code using
> NaturallySpeaking, yet Nuance expects its customers to use it to enhance
> speech recognition interfaces. At least the technical community
> managed to build a tool which lets folks create Python, Java, and I
> think JavaScript. It's called VoiceCode.
>
> http://sourceforge.net/projects/voicecode/
>
> Whatever you do for the API, we have a demonstrated need to support
> projects of a level of complexity comparable to VoiceCode.
What do you mean by this?

> simple stuff
> won't cut it.
>
> If you want to hear about the politics of the underlying natlink API,
> e-mail me directly.
>
> A second issue with the APIs is I don't see any mechanism for local per
> user customization of the speech user interface.
IMO, it is up to the UA to allow or disallow user-specific local
customization of a web app's user interface (whether it is a
graphical or a speech user interface). None of the APIs disallows
or disables that. One could use Greasemonkey scripts or similar
to customize the UI of a particular page.
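To make the Greasemonkey-style idea concrete, here is a minimal
sketch of per-user customization: merging a user's own command
phrases over a page's defaults. All names here (the command maps,
the phrases, the action strings) are hypothetical; a real
userscript would feed the merged phrases into whatever grammar or
recognition API the page or UA exposes.

```javascript
// Minimal sketch: layer per-user voice-command phrases over a
// page's defaults. Everything here is illustrative, not a real API.

// Page-provided default phrases -> actions.
const defaultCommands = {
  "compose": "mail.compose",
  "archive": "mail.archive",
};

// User's local overrides/additions (e.g. stored by the userscript).
const userCommands = {
  "new message": "mail.compose",   // user prefers a longer phrase
  "next": "mail.nextConversation", // user-added command
};

// Merge: user entries win over page defaults.
function mergeCommands(defaults, overrides) {
  return Object.assign({}, defaults, overrides);
}

const commands = mergeCommands(defaultCommands, userCommands);
console.log(commands["new message"]); // "mail.compose"
console.log(commands["compose"]);     // "mail.compose" (default kept)
```

The merged map would then be turned into a grammar and registered
with the page's recognition hooks, which is exactly the step a UA
could choose to allow or disallow.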


> I've raised this in the
> context of accessibility but it's also a valid concern for third-party
> vendors who come up with a better way to implement or expand an
> interface for application.
What case do you have in mind? Some other vendor providing a
speech UI for Gmail, for example? That would be possible with
browser addons or Greasemonkey scripts. But IMO, that's not really
something this group should focus on.


-Olli


> this capability is essential for the speech
> recognition dependent disabled and important to the third-party product
> community.
>
> There's a lot of prior experience out there, you just need to ask. :-)
>
> --- eric
>
>
Received on Wednesday, 16 March 2011 13:38:18 UTC
