Re: An early draft of a speech API

On 3/15/2011 5:11 PM, Olli Pettay wrote:
> On 03/15/2011 09:57 PM, Young, Milan wrote:
>> I agree with Robert that the Mozilla proposal doesn't feel very
>> "open".  I'd further suggest that the Google speech proposal has
>> similar properties.
>>
>> In both cases, there is a tight coupling between the browser and
>> speech service that is outside of W3C and IETF turf.  This closed
>> model has all of the usual implications such as:
>> * A cross-product of integrations across UA and SS
> If Nuance has a public web based speech service and it exposes
> the API for it, browsers
> could use it as a default speech engine when the device is online.
> Or browsers could use some other engine.

We need the same API for both local and remote speech recognition engines. If 
you want to see the kind of things people are doing today with speech 
recognition APIs, take a look at Vocola and Dragonfly:

http://vocola.net/
http://code.google.com/p/dragonfly/

These are two toolkits in very heavy use within the technically capable speech 
recognition community. The Nuance Visual Basic toolkit has vanishingly small 
pickup because it can't do the kind of things we need. I find it ironic that you 
can't write Visual Basic code using NaturallySpeaking, yet Nuance expects its 
customers to use it to enhance speech recognition interfaces. At least the 
technical community managed to build a tool which lets folks create Python, 
Java, and I think JavaScript code. It's called voicecode.

http://sourceforge.net/projects/voicecode/

Whatever you do for the API, we have a demonstrated need to support projects of 
a level of complexity comparable to voicecode. Simple stuff won't cut it.

If you want to hear about the politics of the underlying natlink API, e-mail me 
directly.

A second issue with the APIs is that I don't see any mechanism for local, 
per-user customization of the speech user interface. I've raised this in the 
context of accessibility, but it's also a valid concern for third-party vendors 
who come up with a better way to implement or expand an interface for an 
application. This capability is essential for disabled users who depend on 
speech recognition and important to the third-party product community.
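
To make the customization point concrete, here is a minimal sketch of one way 
it could work, assuming an application publishes its spoken commands as a simple 
phrase-to-action table and the user keeps a local table of overrides. The names 
merge_commands, app_commands, and user_overrides are hypothetical; nothing 
like this appears in either proposal.

```python
# Hypothetical sketch: the function and command names are illustrative only
# and do not come from the Mozilla or Google proposals.

def merge_commands(app_commands, user_overrides):
    """Overlay a user's local customizations on an application's defaults.

    Both arguments map a spoken phrase to an action name. A user entry
    replaces the application's entry for the same phrase; mapping a phrase
    to None removes that phrase entirely.
    """
    merged = dict(app_commands)  # copy so the application's table is untouched
    for phrase, action in user_overrides.items():
        if action is None:
            merged.pop(phrase, None)
        else:
            merged[phrase] = action
    return merged


# Commands shipped by the (hypothetical) web application.
app_commands = {"send message": "send", "delete message": "delete"}

# Local, per-user changes that the application author never sees.
user_overrides = {
    "ship it": "send",       # add a phrase that is easier for this user to say
    "delete message": None,  # disable a phrase that misfires for this user
}

active = merge_commands(app_commands, user_overrides)
print(active)
```

The point of the sketch is only that the override table lives with the user, 
not with the application, so a third-party tool could supply it as easily as 
the user could.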

There's a lot of prior experience out there; you just need to ask. :-)

--- eric

Received on Wednesday, 16 March 2011 01:17:13 UTC