- From: Eric S. Johansson <esj@harvee.org>
- Date: Tue, 15 Mar 2011 21:16:28 -0400
- To: public-xg-htmlspeech@w3.org
On 3/15/2011 5:11 PM, Olli Pettay wrote:
> On 03/15/2011 09:57 PM, Young, Milan wrote:
>> I agree with Robert that the Mozilla proposal doesn't feel very
>> "open". I'd further suggest that the Google speech proposal has
>> similar properties.
>>
>> In both cases, there is a tight coupling between the browser and
>> speech service that is outside of W3C and IETF turf. This closed
>> model has all of the usual implications such as:
>> * A cross-product of integrations across UA and SS
> If Nuance has a public web based speech service and it exposes
> the API for it, browsers could use it as a default speech engine
> when the device is online. Or browsers could use some other engine.

We need the same API for both local and remote speech recognition engines.

If you want to see the kind of things people are doing today with speech recognition APIs, take a look at Vocola and Dragonfly:

http://vocola.net/
http://code.google.com/p/dragonfly/

These are two toolkits in very heavy use within the technically capable speech recognition community (a short Dragonfly example is at the end of this message). The Nuance Visual Basic toolkit has vanishingly small pickup because it can't do the kind of things we need. I find it ironic that you can't write Visual Basic code using NaturallySpeaking, yet Nuance expects its customers to use it to enhance speech recognition interfaces. At least the technical community managed to build a tool which lets folks create Python, Java, and I think JavaScript by voice. It's called VoiceCode.

http://sourceforge.net/projects/voicecode/

Whatever you do for the API, we have a demonstrated need to support projects of a level of complexity comparable to VoiceCode; simple stuff won't cut it. If you want to hear about the politics of the underlying NatLink API, e-mail me directly.

A second issue with the APIs is that I don't see any mechanism for local, per-user customization of the speech user interface. I've raised this in the context of accessibility, but it's also a valid concern for third-party vendors who come up with a better way to implement or expand an interface for an application. This capability is essential for the speech-recognition-dependent disabled and important to the third-party product community.

There's a lot of prior experience out there; you just need to ask. :-)

--- eric
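P.S. For anyone who hasn't used these toolkits, here is a minimal sketch of the kind of Dragonfly command grammar people write today. It assumes a working engine backend (e.g. NatLink running under NaturallySpeaking), and the rule name and key/text actions are only illustrative, not from any particular project:

    # Minimal Dragonfly grammar sketch: two spoken commands, one with free dictation.
    from dragonfly import Grammar, MappingRule, Key, Text, Dictation

    class EditingRule(MappingRule):
        mapping = {
            # saying "new paragraph" presses Enter twice
            "new paragraph": Key("enter:2"),
            # saying "say <text>" types whatever was dictated after "say"
            "say <text>": Text("%(text)s"),
        }
        extras = [Dictation("text")]

    grammar = Grammar("editing example")   # container for one or more rules
    grammar.add_rule(EditingRule())
    grammar.load()                         # register with the running speech engine

Real-world grammars run to hundreds of rules like these, layered per application and per user, which is exactly the level of complexity the web API needs to leave room for.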
Received on Wednesday, 16 March 2011 01:17:13 UTC