- From: Bjorn Bringert <bringert@google.com>
- Date: Fri, 11 Dec 2009 14:05:00 +0000
Thanks for the discussion - cool to see more interest today also
(http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024453.html)

I've hacked up a proof-of-concept JavaScript API for speech
recognition and synthesis. It adds a navigator.speech object with
these functions:

  void listen(ListenCallback callback, ListenOptions options);
  void speak(DOMString text, SpeakCallback callback, SpeakOptions options);

The implementation uses an NPAPI plugin for the Android browser that
wraps the existing Android speech APIs. The code is available at
http://code.google.com/p/speech-api-browser-plugin/

There are some simple demo apps in
http://code.google.com/p/speech-api-browser-plugin/source/browse/trunk/android-plugin/demos/
including:

- English to Spanish speech-to-speech translation
- Google search by speaking a query
- The obligatory pizza ordering system
- A phone number dialer

Comments appreciated!

/Bjorn

On Fri, Dec 4, 2009 at 2:51 PM, Olli Pettay <Olli.Pettay at helsinki.fi> wrote:
> Indeed the API should be something significantly simpler than X+V.
> Microsoft has (had?) support for SALT. That API is pretty simple and
> provides speech recognition and TTS. The API could probably be even
> simpler than SALT. IIRC, there was an extension for Firefox to support
> SALT (well, there was also an extension to support X+V).
>
> If the platform/OS provides ASR and TTS, adding a JS API for it should
> be pretty simple. X+V tries to handle some logic using the VoiceXML FIA,
> but I think it would be more web-like to give a pure JS API (similar to
> SALT). Integrating visual and voice input could be done in scripts. I'd
> assume there would be some script libraries to handle multimodal input
> integration - especially if there will be touch and gesture events too.
> (Classic multimodal map applications will become possible on the web.)
>
> But all of this should probably be designed in or with the W3C
> multimodal working group. I know their current architecture is way more
> complex, but X+V, SALT and even Multimodal-CSS have been discussed in
> that working group.
>
> -Olli
>
> On 12/3/09 2:50 AM, Dave Burke wrote:
>>
>> We're envisaging a simpler programmatic API that looks familiar to the
>> modern Web developer but one which avoids the legacy of dialog system
>> languages.
>>
>> Dave
>>
>> On Wed, Dec 2, 2009 at 7:25 PM, João Eiras <joaoe at opera.com> wrote:
>>
>>    On Wed, 02 Dec 2009 12:32:07 +0100, Bjorn Bringert
>>    <bringert at google.com> wrote:
>>
>>        We've been watching our colleagues build native apps that use
>>        speech recognition and speech synthesis, and would like to have
>>        JavaScript APIs that let us do the same in web apps. We are
>>        thinking about creating a lightweight and
>>        implementation-independent API that lets web apps use speech
>>        services. Is anyone else interested in that?
>>
>>        Bjorn Bringert, David Singleton, Gummi Hafsteinsson
>>
>>    This exists already, but only Opera supports it, although there are
>>    problems with the library we use for speech recognition.
>>
>>    http://www.w3.org/TR/xhtml+voice/
>>    http://dev.opera.com/articles/view/add-voice-interactivity-to-your-site/
>>
>>    Would be nice to revive that specification and get vendor buy-in.
>>
>>    --
>>    João Eiras
>>    Core Developer, Opera Software ASA, http://www.opera.com/

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House,
76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
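
For readers who want to try the proposal, a minimal usage sketch follows.
The listen() and speak() signatures are the ones given in the message
above; everything else - the 'language' option fields and the shape of the
recognition result passed to the callback - is an assumption made for
illustration, since the message does not define ListenOptions,
SpeakOptions, or the callback arguments.

// Minimal sketch against the proposed navigator.speech API.
// Assumptions (not in the proposal): the ListenCallback receives a
// result object with an 'utterance' string, and both option objects
// accept a 'language' field. The actual plugin may differ.
if (navigator.speech) {
  navigator.speech.listen(function (result) {
    var text = result.utterance || "";           // assumed result shape
    navigator.speech.speak(
      "You said: " + text,
      function () { console.log("done speaking"); },
      { language: "en-US" }                      // hypothetical SpeakOptions
    );
  }, { language: "en-US" });                     // hypothetical ListenOptions
} else {
  console.log("navigator.speech not available (requires the NPAPI plugin)");
}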