W3C home > Mailing lists > Public > whatwg@whatwg.org > December 2009

[whatwg] Web API for speech recognition and synthesis

From: Bjorn Bringert <bringert@google.com>
Date: Fri, 11 Dec 2009 14:05:00 +0000
Message-ID: <1ac456d70912110605i5fdc3f31wfd0370dc26b745f@mail.gmail.com>
Thanks for the discussion - it's cool to see more interest today as well
(http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024453.html)

I've hacked up a proof-of-concept JavaScript API for speech
recognition and synthesis. It adds a navigator.speech object with
these functions:

void listen(ListenCallback callback, ListenOptions options);
void speak(DOMString text, SpeakCallback callback, SpeakOptions options);
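
As a rough illustration, a page might use these two functions like the sketch below. The callback result shape (`result.utterance`), the completion callback, and the `language` option name are my assumptions for illustration - see the plugin source for the actual interface. The function takes the navigator object as a parameter only to keep the sketch self-contained.

```javascript
// Hypothetical usage sketch of the proposed navigator.speech API.
// result.utterance, the empty completion callback, and the "language"
// option are assumptions, not the plugin's documented interface.
function echoBack(nav) {
  // Start recognition; when a result arrives, speak it back.
  nav.speech.listen(function (result) {
    nav.speech.speak("You said: " + result.utterance,
                     function () {},         // assumed completion callback
                     { language: "en-US" }); // assumed option name
  }, { language: "en-US" });
}
```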

The implementation uses an NPAPI plugin for the Android browser that
wraps the existing Android speech APIs. The code is available at
http://code.google.com/p/speech-api-browser-plugin/

There are some simple demo apps in
http://code.google.com/p/speech-api-browser-plugin/source/browse/trunk/android-plugin/demos/
including:

- English to Spanish speech-to-speech translation
- Google search by speaking a query
- The obligatory pizza ordering system
- A phone number dialer
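
The control flow of the speech-to-speech translation demo can be sketched roughly as listen, then translate, then speak. Here `translate` is a hypothetical stand-in for whatever translation call the real demo makes, and the callback shapes are assumed, as above - this is not the demo's actual code.

```javascript
// Sketch of a listen -> translate -> speak pipeline. The `translate`
// function and all callback shapes are hypothetical placeholders.
function speechToSpeech(speech, translate) {
  speech.listen(function (result) {
    // Assumed: result.utterance holds the top recognition hypothesis.
    translate(result.utterance, "en", "es", function (translated) {
      speech.speak(translated, function () {}, { language: "es" });
    });
  }, { language: "en" });
}
```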

Comments appreciated!

/Bjorn

On Fri, Dec 4, 2009 at 2:51 PM, Olli Pettay <Olli.Pettay at helsinki.fi> wrote:
> Indeed the API should be something significantly simpler than X+V.
> Microsoft has (had?) support for SALT. That API is pretty simple and
> provides speech recognition and TTS.
> The API could probably be even simpler than SALT.
> IIRC, there was an extension for Firefox to support SALT (well, there was
> also an extension to support X+V).
>
> If the platform/OS provides ASR and TTS, adding a JS API for it should
> be pretty simple. X+V tries to handle some logic using the VoiceXML FIA, but
> I think it would be more web-like to provide a pure JS API (similar to SALT).
> Integrating visual and voice input could be done in scripts. I'd assume
> there would be some script libraries to handle multimodal input integration
> - especially if there are touch and gesture events too, etc. (Classic
> multimodal map applications would become possible on the web.)
>
> But all this is something which should possibly be designed in or with the
> W3C multimodal working group. I know their current architecture is way more
> complex, but X+V, SALT and even Multimodal-CSS have been discussed in that
> working group.
>
>
> -Olli
>
>
>
> On 12/3/09 2:50 AM, Dave Burke wrote:
>>
>> We're envisaging a simpler programmatic API that looks familiar to the
>> modern Web developer but one which avoids the legacy of dialog system
>> languages.
>>
>> Dave
>>
>> On Wed, Dec 2, 2009 at 7:25 PM, João Eiras <joaoe at opera.com> wrote:
>>
>>> On Wed, 02 Dec 2009 12:32:07 +0100, Bjorn Bringert
>>> <bringert at google.com> wrote:
>>>
>>>> We've been watching our colleagues build native apps that use speech
>>>> recognition and speech synthesis, and would like to have JavaScript
>>>> APIs that let us do the same in web apps. We are thinking about
>>>> creating a lightweight and implementation-independent API that lets
>>>> web apps use speech services. Is anyone else interested in that?
>>>>
>>>> Bjorn Bringert, David Singleton, Gummi Hafsteinsson
>>>
>>> This exists already, but only Opera supports it, although there are
>>> problems with the library we use for speech recognition.
>>>
>>> http://www.w3.org/TR/xhtml+voice/
>>>
>>> http://dev.opera.com/articles/view/add-voice-interactivity-to-your-site/
>>>
>>> Would be nice to revive that specification and get vendor buy-in.
>>>
>>> --
>>> João Eiras
>>> Core Developer, Opera Software ASA, http://www.opera.com/
>>
>>
>
>



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Friday, 11 December 2009 06:05:00 UTC
