- From: Bjorn Bringert <bringert@google.com>
- Date: Tue, 18 May 2010 09:52:53 +0100
On Tue, May 18, 2010 at 8:02 AM, Anne van Kesteren <annevk at opera.com> wrote:
> On Mon, 17 May 2010 15:05:22 +0200, Bjorn Bringert <bringert at google.com>
> wrote:
>>
>> Back in December there was a discussion about web APIs for speech
>> recognition and synthesis that saw a decent amount of interest
>>
>> (http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-December/thread.html#24281).
>> Based on that discussion, we would like to propose a simple API for
>> speech recognition, using a new <input type="speech"> element. An
>> informal spec of the new API, along with some sample apps and use
>> cases can be found at:
>>
>> http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en.
>>
>> It would be very helpful if you could take a look and share your
>> comments. Our next steps will be to implement the current design, get
>> some feedback from web developers, continue to tweak, and seek
>> standardization as soon as it looks mature enough and/or other vendors
>> become interested in implementing it.
>
> I wonder how it relates to the <device> proposal already in the draft. In
> theory that supports microphone input too.

It would be possible to implement speech recognition on top of a
microphone input API. The most obvious approach would be to use
<device> to get an audio stream, and send that audio stream to a
server (e.g. using WebSockets). The server runs a speech recognizer
and returns the results.

Advantages of the speech input element:

- Web app developers do not need to build and maintain a speech
  recognition service.

- Implementations can choose to use client-side speech recognition.
  This could give reduced network traffic and latency (but probably
  also reduced recognition accuracy and language support).
  Implementations could also use server-side recognition by default,
  switching to local recognition in offline or low bandwidth
  situations.

- Using a general audio capture API would require APIs for things like
  audio encoding and audio streaming. Judging from the past results of
  specifying media features, this may be non-trivial. The speech input
  element turns all audio processing concerns into implementation
  details.

- Implementations can have special UI treatment for speech input,
  which may be different from that for general audio capture.

Advantages of using a microphone API:

- Web app developers get complete control over the quality and
  features of the speech recognizer. This is a moot point for most
  developers though, since they do not have the resources to run their
  own speech recognition service.

- Fewer features to implement in browsers (assuming that a microphone
  API would be added anyway).

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
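
P.S. To make the "server-side by default, local when offline or on low
bandwidth" point concrete, here is a minimal sketch of how an
implementation might pick a recognizer. Everything in it is an
assumption for illustration only: the names Backend, NetworkState,
MIN_UPLINK_KBPS and chooseBackend are hypothetical, the 32 kbps
threshold is arbitrary, and none of it is part of the proposed API.

```typescript
// Hypothetical sketch: not part of the <input type="speech"> proposal.
type Backend = "server" | "local";

interface NetworkState {
  online: boolean;
  // Estimated uplink bandwidth in kilobits per second (assumed to be known).
  uplinkKbps: number;
}

// Assumed threshold below which streaming audio to a server is treated
// as impractical. The value is arbitrary.
const MIN_UPLINK_KBPS = 32;

// Prefer server-side recognition; fall back to a local recognizer when
// offline or on a slow link. Returns null if no recognizer is usable.
function chooseBackend(
  net: NetworkState,
  hasLocalRecognizer: boolean
): Backend | null {
  const serverUsable = net.online && net.uplinkKbps >= MIN_UPLINK_KBPS;
  if (serverUsable) return "server";
  if (hasLocalRecognizer) return "local";
  // Slow but online with no local recognizer: the server is the only option.
  return net.online ? "server" : null;
}
```

The point of keeping this inside the implementation is that the web
app never sees the choice; it only receives the recognition result.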
Received on Tuesday, 18 May 2010 01:52:53 UTC