- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 3 Dec 2009 12:06:05 +0000
On Wed, Dec 2, 2009 at 10:20 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> On Wed, Dec 2, 2009 at 11:17 AM, Bjorn Bringert <bringert at google.com> wrote:
>> I agree that being able to capture and upload audio to a server would
>> be useful for a lot of applications, and it could be used to do speech
>> recognition. However, for a web app developer who just wants to
>> develop an application that uses speech input and/or output, it
>> doesn't seem very convenient, since it requires server-side
>> infrastructure that is very costly to develop and run. A
>> speech-specific API in the browser gives browser implementors the
>> option to use on-device speech services provided by the OS, or
>> server-side speech synthesis/recognition.
>
> Again, it would help a lot if you could provide use cases and
> requirements. This helps both with designing an API, as well as
> evaluating if the use cases are common enough that a dedicated API is
> the best solution.
>
> / Jonas

I'm mostly thinking about speech web apps for mobile devices. I think
that's where speech makes most sense as an input and output method,
because of the poor keyboards, small screens, and frequent hands/eyes
busy situations (e.g. while driving). Accessibility is the other big
reason for using speech.

Some ideas for use cases:

- Search by speaking a query
- Speech-to-speech translation
- Voice Dialing (could open a tel: URI to actually make the call)
- Dialog systems (e.g. the canonical pizza ordering system)
- Lightweight JavaScript browser extensions (e.g. Greasemonkey /
  Chrome extensions) for using speech with any web site, e.g. for
  accessibility.

Requirements:

- Web app developer side:
  - Allows both speech recognition and synthesis.
  - Easy to use API. Makes simple things easy and advanced things
    possible.
  - Doesn't require web app developer to develop / run his own speech
    recognition / synthesis servers.
  - (Natural) language-neutral API.
  - Allows developer-defined application specific grammars / language
    models.
  - Allows multilingual applications.
  - Allows easy localization of speech apps.

- Implementor side:
  - Easy enough to implement that it can get wide adoption in browsers.
  - Allows implementor to use either client-side or server-side
    recognition and synthesis.

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Thursday, 3 December 2009 04:06:05 UTC