- From: Oliver Hunt <oliver@apple.com>
- Date: Thu, 3 Dec 2009 08:21:20 -0800
On Dec 3, 2009, at 4:06 AM, Bjorn Bringert wrote:

> On Wed, Dec 2, 2009 at 10:20 PM, Jonas Sicking <jonas at sicking.cc> wrote:
>> On Wed, Dec 2, 2009 at 11:17 AM, Bjorn Bringert <bringert at google.com> wrote:
>>> I agree that being able to capture and upload audio to a server would
>>> be useful for a lot of applications, and it could be used to do speech
>>> recognition. However, for a web app developer who just wants to
>>> develop an application that uses speech input and/or output, it
>>> doesn't seem very convenient, since it requires server-side
>>> infrastructure that is very costly to develop and run. A
>>> speech-specific API in the browser gives browser implementors the
>>> option to use on-device speech services provided by the OS, or
>>> server-side speech synthesis/recognition.
>>
>> Again, it would help a lot if you could provide use cases and
>> requirements. This helps both with designing an API, as well as
>> evaluating if the use cases are common enough that a dedicated API is
>> the best solution.
>>
>> / Jonas
>
> I'm mostly thinking about speech web apps for mobile devices. I think
> that's where speech makes most sense as an input and output method,
> because of the poor keyboards, small screens, and frequent hands/eyes
> busy situations (e.g. while driving). Accessibility is the other big
> reason for using speech.

Accessibility is already handled through ARIA and the host platform's accessibility features.

> Some ideas for use cases:
>
> - Search by speaking a query
> - Speech-to-speech translation
> - Voice Dialing (could open a tel: URI to actually make the call)
> - Dialog systems (e.g. the canonical pizza ordering system)
> - Lightweight JavaScript browser extensions (e.g. Greasemonkey /
>   Chrome extensions) for using speech with any web site, e.g. for
>   accessibility.

I am unsure why the site should be directly responsible for things like audio-based accessibility. What do you believe a site should be doing itself manually vs. relying on the accessibility services provided by the host OS?

> Requirements:
>
> - Web app developer side:
>   - Allows both speech recognition and synthesis.

ARIA (in conjunction with the OS accessibility services) already provides accessibility-focused text-to-speech (I'm unsure about the recognition side).

>   - Doesn't require web app developer to develop / run his own speech
>     recognition / synthesis servers.

This would seem to be "use the OS services".

> - Implementor side:
>   - Easy enough to implement that it can get wide adoption in browsers.

These services are not simple -- any implementation would seem to be a significant amount of work, especially if you want to a) actually be good at it and b) interact with the host OS's native accessibility features.

>   - Allows implementor to use either client-side or server-side
>     recognition and synthesis.

I honestly have no idea what you mean by this.

--Oliver
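[Editor's note: the ARIA point Oliver makes can be illustrated with a minimal sketch, not part of the original thread. A page that updates a WAI-ARIA live region leaves the actual text-to-speech to the screen reader supplied by the host OS, so the site ships no speech synthesis of its own. The element id and message text below are made up for the example.]

```javascript
// Minimal sketch: delegate speech output to the OS screen reader via ARIA.
// The element id ("status") and the message are hypothetical; any polite
// live region behaves the same way. Assistive technology provided by the
// host OS announces the updated text aloud -- the page needs no speech API.
var status = document.getElementById('status');
status.setAttribute('role', 'status');       // role="status" implies a polite live region
status.setAttribute('aria-live', 'polite');  // set explicitly for older assistive technology
status.textContent = 'Your pizza order has been placed.';
```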