- From: Kazuyuki Ashimura <ashimura@w3.org>
- Date: Thu, 25 Mar 2010 03:23:22 +0900
- To: public-html@w3.org
Dear HTML5 Working Group,

The purpose of this message is to initiate discussion between the HTML Working Group and the W3C Working Groups involved with voice -- the Voice Browser and Multimodal Interaction Working Groups. Paul Cotton has encouraged us to start this discussion on the HTML mailing list in order to collect use cases for how voice can be used in HTML applications, and to collect your ideas and requirements about the best ways for HTML authors to access voice capabilities.

By "voice" we mean capabilities such as speech recognition, text to speech, speaker verification (confirming someone's identity through their voice), audio capture and audio playback, and the ability to coordinate all of these capabilities by means of a dialog.

Some possible voice use cases that occur to us include:

1. form-filling by voice; that is, speaking form values rather than typing them or selecting them with a mouse
2. initiating a search (for example, a web search, site search, or page search) by speaking the search terms rather than typing them
3. using text to speech to read portions of a screen (for example, if the user's eyes are busy or if the user is illiterate or dyslexic)
4. using voice for general text input on mobile devices with hard-to-use keyboards
5. using speaker verification to confirm the user's identity, for example as a supplement to a user ID and password
6. combinations of the above, for example selecting part of the screen and saying "read that"

We are very interested in hearing your reactions to these use cases, as well as any other use cases you might be thinking about.

An important consideration for voice applications is where the actual speech technology comes from. Some platforms, like Windows 7, have speech recognition built into the OS, and this includes even some small mobile devices. Another option is for speech to be built into the browser.
In the past, some browsers (for example, Opera and IE) have included speech recognition and text to speech in the browser. Speech technologies are also available in the cloud, accessed either through standard protocols like MRCP or through other services such as the AT&T Speech Mashup or MIT's WAMI. In fact, in typical voice-only applications the browser itself runs in the network, because the application must be accessible from very limited input devices such as traditional land-line phones. There are pros and cons to all of these approaches, which we would be happy to discuss if there is interest. We are hoping to get your opinions about which of these are the most critical to support.

Finally, regardless of where the speech processing is actually done, we are also very interested in discussing requirements for the different ways authors could access speech functionality from HTML. Two possibilities, although there may be others, are JavaScript libraries that link to speech services, and declarative markup.

Kazuyuki Ashimura
W3C Multimodal Interaction & Voice Browser Activity Lead

--
Kazuyuki Ashimura / W3C Multimodal & Voice Activity Lead
mailto: ashimura@w3.org
voice: +81.466.49.1170 / fax: +81.466.49.1171
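[Editorial illustration] To make the "JavaScript libraries that link to speech services" possibility concrete, here is a minimal sketch of the form-filling-by-voice use case. Everything in it is hypothetical: `recognizeSpeech` and `fillFieldByVoice` are invented names, and the stub recognizer simply stands in for whatever platform, browser, or cloud engine (e.g. one reached over MRCP) would do the real work.

```javascript
// Hypothetical sketch of script-accessible speech recognition used for
// form-filling by voice. The recognizer below is a stub; a real one
// would capture audio and send it to a local or cloud speech engine.

// Stub recognizer: resolves with a transcript and a confidence score.
// (Invented API for illustration only.)
function recognizeSpeech(audioHint) {
  return Promise.resolve({ transcript: audioHint, confidence: 0.92 });
}

// Fill a named field of a form-like object from a recognition result,
// refusing to fill when the recognizer's confidence is low.
async function fillFieldByVoice(form, fieldName, audioHint) {
  const result = await recognizeSpeech(audioHint);
  if (result.confidence < 0.5) {
    return { field: fieldName, filled: false };
  }
  form[fieldName] = result.transcript;
  return { field: fieldName, filled: true };
}

// Usage: "speak" a value into the "city" field of a plain form object.
const form = { city: "" };
fillFieldByVoice(form, "city", "Amsterdam").then((r) => {
  console.log(form.city, r.filled); // prints: Amsterdam true
});
```

The same shape would apply to the search and text-input use cases above; only the target of the recognized transcript changes.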
Received on Wednesday, 24 March 2010 18:23:54 UTC