Re: [HTML Speech] speech resource specification requirement from JOHNSTON, MICHAEL J (MICHAEL J) on 2010-09-09 (public-xg-htmlspeech@w3.org from September 2010)

From: JOHNSTON, MICHAEL J (MICHAEL J) <johnston@research.att.com>
Date: Thu, 9 Sep 2010 13:16:51 -0400
To: Satish Sampath <satish@google.com>
CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <9BDE9C6B-4C0F-451C-AF23-10D9665A6A03@research.att.com>

On Sep 9, 2010, at 5:53 AM, Satish Sampath wrote:

>> The HTML+Speech standard must allow specification of the speech resource
>> (e.g. speech recognizer) to be used for processing of the audio
>> collected from the user. For example, this could be specified
>> as URI valued attribute on the element supporting speech recognition.
>> When audio is captured from the user it will then be streamed over http
>> to the specified URI.
> 
> Specifying the speech recognizer would also require standardising the
> protocol between the UA and the recognizer. I like how many of the
> existing APIs such as Geolocation
> (http://dev.w3.org/geo/api/spec-source.html) are agnostic to which
> resource/server is used and let the UA make the choice. That keeps the
> spec simple and focused on the web developer.
> 

True, this does open up the issue of how to standardize the protocol, but
I don't think there really is much choice ...

Consider the following use case: a company, let's call them ACME Corp.,
wants to put out a speech-enabled web page that allows users to search
for their various products and services using voice. As part of their
development effort, they build a language model that supports this task.
With HTML+Speech allowing specification of a speech resource on the
network, they can serve the same speech-enabled page, backed by that same
model, to all desktop and mobile browsers supporting the standard.

Even in the case where we just have more general-purpose models (e.g.
dictation, search), not being able to specify the recognition resource is
a problem. Let's say ACME Corp. has a budget shortfall for the project
and, rather than building a custom model, they decide to just go with a
general-purpose dictation model. Without the ability to point to a
specific general-purpose model in the network, they have to rely on
whatever is supported by the specific browser. We now have a situation
where users will have a different experience using speech input depending
on the browser: differing accuracy, and possible differences in
tokenization and normalization. One of the central goals of the web (and
W3C) is to strive for consistency of experience across different browsers.
A developer creating a (multimodal) interface combining speech input with
graphical output needs to have the ability to provide a consistent user
experience not just for graphical elements but also for voice.
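
To make the difference concrete, here is a rough sketch of what a UA
might do with a page-specified recognizer URI. It is an illustration
only, not a proposed protocol: the function names, the fallback URI, the
example.com-style hosts, the chunk size, and the audio/x-wav content type
are all assumptions of mine, and Python is used purely for readability.
The request is only built here, never sent.

```python
from typing import Iterator, Optional
from urllib.request import Request

# Hypothetical recognizer the UA would pick when the page names none.
UA_DEFAULT_RECOGNIZER = "https://recognizer.browser-vendor.example/asr"


def resolve_recognizer(page_uri: Optional[str]) -> str:
    """Prefer the URI given by the page author; else use the UA default."""
    return page_uri if page_uri else UA_DEFAULT_RECOGNIZER


def chunk_audio(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the captured audio in fixed-size pieces, as a stream would."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def build_stream_request(page_uri: Optional[str], audio: bytes,
                         content_type: str = "audio/x-wav") -> Request:
    """Build (but do not send) an HTTP POST whose body is the audio chunks."""
    return Request(
        resolve_recognizer(page_uri),
        data=chunk_audio(audio),
        headers={"Content-Type": content_type},
        method="POST",
    )


# ACME's page names its own recognizer, so every UA targets the same one.
acme_request = build_stream_request(
    "https://speech.acme.example/recognize", b"\x00" * 10000)
```

With the URI specified, every browser ends up talking to the same
recognizer; with None, each UA falls back to its own default, which is
exactly the inconsistency described above.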



>> Web app might want to process the microphone input data
>> somehow before pushing it to recognizer.
>> https://wiki.mozilla.org/Audio_Data_API
> ....
>> If the speech input can be captured as data by the web page, it
>> can stream the data using XMLHttpRequest or WebSockets to server.
> 
> These seem more applicable to the <device> specification which allows
> capturing arbitrary audio and process/stream it. It also brings up
> interesting security/privacy concerns if the recorded audio is given
> to the web app, which is again being addressed in the <device>
> specification. I think we should look at speech related use cases and
> requirements here than general purpose audio manipulation.
> 
> Cheers
> Satish
> 
> 
> 
> On Wed, Sep 8, 2010 at 8:50 PM, JOHNSTON, MICHAEL J (MICHAEL J)
> <johnston@research.att.com> wrote:
>> 
>> Here is one of the specific requirements we have for adding speech to HTML:
>> 
>> Requirement:
>> 
>> The HTML+Speech standard must allow specification of the speech resource
>> (e.g. speech recognizer) to be used for processing of the audio
>> collected from the user. For example, this could be specified
>> as URI valued attribute on the element supporting speech recognition.
>> When audio is captured from the user it will then be streamed over http
>> to the specified URI.
>> 
>> best
>> Michael
>> 
>> 
>> 
>>> 
>>> =======================================
>>> REQUIREMENTS, USE CASES, and PROPOSALS
>>> =======================================
>>> I think the best way to begin is to ask right up front for the items we are interested in:  requirements, use cases, and proposals for changes to HTML.
>>> 
>>> If you have requirements, use cases, or proposals for changes to HTML, please send them in to this list.  When the trickle slows we'll look at what we have and decide on next steps.  For expediency, please plan to send in any such materials by Monday, September 13.
>> 
>> 
>> 
>>
Received on Thursday, 9 September 2010 17:15:14 UTC