RE: [HTML Speech] speech resource specification requirement

The spec should allow the web developer to choose between quality and speed for recognition when crafting the HTML.
What I'm proposing is an attribute that indicates the developer's expectation of the speech recognition engine.
For a relatively small text field (like a yes/no answer) the developer might specify "low", and the engine could then choose different logic than it would when processing a full sentence.
Also, a language attribute at the field level should be allowed to override the page language, as the voice input might be expected in a different language.
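To make this concrete, here is a rough sketch of what such markup might look like. The "speech" and "recoquality" attribute names are purely illustrative assumptions on my part, not proposed syntax:

```html
<!-- Hypothetical markup: "speech" and "recoquality" are illustrative
     attribute names, not proposed spec syntax. -->

<!-- Yes/no confirmation field: hint to the engine that a fast,
     lightweight recognition strategy is acceptable -->
<input type="text" name="confirm" speech recoquality="low">

<!-- Dictation field expecting French input on an English page:
     the field-level lang attribute overrides the page language
     for recognition purposes -->
<textarea name="comments" speech recoquality="high" lang="fr"></textarea>
```

The point is only that the markup carries hints; how the engine acts on them is left entirely up to the implementation.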
As a general thought, as this spec shapes up it should address real use-case scenarios from the user's standpoint, provide enough input to the recognition engine to make better decisions about recognition optimization, and put the developer in control of those choices.
It should stop at providing inputs to the engine and make no assumptions about how the engine will fulfill the request.

Regards,

Dan Druta
AT&T - Service Standards

-----Original Message-----
From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish Sampath
Sent: Thursday, September 09, 2010 10:53 AM
To: JOHNSTON, MICHAEL J (ATTLABS)
Cc: public-xg-htmlspeech@w3.org
Subject: Re: [HTML Speech] speech resource specification requirement

> The HTML+Speech standard must allow specification of the speech resource
> (e.g. speech recognizer) to be used for processing of the audio
> collected from the user. For example, this could be specified
> as a URI-valued attribute on the element supporting speech recognition.
> When audio is captured from the user it will then be streamed over HTTP
> to the specified URI.

Specifying the speech recognizer would also require standardising the
protocol between the UA and the recognizer. I like how many of the
existing APIs such as Geolocation
(http://dev.w3.org/geo/api/spec-source.html) are agnostic to which
resource/server is used and let the UA make the choice. That keeps the
spec simple and focused on the web developer.

> Web app might want to process the microphone input data
> somehow before pushing it to recognizer.
> https://wiki.mozilla.org/Audio_Data_API
.....
> If the speech input can be captured as data by the web page, it
> can stream the data using XMLHttpRequest or WebSockets to server.

These seem more applicable to the <device> specification, which allows
capturing arbitrary audio and processing/streaming it. It also raises
interesting security/privacy concerns if the recorded audio is given
to the web app, which again is being addressed in the <device>
specification. I think we should look at speech-related use cases and
requirements here rather than general-purpose audio manipulation.

Cheers
Satish



On Wed, Sep 8, 2010 at 8:50 PM, JOHNSTON, MICHAEL J (MICHAEL J)
<johnston@research.att.com> wrote:
>
> Here is one of the specific requirements we have for adding speech to HTML:
>
> Requirement:
>
> The HTML+Speech standard must allow specification of the speech resource
> (e.g. speech recognizer) to be used for processing of the audio
> collected from the user. For example, this could be specified
> as a URI-valued attribute on the element supporting speech recognition.
> When audio is captured from the user it will then be streamed over HTTP
> to the specified URI.
>
> best
> Michael
>
>
>
>>
>> =======================================
>> REQUIREMENTS, USE CASES, and PROPOSALS
>> =======================================
>> I think the best way to begin is to ask right up front for the items we are interested in:  requirements, use cases, and proposals for changes to HTML.
>>
>> If you have requirements, use cases, or proposals for changes to HTML, please send them in to this list.  When the trickle slows we'll look at what we have and decide on next steps.  For expediency, please plan to send in any such materials by Monday, September 13.

Received on Thursday, 9 September 2010 12:13:36 UTC