RE: Flatter structure for SpeechRecognitionResult (was: "Speech API: first editor's draft posted")

From: Young, Milan <Milan.Young@nuance.com>
Date: Thu, 26 Apr 2012 20:00:58 +0000
To: Satish S <satish@google.com>
CC: "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A456BF5@SOM-EXCH04.nuance.com>
There are two issues, so let's split them up.

1) Flatter structure: My original suggestion was to eliminate the 'result' object in the event, so that event.result.item.transcript becomes event.item.transcript (in your example you missed the 'item' token, BTW). But a better idea may be to alias item[0] to the top level of the result, so that event.result.item.transcript becomes event.result.transcript. I'm not sure whether IDL can represent an alias (double inheritance?), but I think it would make sense to developers. For reference, VoiceXML uses this same pattern.

2) I find it ugly that the event would contain both a success path and an error path when their uses are mutually exclusive. In the languages I'm familiar with, such notifications would be raised as different events. I don't have much DOM experience, but I've seen similar onError()/onSuccess() callbacks, and the pattern seems reasonable. Since you have more experience with DOM, though, I'm open to your suggestions. I didn't quite follow what you were saying below, so perhaps you could rephrase.
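A minimal TypeScript sketch of the aliasing idea in point 1, assuming a hypothetical `RecognitionResult` class and `Alternative` shape (illustrative names, not the draft's IDL). The top-level `transcript` and `confidence` getters simply forward to `item(0)`, so the alias needs no IDL inheritance, just two read-only accessors. Reporting an empty transcript for an empty result is an assumption of this sketch.

```typescript
// Hypothetical shapes for illustration; names mirror the thread, not the draft's IDL.
interface Alternative {
  readonly transcript: string;
  readonly confidence: number;
}

class RecognitionResult {
  private readonly alternatives: readonly Alternative[];

  constructor(alternatives: readonly Alternative[]) {
    this.alternatives = alternatives;
  }

  get length(): number {
    return this.alternatives.length;
  }

  // N-best access; an out-of-range index yields null.
  item(index: number): Alternative | null {
    return this.alternatives[index] ?? null;
  }

  // Alias: result.transcript is shorthand for result.item(0).transcript.
  // Assumption: an empty result reports "" instead of throwing.
  get transcript(): string {
    return this.item(0)?.transcript ?? "";
  }

  // Alias: result.confidence is shorthand for result.item(0).confidence.
  get confidence(): number {
    return this.item(0)?.confidence ?? 0;
  }
}

const result = new RecognitionResult([
  { transcript: "call home", confidence: 0.92 },
  { transcript: "call Rome", confidence: 0.41 },
]);
console.log(result.transcript); // prints: call home
console.log(result.item(1)?.transcript); // prints: call Rome
```

The full n-best list stays reachable through `item()`, so the alias only shortens the common case of reading the top hypothesis.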


-----Original Message-----
From: Satish S [mailto:satish@google.com] 
Sent: Tuesday, April 24, 2012 8:36 AM
To: Young, Milan
Subject: Flatter structure for SpeechRecognitionResult (was: "Speech API: first editor's draft posted")

(Splitting off to a new thread so we can follow discussions easily.
Please start new threads for proposed additions/changes.)

In the current structure to access the current result you have to do

I don't see making this flatter as improving much: event.resultHistory is a list of SpeechRecognitionResult objects, so it is clearer for event.result to be the same kind of object, as it is now.

If the idea is to move the error code out of the event, it could instead be moved to the SpeechRecognition object itself. That is an established pattern in many DOM APIs, such as XMLHttpRequest and WebSockets.
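A minimal TypeScript sketch of that XMLHttpRequest-style pattern, assuming a hypothetical `StatefulRecognizer` class and `RecognitionError` shape (neither is in the draft). The error handler receives no error payload; it reads the failure from an `error` attribute on the recognition object, much as XMLHttpRequest exposes `status` on the request object rather than on its events. Clearing the error in `start()` is an assumption of the sketch.

```typescript
// Hypothetical error shape; codes and messages are placeholders, not from the draft.
interface RecognitionError {
  readonly code: number;
  readonly message: string;
}

class StatefulRecognizer {
  private lastError: RecognitionError | null = null;

  // The handler receives only the recognizer, as event.target would be in the DOM.
  onerror: ((target: StatefulRecognizer) => void) | null = null;

  // Read-only view of the most recent failure (cf. xhr.status on XMLHttpRequest).
  get error(): RecognitionError | null {
    return this.lastError;
  }

  // Starting a new attempt clears any error left over from the previous one.
  start(): void {
    this.lastError = null;
  }

  // Records the failure on the object first, then notifies the handler.
  fail(code: number, message: string): void {
    this.lastError = { code, message };
    this.onerror?.(this);
  }
}

const reco = new StatefulRecognizer();
let seen = "";
reco.onerror = (r) => {
  seen = r.error?.message ?? "";
};
reco.start();
reco.fail(2, "network"); // code 2 is an arbitrary placeholder
```

The trade-off is that the error state outlives the notification, so code that inspects `reco.error` later still sees the last failure until the next `start()`.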


On Fri, Apr 13, 2012 at 10:05 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> Thank you for the draft, this looks like an excellent start.  A few comments/suggestions on the following:
> SpeechRecognition
>  - In addition to the three parameters you have listed, I see the following as necessary:
>        attribute long maxNBest;
>        attribute float confidenceThreshold;
>        attribute long completeTimeout;
>        attribute long incompleteTimeout;
>        attribute long maxSpeechTimeout;
>        attribute DOMString serviceURI;
> - We'll also need an interface for setting non-standard parameters.  This will be critical to avoid rat-holing into a complete list of parameters.
>        SpeechParameterList parameters;
>        void setCustomParameter(in DOMString name, in DOMString value);
>    interface SpeechParameter {
>        attribute DOMString name;
>        attribute DOMString value;
>    };
>    interface SpeechParameterList {
>        readonly attribute unsigned long length;
>        getter SpeechParameter item(in unsigned long index);
>    };
> - I prefer a flatter structure for SpeechRecognition. Part of doing that would involve splitting the error path out to its own event. I suggest the following:
>    // A full response, which could be interim or final, part of a continuous response or not
>    interface SpeechRecognitionResult : RecognitionEvent {
>        readonly attribute unsigned long length;
>        getter SpeechRecognitionAlternative item(in unsigned long index);
>        readonly attribute boolean final;
>        readonly attribute short resultIndex;
>        readonly attribute SpeechRecognitionResultList resultHistory;
>    };
>    interface SpeechRecognitionError : RecognitionEvent {
>      // As before
>    };
>  - At a minimum, we'll need the same serviceURI parameter and generic parameter interface as in SpeechRecognition.
>  - I'd also like to hear some discussion on the importance of "marking" the stream. I personally feel this is common enough that it should be part of a v1.
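A TypeScript sketch of the SpeechRecognition parameters proposed in the message above, assuming hypothetical class names and placeholder defaults (the message gives neither defaults nor units). The standard attributes are plain fields, while `setCustomParameter` stores arbitrary name/value pairs in a `SpeechParameterList` that mirrors the proposed `length` and `item()` getter. Overwriting an existing name, rather than appending a duplicate, is an assumption of this sketch.

```typescript
// Hypothetical rendering of the proposed IDL; field defaults are placeholders.
interface SpeechParameter {
  name: string;
  value: string;
}

class SpeechParameterList {
  private readonly entries: SpeechParameter[] = [];

  get length(): number {
    return this.entries.length;
  }

  // Mirrors the IDL getter: returns null for an out-of-range index.
  item(index: number): SpeechParameter | null {
    return this.entries[index] ?? null;
  }

  // Assumed semantics: setting an existing name overwrites its value.
  set(name: string, value: string): void {
    const existing = this.entries.find((p) => p.name === name);
    if (existing) {
      existing.value = value;
    } else {
      this.entries.push({ name, value });
    }
  }
}

class SpeechRecognitionSettings {
  // Standard attributes listed in the message (placeholder defaults, units unspecified).
  maxNBest = 1;
  confidenceThreshold = 0;
  completeTimeout = 0;
  incompleteTimeout = 0;
  maxSpeechTimeout = 0;
  serviceURI = "";
  readonly parameters = new SpeechParameterList();

  // Escape hatch for engine-specific settings that have no standard attribute.
  setCustomParameter(name: string, value: string): void {
    this.parameters.set(name, value);
  }
}

const settings = new SpeechRecognitionSettings();
settings.maxNBest = 5;
// "x-vendor-endpointer" is a made-up parameter name used only for illustration.
settings.setCustomParameter("x-vendor-endpointer", "aggressive");
settings.setCustomParameter("x-vendor-endpointer", "relaxed");
```

Keeping custom values as strings, as in the proposed `setCustomParameter(DOMString, DOMString)`, means the API never has to know an engine's parameter types; the engine parses them.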
> Thanks
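A TypeScript sketch of the flatter structure proposed in the message above, assuming hypothetical names (`Alternative`, `FlatResultEvent`, `RecognitionErrorEvent`, `Recognizer`) that only mirror the IDL. The event itself carries `length`, `item()`, `final`, `resultIndex` and `resultHistory`, and failures arrive through a separate `onerror` handler, so a result handler never checks an error field. Because the event is the result, entries of `resultHistory` are themselves event objects, which is the mismatch raised in Satish's reply.

```typescript
// Hypothetical shapes mirroring the proposed IDL; not taken from any published spec.
interface Alternative {
  readonly transcript: string;
  readonly confidence: number;
}

// The event itself is the result: event.item(0) instead of event.result.item(0).
class FlatResultEvent {
  readonly final: boolean;
  readonly resultIndex: number;
  readonly resultHistory: readonly FlatResultEvent[];
  private readonly alternatives: readonly Alternative[];

  constructor(
    alternatives: readonly Alternative[],
    isFinal: boolean,
    resultIndex: number,
    resultHistory: readonly FlatResultEvent[],
  ) {
    this.alternatives = alternatives;
    this.final = isFinal;
    this.resultIndex = resultIndex;
    this.resultHistory = resultHistory;
  }

  get length(): number {
    return this.alternatives.length;
  }

  // Returns null when index >= length, as the editor's draft specifies.
  item(index: number): Alternative | null {
    return this.alternatives[index] ?? null;
  }
}

// Errors travel on their own event type, so result handlers never see them.
class RecognitionErrorEvent {
  readonly code: number;
  readonly message: string;

  constructor(code: number, message: string) {
    this.code = code;
    this.message = message;
  }
}

class Recognizer {
  onresult: ((event: FlatResultEvent) => void) | null = null;
  onerror: ((event: RecognitionErrorEvent) => void) | null = null;

  // Simulated dispatch: exactly one of the two handlers runs per event.
  dispatch(event: FlatResultEvent | RecognitionErrorEvent): void {
    if (event instanceof FlatResultEvent) {
      this.onresult?.(event);
    } else {
      this.onerror?.(event);
    }
  }
}

const log: string[] = [];
const reco = new Recognizer();
reco.onresult = (e) => log.push(`${e.final ? "final" : "interim"}: ${e.item(0)?.transcript}`);
reco.onerror = (e) => log.push(`error ${e.code}: ${e.message}`);

const first = new FlatResultEvent([{ transcript: "hello", confidence: 0.8 }], true, 0, []);
reco.dispatch(first);
reco.dispatch(new FlatResultEvent([{ transcript: "hello world", confidence: 0.7 }], false, 1, [first]));
reco.dispatch(new RecognitionErrorEvent(4, "no match")); // code 4 is a placeholder
// log is now ["final: hello", "interim: hello world", "error 4: no match"]
```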
> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Thursday, April 12, 2012 7:36 AM
> To: public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: Speech API: first editor's draft posted
> In December, Google proposed [1] to public-webapps a Speech JavaScript API that supports the majority of the use-cases in the Speech Incubator Group's Final Report. This proposal provides a programmatic API that enables web-pages to synthesize speech output and to use speech recognition as an input for forms, continuous dictation and control.
> We have now posted a slightly updated proposal [2] in the Speech-API Community Group's repository; the differences include:
>  - Document is now self-contained, rather than having multiple references to the XG Final Report.
>  - Renamed SpeechReco interface to SpeechRecognition
>  - Renamed interfaces and attributes beginning SpeechInput* to SpeechRecognition*
>  - Moved EventTarget to constructor of SpeechRecognition
>  - Clarified that grammars and lang are attributes of SpeechRecognition
>  - Clarified that null is returned if index is greater than or equal to length
> We welcome discussion and feedback on this editor's draft. Please send your comments to the public-speech-api@w3.org mailing list.
> Glen Shires
> Hans Wennborg
> [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
Received on Thursday, 26 April 2012 20:01:45 UTC
