RE: Revised SpeechRecognitionResult from Young, Milan on 2012-05-23 (public-speech-api@w3.org from May 2012)

From: Young, Milan <Milan.Young@nuance.com>
Date: Wed, 23 May 2012 20:38:42 +0000
To: Satish S <satish@google.com>, Hans Wennborg <hwennborg@google.com>
CC: "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A45C8C9@SOM-EXCH04.nuance.com>

We can point to standards on both sides of the fence.  Perhaps it is a better use of time to consider our particular use case.

I'd argue that 90% of developers will not even think about the second item on the nbest list.  So why complicate their mental model, let alone the syntax, with SpeechRecognitionAlternatives?

For the 10% who do understand an nbest list and its proper use, most will be familiar with VoiceXML, which shares the same model.

From: Satish S [mailto:satish@google.com]
Sent: Wednesday, May 23, 2012 7:02 AM
To: Hans Wennborg
Cc: Young, Milan; public-speech-api@w3.org
Subject: Re: Revised SpeechRecognitionResult

I'd prefer not having such shortcuts in the API. As a parallel, see the W3C File API's FileList interface
http://www.w3.org/TR/FileAPI/#dfn-filelist

To read the size of a file you'd have to do:
    var size = document.forms['uploadData']['fileChooser'].files[0].size;
but that hasn't resulted in a shorter version like
    var size = document.forms['uploadData']['fileChooser'].size;

If developers are accessing "item[0].utterance" more than once in their code, they'd usually do:
  var item = event.result.item[0];
  .. = item.utterance
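
A slightly fuller sketch of the same caching pattern, for illustration only. The `event` object here is a hand-built stand-in that mirrors the `event.result.item[0].utterance` access used above, and the `confidence` field and the sample strings are assumptions rather than anything taken from the draft spec:

```javascript
// Hypothetical stand-in for a recognition event. The shape follows the
// event.result.item[0].utterance access pattern quoted above; the
// confidence field and the sample values are assumed for illustration.
var event = {
  result: {
    item: [
      { utterance: 'call home', confidence: 0.92 },
      { utterance: 'call Rome', confidence: 0.41 }
    ]
  }
};

// Look up the top alternative once, then reuse the local reference
// instead of repeating event.result.item[0] at every use.
var item = event.result.item[0];
var text = item.utterance;
var score = item.confidence;
```

With the local `item` variable, repeated reads cost no extra typing, which is the argument against adding a shortcut attribute on the result itself.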

Cheers
Satish

On Wed, May 23, 2012 at 12:11 PM, Hans Wennborg <hwennborg@google.com> wrote:
>
> On Tue, May 22, 2012 at 7:22 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> > Hello Hans,
> >
> > It's not uncommon for recognition engines to return a guess at what the user said/meant even for a nomatch result.  So we shouldn't rule this out in the API.
>
> Right. The spec currently says "nomatch event: [...] The result field
> in the event may contain speech recognition results that are below the
> confidence threshold or may be null."
>
> So that covers both cases.
>
> > As far as communicating this with a null vs event, I have a slight preference for an event.  Two reasons:
>
> I'm not sure what you mean by "communicating this with a null vs
> event". I was talking about returning null or throwing an exception.
> Is that what you mean?
>
> >  * Easier for implementers.  This is a true alias.
>
> I'm not sure what you mean by true alias.
>
> >  * We may want to allow empty interpretations or utterances, and thus a null would be ambiguous.
>
> Ah, yes. So throwing an exception seems like the better option.
>
> Thanks,
> Hans
>
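
The ambiguity raised in the quoted exchange (a null return cannot be told apart from a legitimately empty interpretation) can be sketched as follows. This is a minimal illustration under stated assumptions: both accessor functions, the `RangeError` choice, and the plain-object result shape are hypothetical and are not part of the draft spec.

```javascript
// Hypothetical accessor: returns null when there is no alternative at all.
function topInterpretationOrNull(result) {
  return result.item.length > 0 ? result.item[0].interpretation : null;
}

// Hypothetical accessor: throws when there is no alternative at all.
// RangeError is an arbitrary choice made for this sketch.
function topInterpretationOrThrow(result) {
  if (result.item.length === 0) {
    throw new RangeError('no recognition alternatives available');
  }
  return result.item[0].interpretation;
}

// Case 1: the recognizer returned nothing.
var noAlternatives = { item: [] };

// Case 2: the recognizer returned an alternative whose interpretation
// is legitimately empty (assumed here to be represented as null).
var emptyInterpretation = { item: [{ utterance: '', interpretation: null }] };
```

With the null-returning accessor, both cases yield `null` and the caller cannot distinguish them. With the throwing accessor, only the first case raises, so a `null` return unambiguously means "an alternative exists, and its interpretation is empty".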

Received on Wednesday, 23 May 2012 20:39:18 UTC