RE: Revised SpeechRecognitionResult from Young, Milan on 2012-05-22 (public-speech-api@w3.org from May 2012)

From: Young, Milan <Milan.Young@nuance.com>
Date: Tue, 22 May 2012 18:22:05 +0000
To: Hans Wennborg <hwennborg@google.com>
CC: "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A45C3B0@SOM-EXCH04.nuance.com>

Hello Hans,  

It's not uncommon for recognition engines to return a guess at what the user said/meant even for a nomatch result.  So we shouldn't rule this out in the API.  But I agree that we should allow for an empty result.

As for communicating this with a null vs. an event, I have a slight preference for an event.  Two reasons:
  * Easier for implementers.  This is a true alias.
  * We may want to allow empty interpretations or utterances, and thus a null would be ambiguous.
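
A minimal TypeScript sketch of the ambiguity in the second point. The names `Alternative` and `shortcutInterpretation` are hypothetical and exist only for illustration; they are not part of the draft.

```typescript
// Hypothetical shapes for illustration only; not taken from the draft IDL.
interface Alternative {
  transcript: string;
  confidence: number;
  interpretation: unknown; // could legitimately be null if empty interpretations are allowed
}

// A null-returning shortcut: forwards to the first alternative,
// or yields null when there is no alternative at all.
function shortcutInterpretation(alternatives: Alternative[]): unknown {
  const first = alternatives[0];
  return first === undefined ? null : first.interpretation;
}

// Case 1: a nomatch result with no alternatives.
const noAlternatives = shortcutInterpretation([]);

// Case 2: one alternative whose interpretation is empty (null).
const emptyInterpretation = shortcutInterpretation([
  { transcript: "um", confidence: 0.1, interpretation: null },
]);

// Both calls return null, so a caller cannot tell the two cases apart.
const indistinguishable = noAlternatives === emptyInterpretation;
```

Under these assumptions, a caller reading only the shortcut would still need to check the number of alternatives to know which case occurred.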

But frankly I haven't thought this through, so if someone has a dissenting opinion, I'm happy to relent.

Thanks

-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com] 
Sent: Tuesday, May 22, 2012 6:01 AM
To: Young, Milan
Cc: public-speech-api@w3.org
Subject: Re: Revised SpeechRecognitionResult

On Fri, May 18, 2012 at 8:13 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> I suggest we add the following fields to the IDL for 
> SpeechRecognitionResult in section 5.1:
>
>         readonly attribute DOMString transcript;
>
>         readonly attribute float confidence;
>
>         readonly attribute any interpretation;
>
>
>
> Section 5.1.6 would also need the following additions:
>
>     transcript - Shortcut to the transcript property on the first 
> SpeechRecognitionAlternative (i.e. same value as item[0].transcript).
>
>     confidence - Shortcut to the confidence property on the first 
> SpeechRecognitionAlternative (i.e. same value as item[0].confidence).
>
>     interpretation - Shortcut to the interpretation property on the 
> first SpeechRecognitionAlternative (i.e. same value as item[0].interpretation).

This sounds pretty reasonable.
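
As a rough sketch of how the proposed shortcuts could behave, here is a TypeScript rendering. `AlternativeSketch` and `ResultSketch` are hypothetical names, not the draft's interfaces, and the non-empty tuple type is an assumption that stands in for "at least one alternative".

```typescript
// Hypothetical TypeScript rendering of the proposal; names ending in "Sketch" are not in the draft.
interface AlternativeSketch {
  readonly transcript: string;
  readonly confidence: number;
  readonly interpretation: unknown;
}

class ResultSketch {
  // Assumption: the tuple type requires at least one alternative, so index 0 always exists.
  constructor(
    private readonly alternatives: readonly [AlternativeSketch, ...AlternativeSketch[]],
  ) {}

  get length(): number {
    return this.alternatives.length;
  }

  item(index: number): AlternativeSketch | undefined {
    return this.alternatives[index];
  }

  // Shortcuts: each returns the same value as the first alternative's property.
  get transcript(): string {
    return this.alternatives[0].transcript;
  }

  get confidence(): number {
    return this.alternatives[0].confidence;
  }

  get interpretation(): unknown {
    return this.alternatives[0].interpretation;
  }
}

const result = new ResultSketch([
  { transcript: "call home", confidence: 0.9, interpretation: { action: "call" } },
  { transcript: "call Rome", confidence: 0.4, interpretation: { action: "call" } },
]);
```

Here `result.transcript` and `result.item(0)?.transcript` yield the same string, which is the aliasing the proposal describes.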

> Such a scheme carries the requirement that every recognition result 
> has at least one alternative (otherwise index out of bounds).  But 
> given that we already have a way to communicate error results, I think 
> this is OK.  In other words, I can't think of a case where a 
> successful recognition would not contain at least one alternative.

The 'nomatch' and 'resultdeleted' events also use the SpeechRecognitionResult interface, and at least for 'nomatch', there won't be any alternatives. I guess one solution would be to have the 'transcript', 'confidence', and 'interpretation' fields return null (or throw?) in that case?
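
To make the two options concrete, here is a small TypeScript sketch. `Alt`, `transcriptOrNull`, and `transcriptOrThrow` are hypothetical helpers, and the choice of `RangeError` is an assumption; the draft specifies neither behavior.

```typescript
// Hypothetical helpers contrasting the two options; neither is specified in the draft.
interface Alt {
  transcript: string;
}

// Option 1: the shortcut returns null when there are no alternatives.
function transcriptOrNull(alternatives: readonly Alt[]): string | null {
  const first = alternatives[0];
  return first === undefined ? null : first.transcript;
}

// Option 2: the shortcut throws when there are no alternatives.
// RangeError is an assumed error type, chosen only for this sketch.
function transcriptOrThrow(alternatives: readonly Alt[]): string {
  const first = alternatives[0];
  if (first === undefined) {
    throw new RangeError("result has no alternatives");
  }
  return first.transcript;
}

// A nomatch-style result with no alternatives.
const nomatch: readonly Alt[] = [];
const viaNull = transcriptOrNull(nomatch);

let threw = false;
try {
  transcriptOrThrow(nomatch);
} catch (e) {
  threw = e instanceof RangeError;
}
```

With the null option the caller has to test the value before use; with the throwing option the empty case cannot be silently mistaken for a real transcript.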

Thanks,
Hans

Received on Tuesday, 22 May 2012 18:22:55 UTC