RE: SpeechRecognitionEvent resultIndex / resultHistory

Thanks for the clarification - I was confused by the name resultHistory as well.  I expected it to refer to some sort of n-best over time, but as I looked at the definition, it became clear that it was something like "current best interpretation".    Is 'resultHistory' the right name in this case? 

 

-          Jim

 

From: Young, Milan [mailto:Milan.Young@nuance.com] 
Sent: Thursday, August 23, 2012 7:50 PM
To: Glen Shires
Cc: public-speech-api@w3.org
Subject: RE: SpeechRecognitionEvent resultIndex / resultHistory

 

This is a step in the right direction, but I still think the wording for resultHistory needs work.  A couple of concrete objections:

  *  The opening sentence is misleading because resultHistory doesn't capture all results in this session, but rather the current best hypothesis of results over the session.

  * You have "the length of the array cannot be less than the resultIndex", but doesn't it always have to be greater?

 

My last objection is fuzzy: I just found that paragraph hard to read.  I think the confusion centered on the use of the word "replaced".  I found it odd because the event is delivering a "free standing" array, not a diff.  I understand that the underlying implementation may take a different view, but we are describing an API here, not a cookbook for implementers.  I'd be happy to suggest an alternative, but since you and Hans are the editors I figured I'd give you first shot.

 

Thanks

 

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Thursday, August 23, 2012 2:06 AM
To: Young, Milan
Cc: public-speech-api@w3.org
Subject: Re: SpeechRecognitionEvent resultIndex / resultHistory

 

Milan,

Yes, I agree the wording needs to be clarified.  I also agree that "the case of correcting a previous interim while deleting the tail of the result list" is a reasonably common operation, and that case can be implemented with the following definitions.

 

I propose the following wording for resultHistory:

 

    "The array of all of the recognition results that have so far been returned as part of this session. It consists of zero or more final results followed by zero or more interim results. On subsequent SpeechRecognitionResultEvent events, interim results may be replaced by a newer interim result or by a final result. Final results cannot be replaced. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised.  All array entries for indexes equal to or greater than resultIndex replace any prior entries that were present in the array (if any) when the last SpeechRecognitionResultEvent was raised.  The length of the resultHistory array may increase or decrease, but cannot be less than resultIndex."

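As a rough illustration of the constraints in the wording above, the following sketch checks one update against them. The entry shape ({ transcript, final }) and the names sameEntry and checkUpdate are made up for this example and are not part of the spec. Treating the removal of a final result as a violation is also an assumption about how "cannot be replaced" should be read.

```javascript
// Hypothetical entry shape for illustration: { transcript: string, final: boolean }.
// The real result objects in the spec are richer than this.

function sameEntry(a, b) {
  return a.transcript === b.transcript && a.final === b.final;
}

// Returns a list of human-readable violations. An empty list means the
// update is consistent with the proposed wording (as read here).
function checkUpdate(previous, current, resultIndex) {
  const problems = [];

  // The length may grow or shrink, but cannot be less than resultIndex.
  if (current.length < resultIndex) {
    problems.push("length " + current.length + " is less than resultIndex " + resultIndex);
  }

  // Zero or more final results followed by zero or more interim results.
  let seenInterim = false;
  current.forEach(function (entry, i) {
    if (!entry.final) {
      seenInterim = true;
    } else if (seenInterim) {
      problems.push("final result at index " + i + " follows an interim result");
    }
  });

  // Entries below resultIndex must be identical to the previous event's.
  for (let i = 0; i < resultIndex && i < current.length; ++i) {
    if (i >= previous.length || !sameEntry(previous[i], current[i])) {
      problems.push("entry " + i + " changed but is below resultIndex");
    }
  }

  // Final results cannot be replaced (assumption: nor removed).
  previous.forEach(function (entry, i) {
    if (entry.final && (i >= current.length || !sameEntry(entry, current[i]))) {
      problems.push("final result at index " + i + " was replaced or removed");
    }
  });

  return problems;
}
```

Calling checkUpdate with the previous array, the new array, and the event's resultIndex returns an empty list when none of the four checks fails.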
 

I propose the following wording for resultIndex:

 

    "The resultIndex must be set to the lowest index in the resultHistory array that has changed. When continuous was false, the resultIndex must always be 0."

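One way to read "the lowest index in the resultHistory array that has changed" is sketched below from the recognizer's side. The function name lowestChangedIndex and the { transcript, final } entry shape are assumptions made for this example only. When one array is a strict prefix of the other, the sketch returns the length of the shorter array, since that is the first position where the two differ.

```javascript
// Hypothetical helper: compares the array sent with the previous event to
// the array about to be sent, and returns the value resultIndex would take
// under the proposed wording.
function lowestChangedIndex(previous, current, continuous) {
  // "When continuous was false, the resultIndex must always be 0."
  if (!continuous) {
    return 0;
  }

  const shared = Math.min(previous.length, current.length);
  for (let i = 0; i < shared; ++i) {
    const changed =
      previous[i].transcript !== current[i].transcript ||
      previous[i].final !== current[i].final;
    if (changed) {
      return i;
    }
  }

  // No difference in the shared prefix: either entries were appended or
  // the tail was deleted, so the first changed position is the shared length.
  // (If the arrays are identical, nothing changed and no event is expected.)
  return shared;
}
```

Under this reading, correcting an interim result while deleting the tail yields the index of the corrected interim, and deleting only the tail yields an index equal to the new array length.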
 

I propose to eliminate the resultdeleted event because it results in inconsistent states, and because the above definition of resultHistory / resultIndex makes the resultdeleted event superfluous.

 

I propose to eliminate SpeechRecognitionResultEvent.result because SpeechRecognitionResultEvent may (and often does) return multiple results. The JavaScript author can easily process all new results with code such as:

 

for (var i = resultIndex; i < resultHistory.length; ++i) {
  // process resultHistory[i], a new or updated result
}
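To make the loop above more concrete, here is a minimal sketch of an application keeping its own copy of the results in step with each event. The name applyEvent, the plain event object with resultIndex and resultHistory fields, and the { transcript, final } entry shape are assumptions for illustration, not the spec's interfaces.

```javascript
// Hypothetical helper: updates the application's local copy of the results
// so that it matches event.resultHistory, touching only the changed part.
function applyEvent(local, event) {
  // Everything from resultIndex onward is superseded by this event.
  local.length = Math.min(local.length, event.resultIndex);

  // Same loop as above: take over each new or updated result.
  for (let i = event.resultIndex; i < event.resultHistory.length; ++i) {
    local.push(event.resultHistory[i]);
  }
  return local;
}
```

Because the tail is truncated before the new entries are copied, a shrinking resultHistory needs no separate resultdeleted handling in this sketch.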

 


/Glen Shires

On Tue, Aug 21, 2012 at 10:31 PM, Young, Milan <Milan.Young@nuance.com> wrote:

I agree with the spirit of the change, but I'm unsure about the wording.

 

The result deleted event says "The resultIndex of this event will be the element that was deleted" and your text says "The resultIndex must be set to the lowest index in the resultHistory array that has changed."  This combination would seem to preclude the case of correcting a previous interim while deleting the tail of the result list, which I would guess is a reasonably common operation.

 

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Tuesday, August 21, 2012 8:09 AM
To: public-speech-api@w3.org
Subject: SpeechRecognitionEvent resultIndex / resultHistory

 

As speech is processed, typically a portion of (but not all of) the interim results become final.  As portions become final, the interim hypotheses typically also change.  For example, the following sequence might occur. (Each line below represents one point in time.)

 

interim: "Tube"

 

interim: "To be born"

 

interim: "To be or not to be"

 

final: "To be"  interim: " or not to be there"

 

final: "To be"  final: " or not to be"  interim: " that is"

 

final: "To be"  final: " or not to be"  interim: " that is"  interim: " the question"

 

final: "To be"  final: " or not to be"  final: " that is the"  interim: " question what"

 

final: "To be"  final: " or not to be"  final: " that is the"  final: " question."  interim: " Weather today"

 

final: "To be"  final: " or not to be"  final: " that is the"  final: " question."  interim: " Whether tis nobler"

 

final: "To be"  final: " or not to be"  final: " that is the"  final: " question."  final: " Whether"  interim: " tis nobler"

 

final: "To be"  final: " or not to be"  final: " that is the"  final: " question."  final: " Whether"  final: " tis nobler"

 

 

Our current spec doesn't support such simultaneous changes to both interim and final results. Instead, each SpeechRecognitionEvent returns only a single "final" or a single "interim" result.  I propose a simple change to enable SpeechRecognitionEvent to return multiple "final" and "interim" results. I believe this has the following advantages:

 

- Provides more accurate results (it avoids inconsistent states in which the "final" has been returned but the "interim" has not yet been updated).

 

- Provides more efficient processing (it reduces the number of events that JavaScript needs to respond to and, more importantly, it avoids the UI rendering of those inconsistent states).

 

- Simplifies the JavaScript coding (by not having to detect or compensate for inconsistent states).

 

 

Therefore, I propose a slight re-definition of resultIndex:

 

    "The resultIndex must be set to the lowest index in the resultHistory array that has changed.  Entries at greater indexes in the resultHistory array (if any) may also have changed."

 

followed by the rest of the existing definition of resultIndex:

 

    "The resultIndex may refer to a previous occupied array index from a previous SpeechRecognitionResultEvent. When this is the case this new result overwrites the earlier result and is a more accurate result; however, when this is the case the previous value must not have been a final result. When continuous was false, the resultIndex must always be 0."

 

 

And a slight re-definition of resultHistory:

 

    "The array of all of the recognition results that have been returned as part of this session. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised."

 

 

To illustrate, the fourth line in our example above would return the SpeechRecognitionResultEvent with

  resultIndex = 0

  resultHistory[0] = "To be", final = true,   resultHistory[1] = " or not to be there", final = false 

 

and the fifth line in our example above would return the SpeechRecognitionResultEvent with

  resultIndex = 1

  resultHistory[0] = "To be", final = true,   resultHistory[1] = " or not to be", final = true,   resultHistory[2] = " that is", final = false
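For illustration, a page could turn such an array into display text by concatenating the final and the interim entries separately. This is only a sketch: the name splitTranscript and the { transcript, final } entry shape are assumptions, not part of the spec.

```javascript
// Hypothetical helper: joins the final entries and the interim entries of a
// resultHistory-like array into two strings, e.g. to style them differently.
function splitTranscript(resultHistory) {
  let finalText = "";
  let interimText = "";
  for (const entry of resultHistory) {
    if (entry.final) {
      finalText += entry.transcript;
    } else {
      interimText += entry.transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// The array from the fifth line of the example.
const fifthLine = [
  { transcript: "To be", final: true },
  { transcript: " or not to be", final: true },
  { transcript: " that is", final: false }
];
const shown = splitTranscript(fifthLine);
// shown.finalText is "To be or not to be"; shown.interimText is " that is"
```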

 

 

/Glen Shires

 

 

Received on Friday, 24 August 2012 01:36:51 UTC