- From: Hans Wennborg <hwennborg@google.com>
- Date: Fri, 24 Aug 2012 17:11:07 +0100
- To: "Young, Milan" <Milan.Young@nuance.com>
- Cc: Glen Shires <gshires@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
Looks good to me too.

On Fri, Aug 24, 2012 at 4:43 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> I’ll go with that.
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Friday, August 24, 2012 7:58 AM
> To: Young, Milan
> Cc: public-speech-api@w3.org
> Subject: Re: SpeechRecognitionEvent resultIndex / resultHistory
>
> Here is the new wording I propose for "results" (formerly named "resultHistory"). The only change from my last proposed wording is the addition of the last sentence.
>
> "The array of all current recognition results for this session. Specifically, all final results that have been returned, followed by the current best hypothesis for all interim results. It consists of zero or more final results followed by zero or more interim results. On subsequent SpeechRecognitionResultEvent events, interim results may be overwritten by a newer interim result or by a final result, or may be removed (when at the end of the "results" array and the array length decreases). Final results cannot be overwritten or removed. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised. All array entries (if any) for indexes equal to or greater than resultIndex that were present in the array when the last SpeechRecognitionResultEvent was raised are removed and overwritten with new results. The length of the "results" array may increase or decrease, but cannot be less than resultIndex. Note that when resultIndex == results.length, no new results are returned; this may occur when the array length decreases to remove one or more interim results."
>
> /Glen Shires
>
> On Thu, Aug 23, 2012 at 9:55 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>
> Thanks for the clarification, this looks good. But I’m still a bit wary about the case where resultIndex == length.
> At a minimum, we should add language warning that results[resultIndex] will not always return a valid element. But even then, there’s a good chance that developers will miss that subtlety and simply rely on testing to make sure their app works. The problem with that approach is that 99.9% of the time their bad assumption will hold true, and they will probably miss the error only to find it later in production.
>
> What do you think about changing the wording of resultIndex to accommodate the exception of deleting the interim tail? Could we perhaps add a new marker to signal a finalized state of the entire array? I’m not married to either one of these ideas, just brainstorming.
>
> Thanks
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Thursday, August 23, 2012 5:42 PM
> To: Young, Milan
> Cc: public-speech-api@w3.org
> Subject: Re: SpeechRecognitionEvent resultIndex / resultHistory
>
> Milan,
>
> Good points! I changed the word "replaced" to "overwritten" and made a few other changes. Note that the case in which resultIndex equals the length of the array is useful when the last interim entry needs to be removed. For example, suppose resultHistory represents this state:
>
> final: "To be" final: " or not to be" interim: " the"
>
> And then the recognizer determines that the interim result was not a continuation, but just superfluous noise, so it updates the state to:
>
> final: "To be" final: " or not to be"
>
> To delete this last interim, it would send a SpeechRecognitionResultEvent with resultIndex = 2 and resultHistory.length = 2. While this case may not ever occur with some recognizers, it's useful to support this case for any recognizers that require it.
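A minimal sketch of the deletion case above, for illustration only and not part of the proposal: it assumes results are plain `{ transcript, final }` objects rather than real SpeechRecognitionResult instances, and `applyResultEvent` is a hypothetical helper. A page that keeps its own copy of the results truncates that copy to resultIndex before copying the new tail, because the wording says entries at or above resultIndex are removed and overwritten. When resultIndex equals results.length, the copy loop never runs but the stale interim is still dropped, whereas reading `results[resultIndex]` directly, as Milan warns, yields `undefined`.

```javascript
// Sketch only: results are plain { transcript, final } objects standing in
// for SpeechRecognitionResult; applyResultEvent is a hypothetical helper.

// Update the page's own copy of the results from one event. Entries below
// resultIndex are unchanged by definition and are kept; every entry at or
// above resultIndex is discarded and then re-copied from the event.
// Assumes `view` has been kept in sync with every earlier event.
function applyResultEvent(view, resultIndex, results) {
  if (resultIndex > results.length) {
    throw new RangeError("resultIndex must not exceed results.length");
  }
  view.length = resultIndex; // drops stale entries, including a deleted interim
  for (let i = resultIndex; i < results.length; ++i) {
    view.push(results[i]); // never runs when resultIndex === results.length
  }
  return view;
}

// State before the deletion: two finals and a trailing interim " the".
const before = [
  { transcript: "To be", final: true },
  { transcript: " or not to be", final: true },
  { transcript: " the", final: false },
];
// The recognizer drops the interim: resultIndex = 2, results.length = 2.
const after = before.slice(0, 2);

const view = applyResultEvent([], 0, before); // first event copies all three
applyResultEvent(view, 2, after); // deletion event: nothing new to copy

const transcript = view.map((r) => r.transcript).join(""); // "To be or not to be"
const unsafe = after[2]; // results[resultIndex] is undefined in this case
```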
> Note also that the simple JavaScript loop to process results, that I suggested earlier, does not change, as it processes this case correctly as well:
>
>     for (i = resultIndex; i < resultHistory.length; ++i) {
>       // process resultHistory[i];
>     }
>
> Here is the slightly updated wording I propose for resultHistory:
>
> "The array of all current recognition results for this session. Specifically, all final results that have been returned, followed by the current best hypothesis for all interim results. It consists of zero or more final results followed by zero or more interim results. On subsequent SpeechRecognitionResultEvent events, interim results may be overwritten by a newer interim result or by a final result, or may be removed (when at the end of the resultHistory array and the array length decreases). Final results cannot be overwritten or removed. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised. All array entries (if any) for indexes equal to or greater than resultIndex that were present in the array when the last SpeechRecognitionResultEvent was raised are removed and overwritten with new results. The length of the resultHistory array may increase or decrease, but cannot be less than resultIndex."
>
> /Glen Shires
>
> On Thu, Aug 23, 2012 at 4:50 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>
> This is a step in the right direction, but I still think the wording for resultHistory needs work. A couple of concrete objections:
>
> * The opening sentence is misleading because resultHistory doesn’t capture all results in this session, but rather the current best hypothesis of results over the session.
>
> * You have “the length of the array cannot be less than the resultIndex”, but doesn’t it always have to be greater?
>
> My last objection is fuzzy: I just found that paragraph hard to read.
> I think the confusion centered on the use of the word “replaced”. I found it odd because the event is delivering a “free standing” array, not a diff. I understand that the underlying implementation may take a different view, but we are describing an API here, not a cookbook for implementers. I’d be happy to suggest an alternative, but being that you and Hans are editors, I figured I’d give you first shot.
>
> Thanks
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Thursday, August 23, 2012 2:06 AM
> To: Young, Milan
> Cc: public-speech-api@w3.org
> Subject: Re: SpeechRecognitionEvent resultIndex / resultHistory
>
> Milan,
>
> Yes, I agree the wording needs to be clarified. I also agree that "the case of correcting a previous interim while deleting the tail of the result list" is a reasonably common operation, and that case can be implemented with the following definitions.
>
> I propose the following wording for resultHistory:
>
> "The array of all of the recognition results that have so far been returned as part of this session. It consists of zero or more final results followed by zero or more interim results. On subsequent SpeechRecognitionResultEvent events, interim results may be replaced by a newer interim result or by a final result. Final results cannot be replaced. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised. All array entries for indexes equal to or greater than resultIndex replace any prior entries that were present in the array (if any) when the last SpeechRecognitionResultEvent was raised. The length of the resultHistory array may increase or decrease, but cannot be less than resultIndex."
>
> I propose the following wording for resultIndex:
>
> "The resultIndex must be set to the lowest index in the resultHistory array that has changed.
> When continuous was false, the resultIndex must always be 0."
>
> I propose to eliminate the resultdeleted event because it results in inconsistent states, and because the above definition of resultHistory / resultIndex makes the resultdeleted event superfluous.
>
> I propose to eliminate SpeechRecognitionResultEvent.result because SpeechRecognitionResultEvent may (and often does) return multiple results. The JavaScript author can easily process all new results with code such as:
>
>     for (i = resultIndex; i < resultHistory.length; ++i) {
>       // process resultHistory[i];
>     }
>
> /Glen Shires
>
> On Tue, Aug 21, 2012 at 10:31 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>
> I agree with the spirit of the change, but I’m unsure about the wording.
>
> The result deleted event says “The resultIndex of this event will be the element that was deleted” and your text says “The resultIndex must be set to the lowest index in the resultHistory array that has changed.” This combination would seem to preclude the case of correcting a previous interim while deleting the tail of the result list, which I would guess is a reasonably common operation.
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Tuesday, August 21, 2012 8:09 AM
> To: public-speech-api@w3.org
> Subject: SpeechRecognitionEvent resultIndex / resultHistory
>
> As speech is processed, typically a portion of (but not all of) the interim results become final. As portions become final, the interim hypotheses typically also change. For example, the following sequence might occur. (Each line below represents one point in time.)
>
> interim: "Tube"
>
> interim: "To be born"
>
> interim: "To be or not to be"
>
> final: "To be" interim: " or not to be there"
>
> final: "To be" final: " or not to be" interim: " that is"
>
> final: "To be" final: " or not to be" interim: " that is" interim: " the question"
>
> final: "To be" final: " or not to be" final: " that is the" interim: " question what"
>
> final: "To be" final: " or not to be" final: " that is the" final: " question." interim: " Weather today"
>
> final: "To be" final: " or not to be" final: " that is the" final: " question." interim: " Whether tis nobler"
>
> final: "To be" final: " or not to be" final: " that is the" final: " question." final: " Whether" interim: " tis nobler"
>
> final: "To be" final: " or not to be" final: " that is the" final: " question." final: " Whether" final: " tis nobler"
>
> Our current spec doesn't support such simultaneous changes to both interim and final results. Instead, each SpeechRecognitionEvent returns only a single "final" or a single "interim" result. I propose a simple change to enable SpeechRecognitionEvent to return multiple "final" and "interim" results. I believe this has the following advantages:
>
> - Provides more accurate results (it avoids inconsistent states in which the "final" has been returned but the "interim" has not yet been updated).
>
> - Provides more efficient processing (it reduces the number of events that JavaScript needs to respond to and, more importantly, it avoids the UI rendering of those inconsistent states).
>
> - Simplifies the JavaScript coding (by not having to detect or compensate for inconsistent states).
>
> Therefore, I propose a slight re-definition of resultIndex:
>
> "The resultIndex must be set to the lowest index in the resultHistory array that has changed. Entries at greater indexes in the resultHistory array (if any) may also have changed."
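To make the proposed rule concrete ("the lowest index in the resultHistory array that has changed"), the sketch below replays the "To be or not to be" sequence and derives resultIndex for each transition. It is an illustration, not part of the proposal: results are modeled as plain `{ transcript, final }` objects, `lowestChangedIndex` is a hypothetical helper, and counting an entry that exists in only one of the two arrays as changed is an assumption, chosen because it reproduces the deletion example discussed above (resultIndex = 2 with a new length of 2).

```javascript
// Sketch only: results are plain { transcript, final } objects;
// lowestChangedIndex is a hypothetical helper, not part of the proposed API.

const F = (transcript) => ({ transcript, final: true });  // final result
const I = (transcript) => ({ transcript, final: false }); // interim result

function sameResult(a, b) {
  return a !== undefined && b !== undefined &&
    a.transcript === b.transcript && a.final === b.final;
}

// Lowest index whose entry differs between prev and next. An entry present
// in only one of the arrays also counts as changed. Returns null when the
// arrays are identical (presumably no event would be sent).
function lowestChangedIndex(prev, next) {
  const n = Math.max(prev.length, next.length);
  for (let i = 0; i < n; ++i) {
    if (!sameResult(prev[i], next[i])) return i;
  }
  return null;
}

// The sequence from the message, one array per point in time.
const head = [F("To be"), F(" or not to be"), F(" that is the"), F(" question.")];
const states = [
  [I("Tube")],
  [I("To be born")],
  [I("To be or not to be")],
  [F("To be"), I(" or not to be there")],
  [F("To be"), F(" or not to be"), I(" that is")],
  [F("To be"), F(" or not to be"), I(" that is"), I(" the question")],
  [F("To be"), F(" or not to be"), F(" that is the"), I(" question what")],
  [...head, I(" Weather today")],
  [...head, I(" Whether tis nobler")],
  [...head, F(" Whether"), I(" tis nobler")],
  [...head, F(" Whether"), F(" tis nobler")],
];

// resultIndex for each transition between consecutive states.
const resultIndexes = states
  .slice(1)
  .map((next, k) => lowestChangedIndex(states[k], next));
```

This yields `[0, 0, 0, 1, 3, 2, 3, 4, 4, 5]`; the third and fourth values (0 and 1) match the resultIndex values given for the fourth and fifth lines in the illustration further down in the message. The last transition shows that a result whose text is unchanged but which becomes final still counts as changed.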
>
> followed by the rest of the existing definition of resultIndex:
>
> "The resultIndex may refer to a previously occupied array index from a previous SpeechRecognitionResultEvent. When this is the case, this new result overwrites the earlier result and is a more accurate result; however, when this is the case, the previous value must not have been a final result. When continuous was false, the resultIndex must always be 0."
>
> And a slight re-definition of resultHistory:
>
> "The array of all of the recognition results that have been returned as part of this session. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised."
>
> To illustrate, the fourth line in our example above would return the SpeechRecognitionResultEvent with
>
>     resultIndex = 0
>     resultHistory[0] = "To be", final = true
>     resultHistory[1] = " or not to be there", final = false
>
> and the fifth line in our example above would return the SpeechRecognitionResultEvent with
>
>     resultIndex = 1
>     resultHistory[0] = "To be", final = true
>     resultHistory[1] = " or not to be", final = true
>     resultHistory[2] = " that is", final = false
>
> /Glen Shires
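One way to read the final "results" wording agreed at the top of this thread is as a set of checks that a page or a conformance test could run on each event. The sketch below is that reading, not normative text: `checkResultEvent` is a hypothetical helper, results are plain `{ transcript, final }` objects, and treating an identical final result at or above resultIndex as "not overwritten" is an assumption the wording leaves open. Applied to Glen's illustration, the events for the fourth and fifth lines pass, while an event claiming resultIndex = 2 for the fifth line is rejected because entry 1 changed.

```javascript
// Sketch only: one reading of the agreed "results" wording as runtime
// checks. checkResultEvent is a hypothetical helper, not part of the API.

function equalResults(a, b) {
  return a.transcript === b.transcript && a.final === b.final;
}

// Returns a list of violated rules (empty when the event is consistent).
function checkResultEvent(prev, resultIndex, results) {
  const problems = [];
  // Zero or more final results followed by zero or more interim results.
  const firstInterim = results.findIndex((r) => !r.final);
  if (firstInterim !== -1 && results.slice(firstInterim).some((r) => r.final)) {
    problems.push("final result after an interim result");
  }
  // The length may shrink, but never below resultIndex.
  if (resultIndex < 0 || resultIndex > results.length) {
    problems.push("resultIndex out of range");
  }
  // Entries below resultIndex must be identical to the previous array.
  for (let i = 0; i < Math.min(resultIndex, results.length); ++i) {
    if (i >= prev.length || !equalResults(prev[i], results[i])) {
      problems.push(`entry ${i} changed below resultIndex`);
    }
  }
  // Final results can never be overwritten or removed.
  prev.forEach((r, i) => {
    if (r.final && (i >= results.length || !equalResults(r, results[i]))) {
      problems.push(`final result ${i} overwritten or removed`);
    }
  });
  return problems;
}

// The third, fourth and fifth lines of the example sequence.
const line3 = [{ transcript: "To be or not to be", final: false }];
const line4 = [
  { transcript: "To be", final: true },
  { transcript: " or not to be there", final: false },
];
const line5 = [
  { transcript: "To be", final: true },
  { transcript: " or not to be", final: true },
  { transcript: " that is", final: false },
];

const okFourth = checkResultEvent(line3, 0, line4); // []
const okFifth = checkResultEvent(line4, 1, line5); // []
// Inconsistent event: claims entries below index 2 are unchanged, but entry 1 changed.
const bad = checkResultEvent(line4, 2, line5);
```

The same checks accept the deletion case (dropping a trailing interim with resultIndex equal to the new length) and reject any event that shortens the array past a final result.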
Received on Friday, 24 August 2012 16:12:00 UTC