- From: Glen Shires <gshires@google.com>
- Date: Fri, 24 Aug 2012 10:03:45 -0700
- To: Hans Wennborg <hwennborg@google.com>
- Cc: "Young, Milan" <Milan.Young@nuance.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAEE5bciYtAmsmGwD7r7-Ao0UZVnpfhzsXkVSRg2zfbMzXSj_rw@mail.gmail.com>
Wonderful. For completeness, here are all the changes in one place (including the renaming of "resultHistory" to "results"). I'll update the spec with these next week unless there are disagreements...

    interface SpeechRecognitionEvent : Event {
        readonly attribute short resultIndex;
        readonly attribute SpeechRecognitionResultList results;
    };

5.1.8 Speech Recognition Event

The Speech Recognition Event is the event that is raised each time there are any changes to interim or final results.

resultIndex
    The resultIndex must be set to the lowest index in the "results" array that has changed. When continuous was false, the resultIndex must always be 0.

results
    The array of all current recognition results for this session. Specifically all final results that have been returned, followed by the current best hypothesis for all interim results. It consists of zero or more final results followed by zero or more interim results. On subsequent SpeechRecognitionResultEvent events, interim results may be overwritten by a newer interim result or by a final result or may be removed (when at the end of the "results" array and the array length decreases). Final results cannot be overwritten or removed. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised. All array entries (if any) for indexes equal or greater than resultIndex that were present in the array when the last SpeechRecognitionResultEvent was raised are removed and overwritten with new results. The length of the "results" array may increase or decrease, but cannot be less than resultIndex. Note that when resultIndex == results.length, no new results are returned; this may occur when the array length decreases to remove one or more interim results.

Eliminate the "result" element (because SpeechRecognitionResultEvent may, and often does, return multiple results.)
Eliminate the resultdeleted event (because it results in inconsistent states, and because the above definition of results / resultIndex makes the resultdeleted event superfluous.)

/Glen Shires

On Fri, Aug 24, 2012 at 9:11 AM, Hans Wennborg <hwennborg@google.com> wrote:
> Looks good to me too.
>
> On Fri, Aug 24, 2012 at 4:43 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> > I’ll go with that.
> >
> > From: Glen Shires [mailto:gshires@google.com]
> > Sent: Friday, August 24, 2012 7:58 AM
> > To: Young, Milan
> > Cc: public-speech-api@w3.org
> > Subject: Re: SpeechRecognitionEvent resultIndex / resultHistory
> >
> > Here is the new wording I propose for "results" (formerly named "resultHistory"). The only change from my last proposed wording is the addition of the last sentence.
> >
> > "The array of all current recognition results for this session. Specifically all final results that have been returned, followed by the current best hypothesis for all interim results. It consists of zero or more final results followed by zero or more interim results. On subsequent SpeechRecognitionResultEvent events, interim results may be overwritten by a newer interim result or by a final result or may be removed (when at the end of the "results" array and the array length decreases). Final results cannot be overwritten or removed. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised. All array entries (if any) for indexes equal or greater than resultIndex that were present in the array when the last SpeechRecognitionResultEvent was raised are removed and overwritten with new results. The length of the "results" array may increase or decrease, but cannot be less than resultIndex.
> > Note that when resultIndex == results.length, no new results are returned; this may occur when the array length decreases to remove one or more interim results."
> >
> > /Glen Shires
> >
> > On Thu, Aug 23, 2012 at 9:55 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> >
> > Thanks for the clarification, this looks good. But I’m still a bit wary about the case where resultIndex == length. At a minimum, we should add language warning that results[resultIndex] will not always return a valid element. But even then, there’s a good chance that developers will miss that subtlety and simply rely on testing to make sure their app works. The problem with that approach is that 99.9% of the time their bad assumption will hold true and they will probably miss the error only to find it later in production.
> >
> > What do you think about changing the wording of resultIndex to accommodate the exception of deleting the interim tail? Could we perhaps add a new marker to signal a finalized state of the entire array? I’m not married to either one of these ideas, just brainstorming.
> >
> > Thanks
> >
> > From: Glen Shires [mailto:gshires@google.com]
> > Sent: Thursday, August 23, 2012 5:42 PM
> > To: Young, Milan
> > Cc: public-speech-api@w3.org
> > Subject: Re: SpeechRecognitionEvent resultIndex / resultHistory
> >
> > Milan,
> >
> > Good points! I changed the word "replaced" to "overwritten" and made a few other changes. Note that the case in which resultIndex equals the length of the array is useful when the last interim entry needs to be removed.
> > For example, suppose resultHistory represents this state:
> >
> >   final: "To be" final: " or not to be" interim: " the"
> >
> > And then the recognizer determines that the interim result was not a continuation, but just superfluous noise, so it updates the state to:
> >
> >   final: "To be" final: " or not to be"
> >
> > To delete this last interim, it would send a SpeechRecognitionResultEvent with resultIndex = 2 and resultHistory.length = 2. While this case may not ever occur with some recognizers, it's useful to support this case for any recognizers that require it. Note also that the simple JavaScript loop to process results, that I suggested earlier, does not change, as it processes this case correctly as well:
> >
> >   for (i = resultIndex; i < resultHistory.length; ++i) {
> >     // process resultHistory[i];
> >   }
> >
> > Here is the slightly updated wording I propose for resultHistory:
> >
> > "The array of all current recognition results for this session. Specifically all final results that have been returned, followed by the current best hypothesis for all interim results. It consists of zero or more final results followed by zero or more interim results. On subsequent SpeechRecognitionResultEvent events, interim results may be overwritten by a newer interim result or by a final result or may be removed (when at the end of the resultHistory array and the array length decreases). Final results cannot be overwritten or removed. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised. All array entries (if any) for indexes equal or greater than resultIndex that were present in the array when the last SpeechRecognitionResultEvent was raised are removed and overwritten with new results.
> > The length of the resultHistory array may increase or decrease, but cannot be less than resultIndex."
> >
> > /Glen Shires
> >
> > On Thu, Aug 23, 2012 at 4:50 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> >
> > This is a step in the right direction, but I still think the wording for resultHistory needs work. A couple of concrete objections:
> >
> > * The opening sentence is misleading because resultHistory doesn’t capture all results in this session, but rather the current best hypothesis of results over the session.
> >
> > * You have “the length of the array cannot be less than the resultIndex”, but doesn’t it always have to be greater?
> >
> > My last objection is fuzzy: I just found that paragraph hard to read. I think the confusion centered on the use of the word “replaced”. I found it odd because the event is delivering a “free standing” array, not a diff. I understand that the underlying implementation may take a different view, but we are describing an API here, not a cookbook for implementers. I’d be happy to suggest an alternative, but being that you and Hans are editors I figured I’d give you first shot.
> >
> > Thanks
> >
> > From: Glen Shires [mailto:gshires@google.com]
> > Sent: Thursday, August 23, 2012 2:06 AM
> > To: Young, Milan
> > Cc: public-speech-api@w3.org
> > Subject: Re: SpeechRecognitionEvent resultIndex / resultHistory
> >
> > Milan,
> >
> > Yes, I agree the wording needs to be clarified. I also agree that "the case of correcting a previous interim while deleting the tail of the result list" is a reasonably common operation, and that case can be implemented with the following definitions.
> >
> > I propose the following wording for resultHistory:
> >
> > "The array of all of the recognition results that have so far been returned as part of this session.
> > It consists of zero or more final results followed by zero or more interim results. On subsequent SpeechRecognitionResultEvent events, interim results may be replaced by a newer interim result or by a final result. Final results cannot be replaced. All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised. All array entries for indexes equal or greater than resultIndex replace any prior entries that were present in the array (if any) when the last SpeechRecognitionResultEvent was raised. The length of the resultHistory array may increase or decrease, but cannot be less than resultIndex."
> >
> > I propose the following wording for resultIndex:
> >
> > "The resultIndex must be set to the lowest index in the resultHistory array that has changed. When continuous was false, the resultIndex must always be 0."
> >
> > I propose to eliminate the resultdeleted event because it results in inconsistent states, and because the above definition of resultHistory / resultIndex makes the resultdeleted event superfluous.
> >
> > I propose to eliminate SpeechRecognitionResultEvent.result because SpeechRecognitionResultEvent may (and often does) return multiple results. The JavaScript author can easily process all new results with code such as:
> >
> >   for (i = resultIndex; i < resultHistory.length; ++i) {
> >     // process resultHistory[i];
> >   }
> >
> > /Glen Shires
> >
> > On Tue, Aug 21, 2012 at 10:31 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> >
> > I agree with the spirit of the change, but I’m unsure about the wording.
> >
> > The result deleted event says “The resultIndex of this event will be the element that was deleted” and your text says “The resultIndex must be set to the lowest index in the resultHistory array that has changed.” This combination would seem to preclude the case of correcting a previous interim while deleting the tail of the result list, which I would guess is a reasonably common operation.
> >
> > From: Glen Shires [mailto:gshires@google.com]
> > Sent: Tuesday, August 21, 2012 8:09 AM
> > To: public-speech-api@w3.org
> > Subject: SpeechRecognitionEvent resultIndex / resultHistory
> >
> > As speech is processed, typically a portion of (but not all of) the interim results become final. As portions become final, the interim hypotheses typically also change. For example, the following sequence might occur. (Each line below represents one point in time.)
> >
> >   interim: "Tube"
> >   interim: "To be born"
> >   interim: "To be or not to be"
> >   final: "To be" interim: " or not to be there"
> >   final: "To be" final: " or not to be" interim: " that is"
> >   final: "To be" final: " or not to be" interim: " that is" interim: " the question"
> >   final: "To be" final: " or not to be" final: " that is the" interim: " question what"
> >   final: "To be" final: " or not to be" final: " that is the" final: " question." interim: " Weather today"
> >   final: "To be" final: " or not to be" final: " that is the" final: " question." interim: " Whether tis nobler"
> >   final: "To be" final: " or not to be" final: " that is the" final: " question." final: " Whether" interim: " tis nobler"
> >   final: "To be" final: " or not to be" final: " that is the" final: " question." final: " Whether" final: " tis nobler"
> >
> > Our current spec doesn't support such simultaneous changes to both interim and final results. Instead, each SpeechRecognitionEvent returns only a single "final" or a single "interim" result. I propose a simple change to enable SpeechRecognitionEvent to return multiple "final" and "interim" events. I believe this has the following advantages:
> >
> > - Provides more accurate results (it avoids inconsistent states in which the "final" has been returned but the "interim" has not yet been updated).
> >
> > - Provides more efficient processing (it reduces the number of events that JavaScript needs to respond to and, more importantly, it avoids the UI rendering of those inconsistent states).
> >
> > - It simplifies the JavaScript coding (by not having to detect or compensate for inconsistent states).
> >
> > Therefore, I propose a slight re-definition of resultIndex:
> >
> > "The resultIndex must be set to the lowest index in the resultHistory array that has changed. Entries at greater indexes in the resultHistory array (if any) may also have changed."
> >
> > followed by the rest of the existing definition of resultIndex:
> >
> > "The resultIndex may refer to a previous occupied array index from a previous SpeechRecognitionResultEvent. When this is the case this new result overwrites the earlier result and is a more accurate result; however, when this is the case the previous value must not have been a final result. When continuous was false, the resultIndex must always be 0."
> >
> > And a slight re-definition of resultHistory:
> >
> > "The array of all of the recognition results that have been returned as part of this session.
> > All entries for indexes less than resultIndex must be identical to the array that was present when the last SpeechRecognitionResultEvent was raised."
> >
> > To illustrate, the fourth line in our example above would return the SpeechRecognitionResultEvent with
> >
> >   resultIndex = 0
> >   resultHistory[0] = "To be", final = true, resultHistory[1] = " or not to be there", final = false
> >
> > and the fifth line in our example above would return the SpeechRecognitionResultEvent with
> >
> >   resultIndex = 1
> >   resultHistory[0] = "To be", final = true, resultHistory[1] = " or not to be", final = true, resultHistory[2] = " that is", final = false
> >
> > /Glen Shires
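
For readers skimming the thread, here is a minimal sketch of how a page might keep a local copy of the results in step with each event under the results / resultIndex definition discussed above. It is illustrative only: the events are plain objects shaped like { resultIndex, results: [{ text, final }] } to mirror the notation used in the thread, and applyResultEvent is a hypothetical helper, not part of any spec or browser API.

```javascript
// Sketch only. Assumed event shape (not the real browser object):
//   { resultIndex: number, results: [{ text: string, final: boolean }, ...] }
// applyResultEvent is a hypothetical helper name.
function applyResultEvent(local, event) {
  // Entries below resultIndex are guaranteed unchanged, so only entries
  // at or above resultIndex are copied. When resultIndex equals
  // results.length the loop body never runs (interim tail removed).
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    local[i] = { text: event.results[i].text, final: event.results[i].final };
  }
  // Match the new array length; this drops any removed interim tail.
  local.length = event.results.length;
  return local;
}

const local = [];

// State: final "To be", final " or not to be", interim " the"
applyResultEvent(local, {
  resultIndex: 0,
  results: [
    { text: "To be", final: true },
    { text: " or not to be", final: true },
    { text: " the", final: false },
  ],
});

// The recognizer decides " the" was noise: resultIndex == results.length == 2
applyResultEvent(local, {
  resultIndex: 2,
  results: [
    { text: "To be", final: true },
    { text: " or not to be", final: true },
  ],
});

const transcript = local.map((r) => r.text).join("");
```

The loop is the same one suggested in the thread; the only addition is the length assignment, which covers the case Milan raised where results[resultIndex] is not a valid element.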
Received on Friday, 24 August 2012 17:04:56 UTC