Re: Concatenating transcript results from Glen Shires on 2012-09-04 (public-speech-api@w3.org from September 2012)

From: Glen Shires <gshires@google.com>
Date: Tue, 4 Sep 2012 10:59:39 -0700
To: Satish S <satish@google.com>, "Young, Milan" <Milan.Young@nuance.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <CAEE5bcgBFKOEawLTcx9ccT5-AG5VJnYKFw2BPVVT1WOxZ0en7w@mail.gmail.com>
I've updated the spec with this change:
https://dvcs.w3.org/hg/speech-api/rev/5e222a16f2fb

As always, the current draft spec is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

/Glen Shires
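
For archive readers, here is a minimal sketch of what the agreed wording implies for a page author. It is an illustration under stated assumptions, not text from the spec: the `results` array is a hypothetical stand-in for the results of a continuous session (each inner array is one result, element 0 its top alternative), and the helper names `sessionTranscript`, `containsPhrase`, and `matchesCommand` are invented for this example. The leading space on the second transcript is assumed to be supplied by the recognizer, as the new sentence requires.

```javascript
// Hypothetical stand-in data: two consecutive final results from a
// continuous recognition session. The leading space on the second
// transcript is what the spec text asks the recognizer to provide.
const results = [
  [{ transcript: "I'd like a peanut" }],
  [{ transcript: " butter sandwich." }],
];

// Build the session transcript by plain concatenation.
// The author adds no whitespace of their own.
function sessionTranscript(results) {
  let text = "";
  for (const result of results) {
    text += result[0].transcript;
  }
  return text;
}

// Keyword search across result boundaries: search the concatenated text.
function containsPhrase(results, phrase) {
  return sessionTranscript(results).includes(phrase);
}

// Command-and-control style matching: strip leading/trailing whitespace
// before comparing against a known word.
function matchesCommand(transcript, command) {
  return transcript.trim().toLowerCase() === command.toLowerCase();
}

console.log(sessionTranscript(results));
console.log(containsPhrase(results, "peanut butter"));
console.log(matchesCommand(" Stop ", "stop"));
```

The same concatenation is intended to work unchanged for languages that do not use whitespace, since the recognizer would then simply return transcripts without leading or trailing spaces.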

On Sun, Sep 2, 2012 at 2:22 PM, Satish S <satish@google.com> wrote:

> Looks good
>
> Cheers
> Satish
>
>
>
> On Sat, Sep 1, 2012 at 3:48 PM, Glen Shires <gshires@google.com> wrote:
>
>> Wonderful, it seems we're all in agreement: a JavaScript author can
>> simply concatenate SpeechRecognitionResults to create a proper transcript,
>> and does NOT need to add additional whitespace. This simple concatenation
>> works for all languages, including compound words (no edge cases) and CJK.
>> It would also continue to be the default behavior if we do choose to add
>> a flag for alternative behavior in the future.
>>
>> To make this clearer, I've slightly reworded my proposed text for the
>> spec as follows. If there's no disagreement, I'll add this to the spec on
>> Tuesday.
>>
>> "For continuous recognition, leading or trailing whitespace MUST be
>> included where necessary such that concatenation of consecutive
>> SpeechRecognitionResults produces a proper transcript of the session."
>>
>>
>> On Fri, Aug 31, 2012 at 5:39 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>>
>>> I’m uncomfortable with language-specific behavior.  That might become
>>> a mess if one wanted to write a multi-lingual page.
>>>
>>> If we’re really intent upon doing this, Glen’s suggestion of a flag that
>>> defaults to true seems like the best route.  That way English and other
>>> language users can easily disable the behavior if they choose.
>>>
>>> Also, as Jerry pointed out in a fork of this thread, there are some
>>> English edge cases where engine-driven whitespace may make sense.
>>>
>>> Thanks
>>>
>>> *From:* Satish S [mailto:satish@google.com]
>>> *Sent:* Friday, August 31, 2012 10:23 AM
>>> *To:* Glen Shires
>>> *Cc:* Young, Milan; public-speech-api@w3.org
>>> *Subject:* Re: Concatenating transcript results
>>>
>>> Glen and I talked about this later and I also looked at other speech
>>> recognition APIs where appending a white space in some form is the norm
>>> (either as flags or as a space character). With those in mind, Glen's
>>> suggestion of making the flag default to true makes sense, and in v1 we
>>> could leave out the flag. So I am OK with the original proposal of
>>> appending a white space in the transcript for languages where it is
>>> applicable; if we get developer feedback that a flag to turn it off is
>>> necessary, it can be added in a future revision of the spec proposal.
>>>
>>>
>>> Cheers
>>> Satish
>>>
>>> On Fri, Aug 31, 2012 at 11:01 AM, Satish S <satish@google.com> wrote:
>>>
>>> Looking at it from another angle - if there was automatic binding to an
>>> HTML element and the spoken text was entered into the element by the
>>> browser, then adding spaces automatically is the right thing to do. The
>>> equivalent for this is the keyboard IME on mobile phones where tapping on a
>>> word in the suggestion bar enters the word and a space with it. But the
>>> events that get dispatched to JS should not contain spaces appended or
>>> prepended.
>>>
>>> Some web apps may want to also offer correction of a word/phrase based
>>> on the list of hypotheses in the results, so when the user taps/clicks on
>>> the phrase it may offer a drop down list of suggestions. If we add spaces
>>> before or after the phrase then the UI would include those in the highlight
>>> instead of just the text, so developers may end up stripping off the space
>>> to show a better UI. This feels like working against the framework and
>>> something we should avoid.
>>>
>>> Perhaps we could look at it post v1 of the spec based on developer
>>> feedback?
>>>
>>> Cheers
>>> Satish
>>>
>>> On Fri, Aug 31, 2012 at 1:51 AM, Glen Shires <gshires@google.com> wrote:
>>>
>>> If this is an optional flag that we add in the future, I strongly
>>> believe the default should be true.  (That is, until we add this feature,
>>> proper whitespace must be inserted by the speech recognizer.)
>>>
>>> If a user is searching for consecutive keywords such as "peanut
>>> butter", there is no guarantee that they will be returned in the same
>>> final result. For example:
>>>
>>> result[0].transcript = "I'd like a peanut"
>>>
>>> result[1].transcript = "butter sandwich."
>>>
>>> While there are various algorithms that might be used to find the
>>> consecutive keywords, perhaps the easiest is to concatenate the results
>>> together with a space inserted between, and search for "peanut butter". So for
>>> this use case, it would be simpler if the speech recognizer had returned
>>> the results with proper whitespace.
>>>
>>> But frankly, I think the complexity of writing a JavaScript algorithm
>>> that knows how to insert proper whitespace, and works on a wide variety
>>> of international languages, far outweighs any minor simplification of
>>> scanning for keywords by ignoring leading/trailing whitespace. I believe
>>> there will be many applications that use dictation to generate emails,
>>> documents, product reviews, etc. So I believe we must ensure that authoring
>>> a dictation app is no more difficult than it needs to be.
>>>
>>> /Glen Shires
>>>
>>> On Thu, Aug 30, 2012 at 4:04 PM, Satish S <satish@google.com> wrote:
>>>
>>> Stripping whitespace is something that almost every app that doesn't use
>>> the API for dictation would need. To me this looks like an optional
>>> feature, something which gets turned on based on a flag such as
>>> "SpeechRecognition.autoWhiteSpace" that the developer would set if they
>>> want it, and as such it could be added in a future revision of the API if
>>> we see developers asking for it.
>>>
>>> Cheers
>>> Satish
>>>
>>> On Thu, Aug 30, 2012 at 9:48 PM, Glen Shires <gshires@google.com> wrote:
>>>
>>> Inserting whitespace is non-trivial, particularly when considering
>>> punctuation and internationalization. Some punctuation is placed before the
>>> whitespace, others after. Some languages don't use whitespace. I'd prefer
>>> to avoid placing this burden on the JavaScript author.  Speech recognition
>>> engines already contain this logic.
>>>
>>> Conversely, stripping leading and trailing whitespace is trivial, as is
>>> writing a comparison routine that ignores whitespace.
>>>
>>> On Thu, Aug 30, 2012 at 1:35 PM, Young, Milan <Milan.Young@nuance.com>
>>> wrote:
>>>
>>> I prefer Satish’s suggestion.  If the web author needs to concatenate,
>>> sandwiching in some whitespace seems like a trivial adjustment.
>>>
>>> *From:* Satish S [mailto:satish@google.com]
>>> *Sent:* Thursday, August 30, 2012 1:28 PM
>>> *To:* Glen Shires
>>> *Cc:* public-speech-api@w3.org
>>> *Subject:* Re: Concatenating transcript results
>>>
>>> We could also say the transcript should not include leading or trailing
>>> spaces, so the web app should always use a whitespace if it needs to
>>> concatenate.  This would work better for apps that check the transcript
>>> against known words (e.g. command and control) instead of having to
>>> append/prepend whitespace to their string literals. Also, depending on the
>>> language of the recognized text, whitespace may not be appropriate (e.g. CJK
>>> languages don't use whitespace).
>>>
>>> Cheers
>>> Satish
>>>
>>> On Thu, Aug 30, 2012 at 6:11 PM, Glen Shires <gshires@google.com> wrote:
>>>
>>> If there's no disagreement by the end of the week I'll add it to the
>>> spec...
>>>
>>> On Wed, Aug 29, 2012 at 9:36 AM, Glen Shires <gshires@google.com> wrote:
>>>
>>> I propose adding the following sentence to the definition
>>> of SpeechRecognitionAlternative.transcript to make it clear that a
>>> JavaScript author can simply concatenate SpeechRecognitionResults without
>>> the author having to worry about where/when to add whitespace.
>>>
>>> "For continuous recognition, whitespace MUST be included in the
>>> transcript, including leading or trailing whitespace, as necessary such
>>> that concatenation of consecutive SpeechRecognitionResults produces a
>>> proper transcript of the session."
>>>
>>
>>
>
Received on Tuesday, 4 September 2012 18:00:52 UTC