Re: Concatenating transcript results from Satish S on 2012-09-02 (public-speech-api@w3.org from September 2012)

From: Satish S <satish@google.com>
Date: Sun, 2 Sep 2012 22:22:49 +0100
To: Glen Shires <gshires@google.com>
Cc: "Young, Milan" <Milan.Young@nuance.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <CAHZf7RmKAGmq_ZsvSJpbgM_ZZNT4AGaRqz+y96tZRxcsGqycRA@mail.gmail.com>
Looks good

Cheers
Satish


On Sat, Sep 1, 2012 at 3:48 PM, Glen Shires <gshires@google.com> wrote:

> Wonderful, it seems we're all in agreement: that a JavaScript author can
> simply concatenate SpeechRecognitionResults to create a proper transcript;
> that the author does NOT need to add additional whitespace; that this
> simple concatenation works for all languages, including compound words (no
> edge cases) and CJK; and that this would continue to be the default
> behavior if we do choose to add a flag for alternative behavior in the
> future.
>
> To make this more clear, I've slightly re-worded my proposed text for the
> spec as follows. If there's no disagreement, I'll add this to the spec on
> Tuesday.
>
> "For continuous recognition, leading or trailing whitespace MUST be
> included where necessary such that concatenation of consecutive
> SpeechRecognitionResults produces a proper transcript of the session."
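The proposed rule can be illustrated with a minimal TypeScript sketch. The types below are simplified, hypothetical stand-ins for the Web Speech API's SpeechRecognitionResult and SpeechRecognitionAlternative (the real result objects are array-like lists, not plain arrays), and the sample transcripts are invented. The sketch assumes a recognizer that already follows the proposed whitespace rule, so the page only has to join the top alternative of each result.

```typescript
// Hypothetical, simplified stand-ins for the Web Speech API shapes.
interface AlternativeLike {
  transcript: string;
}
type ResultLike = AlternativeLike[];

// Under the proposed rule the recognizer supplies any needed leading or
// trailing whitespace, so plain concatenation of the top alternatives
// is enough; no separator is inserted by the page.
function buildTranscript(results: ResultLike[]): string {
  return results.map((result) => result[0]?.transcript ?? "").join("");
}

// English: the recognizer put the separating space on the second result.
const english: ResultLike[] = [
  [{ transcript: "I'd like a peanut" }],
  [{ transcript: " butter sandwich." }],
];

// Japanese: no inter-word spaces, so the recognizer supplies none.
const japanese: ResultLike[] = [
  [{ transcript: "今日は" }],
  [{ transcript: "いい天気です" }],
];

console.log(buildTranscript(english)); // I'd like a peanut butter sandwich.
console.log(buildTranscript(japanese)); // 今日はいい天気です
```

The page never needs to know the language: the same `join("")` works for both inputs because the whitespace decision stays with the recognizer.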
>
>
> On Fri, Aug 31, 2012 at 5:39 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>
>> I’m uncomfortable with language-specific behavior.  That might become a
>> mess if one wanted to write a multi-lingual page.
>>
>> If we’re really intent upon doing this, Glen’s suggestion of a flag that
>> defaults to true seems like the best route.  That way English and other
>> language users can easily disable the behavior if they choose.
>>
>> Also, as Jerry pointed out in a fork of this thread, there are some
>> English edge cases where engine-driven whitespace may make sense.
>>
>> Thanks
>>
>> *From:* Satish S [mailto:satish@google.com]
>> *Sent:* Friday, August 31, 2012 10:23 AM
>> *To:* Glen Shires
>> *Cc:* Young, Milan; public-speech-api@w3.org
>> *Subject:* Re: Concatenating transcript results
>>
>> Glen and I talked about this later, and I also looked at other speech
>> recognition APIs, where appending whitespace in some form is the norm
>> (either as flags or as a space character). With those in mind, Glen's
>> suggestion of making the flag default to true makes sense, and in v1 we
>> could leave out the flag. So I am OK with the original proposal of
>> appending whitespace in the transcript for languages where it is
>> applicable, and if we get developer feedback that a flag to turn it off is
>> necessary, it can be added in a future revision of the spec proposal.
>>
>>
>> Cheers
>> Satish
>>
>> On Fri, Aug 31, 2012 at 11:01 AM, Satish S <satish@google.com> wrote:
>>
>> Looking at it from another angle: if there was automatic binding to an
>> HTML element and the spoken text was entered into the element by the
>> browser, then adding spaces automatically is the right thing to do. The
>> equivalent for this is the keyboard IME on mobile phones, where tapping on a
>> word in the suggestion bar enters the word and a space with it. But the
>> events that get dispatched to JS should not have spaces appended or
>> prepended.
>>
>> Some web apps may want to also offer correction of a word/phrase based on
>> the list of hypotheses in the results, so when the user taps/clicks on the
>> phrase it may offer a drop down list of suggestions. If we add spaces
>> before or after the phrase then the UI would include those in the highlight
>> instead of just the text, so developers may end up stripping off the space
>> to show a better UI. This feels like working against the framework and
>> something we should avoid.
>>
>> Perhaps we could look at it post v1 of the spec based on developer
>> feedback?
>>
>> Cheers
>> Satish
>>
>> On Fri, Aug 31, 2012 at 1:51 AM, Glen Shires <gshires@google.com> wrote:
>>
>> If this is an optional flag that we add in the future, I strongly believe
>> the default should be true.  (That is, until we add this feature, proper
>> whitespace must be inserted by the speech recognizer.)
>>
>> If a user is searching for consecutive keywords such as "peanut
>> butter", there is no guarantee that they will be returned in the same
>> final result. For example:
>>
>> result[0].transcript = "I'd like a peanut"
>>
>> result[1].transcript = "butter sandwich."
>>
>> While there are various algorithms that might be used to find the
>> consecutive keywords, perhaps the easiest is to concatenate the results
>> together with a space inserted between them, and search for "peanut
>> butter". So for this use case, it would be simpler if the speech recognizer
>> had returned the results with proper whitespace.
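The concatenate-and-search approach described above might look like the following TypeScript sketch. `containsPhrase` is a hypothetical helper, not part of any API: it joins the transcripts with a single space, collapses whitespace runs so it behaves the same whether or not the recognizer already supplied spaces, and then does a plain substring search (so it would also match "peanut butter" inside "peanut butters").

```typescript
// Hypothetical helper: does a phrase occur anywhere across consecutive
// final transcripts, even when it straddles a result boundary?
function containsPhrase(transcripts: string[], phrase: string): boolean {
  // Collapse whitespace runs and trim, so a leading or trailing space
  // supplied by the recognizer does not produce a double space.
  const normalize = (text: string): string => text.replace(/\s+/g, " ").trim();
  const combined = normalize(transcripts.join(" "));
  // Plain substring search; no word-boundary or case handling.
  return combined.indexOf(normalize(phrase)) !== -1;
}

// The phrase spans result[0] and result[1].
console.log(containsPhrase(["I'd like a peanut", "butter sandwich."], "peanut butter")); // true
// Still found when the recognizer already included a leading space.
console.log(containsPhrase(["I'd like a peanut", " butter sandwich."], "peanut butter")); // true
```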
>>
>> But frankly, I think the complexity of writing a JavaScript algorithm
>> that knows how to insert proper whitespace, and that works across a wide
>> variety of international languages, far outweighs any minor simplification
>> of scanning for keywords by ignoring leading/trailing whitespace. I believe
>> there will be many applications that do use dictation to generate emails,
>> documents, product reviews, etc. So I believe we must ensure that authoring
>> a dictation app is not more difficult than it needs to be.
>>
>> /Glen Shires
>>
>> On Thu, Aug 30, 2012 at 4:04 PM, Satish S <satish@google.com> wrote:
>>
>> Stripping whitespace is something that almost every app that doesn't use
>> the API for dictation would need. To me this looks like an optional
>> feature, something which gets turned on based on a flag such as
>> "SpeechRecognition.autoWhiteSpace" that the developer would set if they
>> want it, and as such it could be added in a future revision of the API if
>> we see developers asking for it.
>>
>>
>> Cheers
>> Satish
>>
>> On Thu, Aug 30, 2012 at 9:48 PM, Glen Shires <gshires@google.com> wrote:
>>
>> Inserting whitespace is non-trivial, particularly when considering
>> punctuation and internationalization. Some punctuation is placed before the
>> whitespace, others after. Some languages don't use whitespace. I'd prefer
>> to avoid placing this burden on the JavaScript author.  Speech recognition
>> engines already contain this logic.
>>
>> Conversely, stripping leading and trailing whitespace is trivial, as is
>> writing a comparison routine that ignores whitespace.
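As a rough illustration of how small such a routine can be, here is a TypeScript sketch for a command-and-control check. `normalizeWhitespace` and `matchesCommand` are hypothetical names; the comparison ignores leading/trailing whitespace and treats any internal whitespace run as one space, but is deliberately case-sensitive since the thread only discusses whitespace.

```typescript
// Hypothetical helper: trim the ends and collapse internal whitespace runs.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Hypothetical command check that ignores recognizer-supplied whitespace.
// Case is NOT normalized here.
function matchesCommand(transcript: string, command: string): boolean {
  return normalizeWhitespace(transcript) === normalizeWhitespace(command);
}

console.log(matchesCommand(" next page", "next page")); // true
console.log(matchesCommand("previous page ", "next page")); // false
```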
>>
>> On Thu, Aug 30, 2012 at 1:35 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>>
>> I prefer Satish’s suggestion.  If the web author needs to concatenate,
>> sandwiching in some whitespace seems like a trivial adjustment.
>>
>> *From:* Satish S [mailto:satish@google.com]
>> *Sent:* Thursday, August 30, 2012 1:28 PM
>> *To:* Glen Shires
>> *Cc:* public-speech-api@w3.org
>> *Subject:* Re: Concatenating transcript results
>>
>> We could also say the transcript should not include leading or trailing
>> spaces, so the web app should always insert whitespace if it needs to
>> concatenate.  This would work better for apps that check the transcript
>> against known words (e.g. command and control) instead of having to
>> append/prepend whitespace to their string literals. Also, depending on the
>> language of the recognized text, whitespace may not be appropriate (e.g. CJK
>> languages don't use spaces).
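Under this alternative the page, not the recognizer, chooses the separator. A minimal TypeScript sketch, assuming the page knows the BCP 47 language tag it gave the recognizer; `joinTranscripts` and the `NO_SPACE_LANGUAGES` list are hypothetical, and the list is deliberately incomplete, which is the kind of per-language knowledge the page would have to carry.

```typescript
// Hypothetical, incomplete list of primary language subtags whose written
// form does not separate words with spaces (here: Chinese, Japanese).
const NO_SPACE_LANGUAGES: ReadonlyArray<string> = ["zh", "ja"];

// Trim each transcript, then join with a space only for languages that use one.
function joinTranscripts(transcripts: string[], lang: string): string {
  // Compare only the primary subtag, e.g. "ja" from "ja-JP".
  const primary = lang.toLowerCase().split("-")[0];
  const separator = NO_SPACE_LANGUAGES.indexOf(primary) !== -1 ? "" : " ";
  return transcripts.map((t) => t.trim()).join(separator);
}

console.log(joinTranscripts(["I'd like a peanut", "butter sandwich."], "en-US"));
// I'd like a peanut butter sandwich.
console.log(joinTranscripts(["今日は", "いい天気です"], "ja-JP"));
// 今日はいい天気です
```

Punctuation is not handled at all: a result that begins with a comma would get a space placed in front of it.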
>>
>>
>> Cheers
>> Satish
>>
>> On Thu, Aug 30, 2012 at 6:11 PM, Glen Shires <gshires@google.com> wrote:
>>
>> If there's no disagreement by the end of the week I'll add it to the
>> spec...
>>
>> On Wed, Aug 29, 2012 at 9:36 AM, Glen Shires <gshires@google.com> wrote:
>>
>> I propose adding the following sentence to the definition
>> of SpeechRecognitionAlternative.transcript to make it clear that a
>> JavaScript author can simply concatenate SpeechRecognitionResults without
>> the author having to worry about where/when to add whitespace.
>>
>> "For continuous recognition, whitespace MUST be included in the
>> transcript, including leading or trailing whitespace, as necessary such
>> that concatenation of consecutive SpeechRecognitionResults produces a
>> proper transcript of the session."
>>
>
>
Received on Sunday, 2 September 2012 21:23:18 UTC