Re: Concatenating transcript results

Glen and I talked about this later, and I also looked at other speech
recognition APIs, where appending whitespace in some form is the norm
(either via flags or as a space character). With those in mind, Glen's
suggestion of making the flag default to true makes sense, and in v1 we
could leave out the flag. So I am OK with the original proposal of
appending whitespace to the transcript for languages where it is
applicable; if we get developer feedback that a flag to turn it off is
necessary, it can be added in a future revision of the spec proposal.

Cheers
Satish


On Fri, Aug 31, 2012 at 11:01 AM, Satish S <satish@google.com> wrote:

> Looking at it from another angle: if there was automatic binding to an
> HTML element and the spoken text was entered into the element by the
> browser, then adding spaces automatically is the right thing to do. The
> equivalent is the keyboard IME on mobile phones, where tapping on a word
> in the suggestion bar enters the word and a space with it. But the events
> that get dispatched to JS should not contain appended or prepended
> spaces.
>
> Some web apps may want to also offer correction of a word/phrase based on
> the list of hypotheses in the results, so when the user taps/clicks on the
> phrase it may offer a drop down list of suggestions. If we add spaces
> before or after the phrase then the UI would include those in the highlight
> instead of just the text, so developers may end up stripping off the space
> to show a better UI. This feels like working against the framework and
> something we should avoid.
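To illustrate the workaround described above: if transcripts carried leading or trailing spaces, an app that highlights a tappable phrase would first have to compute the span of the non-whitespace text. A minimal sketch, assuming the phrase's transcript is available as a plain string; `phraseHighlightRange` is a hypothetical helper, not part of the Web Speech API.

```javascript
// Hypothetical helper: returns the [start, end) offsets of the visible text
// inside a transcript, skipping leading/trailing whitespace so that a
// highlight covers only the spoken words.
function phraseHighlightRange(transcript) {
  const start = transcript.length - transcript.trimStart().length;
  const end = transcript.trimEnd().length;
  // For an all-whitespace string, collapse to an empty range at `start`.
  return { start, end: Math.max(start, end) };
}

// Example: a transcript delivered with a leading space.
const t = " peanut butter";
const { start, end } = phraseHighlightRange(t);
console.log(JSON.stringify(t.slice(start, end))); // "peanut butter"
```

This is exactly the kind of stripping step the message argues developers should not have to write.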
>
> Perhaps we could look at it post v1 of the spec based on developer
> feedback?
>
> Cheers
> Satish
>
>
>
> On Fri, Aug 31, 2012 at 1:51 AM, Glen Shires <gshires@google.com> wrote:
>
>> If this is an optional flag that we add in the future, I strongly believe
>> the default should be true. (That is, until we add this feature, proper
>> whitespace must be inserted by the speech recognizer.)
>>
>> If a user is searching for consecutive keywords such as "peanut
>> butter", there is no guarantee that they will be returned in the same
>> final result. For example:
>>
>> result[0].transcript = "I'd like a peanut"
>> result[1].transcript = "butter sandwich."
>>
>> While there are various algorithms that might be used to find the
>> consecutive keywords, perhaps the easiest is to concatenate the results
>> with a space inserted between them and search for "peanut butter". So for
>> this use case, it would be simpler if the speech recognizer had returned
>> the results with proper whitespace.
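The concatenate-then-search approach can be sketched as follows, assuming the top alternative of each final result is available as a plain string. `containsPhrase` is a hypothetical helper; it joins fragments with a single space and collapses runs of whitespace, so the search works whether or not the recognizer already supplied spacing.

```javascript
// Hypothetical helper: joins transcript fragments with a space, collapses
// repeated whitespace, and does a case-insensitive substring search.
function containsPhrase(transcripts, phrase) {
  const normalize = (s) => s.replace(/\s+/g, " ").trim().toLowerCase();
  return normalize(transcripts.join(" ")).includes(normalize(phrase));
}

// The two final results from the example above.
const transcripts = ["I'd like a peanut", "butter sandwich."];
console.log(containsPhrase(transcripts, "peanut butter")); // true
```

Substring matching is deliberately simple here; a real app would likely match on word boundaries so that "peanut butter" does not also match "peanut buttery".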
>>
>> But frankly, I think the complexity of writing a JavaScript algorithm
>> that knows how to insert proper whitespace, and that works across a wide
>> variety of international languages, far outweighs any minor simplification
>> of scanning for keywords by ignoring leading/trailing whitespace. I believe
>> there will be many applications that use dictation to generate emails,
>> documents, product reviews, etc. So I believe we must ensure that authoring
>> a dictation app is not more difficult than it needs to be.
>>
>> /Glen Shires
>>
>>
>> On Thu, Aug 30, 2012 at 4:04 PM, Satish S <satish@google.com> wrote:
>>
>>> Stripping whitespace is something that almost every app that doesn't use
>>> the API for dictation would need. To me this looks like an optional
>>> feature, something which gets turned on based on a flag such as
>>> "SpeechRecognition.autoWhiteSpace" that the developer would set if they
>>> want it, and as such it could be added in a future revision of the API if
>>> we see developers asking for it.
>>>
>>> Cheers
>>> Satish
>>>
>>>
>>>
>>> On Thu, Aug 30, 2012 at 9:48 PM, Glen Shires <gshires@google.com> wrote:
>>>
>>>> Inserting whitespace is non-trivial, particularly when considering
>>>> punctuation and internationalization. Some punctuation is placed before the
>>>> whitespace, others after. Some languages don't use whitespace. I'd prefer
>>>> to avoid placing this burden on the JavaScript author.  Speech recognition
>>>> engines already contain this logic.
>>>>
>>>> Conversely, stripping leading and trailing whitespace is trivial, as is
>>>> writing a comparison routine that ignores whitespace.
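To make the "trivial" claim concrete, both operations are short in JavaScript. A minimal sketch; the helper names are hypothetical, and "ignoring whitespace" is interpreted here as ignoring leading/trailing whitespace and treating any interior run of whitespace as a single space, which is one of several reasonable choices.

```javascript
// Strip leading and trailing whitespace from a transcript.
function stripTranscript(transcript) {
  return transcript.trim();
}

// Compare two strings, ignoring leading/trailing whitespace and treating
// any run of interior whitespace as a single space.
function equalsIgnoringWhitespace(a, b) {
  const normalize = (s) => s.replace(/\s+/g, " ").trim();
  return normalize(a) === normalize(b);
}

console.log(stripTranscript(" next page ")); // next page
console.log(equalsIgnoringWhitespace(" next  page", "next page ")); // true
```

Collapsing rather than removing interior whitespace keeps "next page" distinct from "nextpage".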
>>>>
>>>>
>>>> On Thu, Aug 30, 2012 at 1:35 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>>>>
>>>>>  I prefer Satish’s suggestion.  If the web author needs to
>>>>> concatenate, sandwiching in some whitespace seems like a trivial adjustment.
>>>>>
>>>>> *From:* Satish S [mailto:satish@google.com]
>>>>> *Sent:* Thursday, August 30, 2012 1:28 PM
>>>>> *To:* Glen Shires
>>>>> *Cc:* public-speech-api@w3.org
>>>>> *Subject:* Re: Concatenating transcript results
>>>>>
>>>>> We could also say the transcript should not include leading or
>>>>> trailing spaces, so the web app should always insert whitespace if it
>>>>> needs to concatenate. This would work better for apps that compare the
>>>>> transcript against known words (e.g. command and control), instead of
>>>>> having to append/prepend whitespace to their string literals. Also,
>>>>> depending on the language of the recognized text, whitespace may not be
>>>>> appropriate (e.g. CJK languages don't use whitespace).
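Under this proposal, a command-and-control app compares transcripts directly against its string literals, and a dictation app chooses the separator itself when concatenating. A sketch, assuming the app knows the recognition language as a BCP 47 tag (e.g. "en-US"); the set of no-space languages below is an illustrative assumption, not an exhaustive list, and `joinTranscripts`/`matchCommand` are hypothetical helpers.

```javascript
// Illustrative assumption: primary language subtags whose text is normally
// written without spaces between words. Not an exhaustive list.
const NO_SPACE_LANGUAGES = new Set(["zh", "ja", "th"]);

// Join transcript fragments, inserting a space only for languages that use one.
function joinTranscripts(fragments, lang) {
  const primary = lang.toLowerCase().split("-")[0];
  const separator = NO_SPACE_LANGUAGES.has(primary) ? "" : " ";
  return fragments.join(separator);
}

// Command and control: compare the lowercased transcript directly with known
// (lowercase) commands; no need to pad the literals with spaces.
function matchCommand(transcript, commands) {
  return commands.find((c) => c === transcript.toLowerCase()) ?? null;
}

console.log(joinTranscripts(["turn on", "the lights"], "en-US")); // turn on the lights
console.log(joinTranscripts(["今日は", "晴れです"], "ja-JP")); // 今日は晴れです
console.log(matchCommand("Next page", ["next page", "previous page"])); // next page
```

Keeping the language list in the app is the cost of this design; the counter-argument in the thread is that recognizers already contain this logic.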
>>>>>
>>>>>
>>>>> Cheers
>>>>> Satish
>>>>>
>>>>> On Thu, Aug 30, 2012 at 6:11 PM, Glen Shires <gshires@google.com>
>>>>> wrote:
>>>>>
>>>>> If there's no disagreement by the end of the week I'll add it to the
>>>>> spec...
>>>>>
>>>>> On Wed, Aug 29, 2012 at 9:36 AM, Glen Shires <gshires@google.com>
>>>>> wrote:
>>>>>
>>>>> I propose adding the following sentence to the definition
>>>>> of SpeechRecognitionAlternative.transcript to make it clear that a
>>>>> JavaScript author can simply concatenate SpeechRecognitionResults
>>>>> without having to worry about where and when to add whitespace.
>>>>>
>>>>> "For continuous recognition, whitespace MUST be included in the
>>>>> transcript, including leading or trailing whitespace, as necessary such
>>>>> that concatenation of consecutive SpeechRecognitionResults produces a
>>>>> proper transcript of the session."
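If the proposed wording is adopted, building the session transcript reduces to plain concatenation. A sketch assuming the Web Speech API result shape, where each `SpeechRecognitionResult` is indexable by alternative and exposes `isFinal`; `sessionTranscript` is a hypothetical helper, and the mock objects merely stand in for a real `SpeechRecognitionResultList`.

```javascript
// Concatenate the top alternative of each final result. Under the proposed
// wording the recognizer supplies any needed whitespace, so no separator is added.
function sessionTranscript(results) {
  let text = "";
  for (let i = 0; i < results.length; i++) {
    if (results[i].isFinal) {
      text += results[i][0].transcript;
    }
  }
  return text;
}

// Plain objects standing in for SpeechRecognitionResults.
const mockResult = (transcript, isFinal) => Object.assign([{ transcript }], { isFinal });
const results = [mockResult("I'd like a peanut", true), mockResult(" butter sandwich.", true)];
console.log(sessionTranscript(results)); // I'd like a peanut butter sandwich.
```

In a page, this could be called from a `result` event handler with `event.results`.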
>>>>>
>>>>
>>>>
>>>
>>
>

Received on Friday, 31 August 2012 17:23:25 UTC