Re: Concatenating transcript results from Glen Shires on 2012-08-31 (public-speech-api@w3.org from August 2012)

From: Glen Shires <gshires@google.com>
Date: Thu, 30 Aug 2012 17:51:13 -0700
To: Satish S <satish@google.com>
Cc: "Young, Milan" <Milan.Young@nuance.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <CAEE5bcj0KjfDkO42VsMpU-Jb7QE2fZfMyrjLXV+netqL15CB4g@mail.gmail.com>
If this is an optional flag that we add in the future, I strongly believe
the default should be true.  (That is, until we add this feature, proper
whitespace must be inserted by the speech recognizer.)

If a user is searching for consecutive keywords such as "peanut butter",
there is no guarantee that they will be returned in the same final
result. For example:

result[0].transcript = "I'd like a peanut"
result[1].transcript = "butter sandwich."

While there are various algorithms that might be used to find the consecutive
keywords, perhaps the easiest is to concatenate the results with a space
inserted between them, and search for "peanut butter". So for this use
case, it would be simpler if the speech recognizer had returned the results
with proper whitespace.
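
As a rough sketch of that "join with a space, then search" approach: the
helper name containsPhrase is hypothetical, the transcripts are plain strings
standing in for result[i].transcript, and it assumes a language that
separates words with spaces (which is exactly the assumption that fails for
some languages).

```javascript
// Hypothetical helper, not part of any API: joins transcripts with a single
// space and does a case-insensitive substring search for the phrase.
// Assumes space-separated words; this would not be right for e.g. CJK text.
function containsPhrase(transcripts, phrase) {
  // trim() guards against recognizers that already add leading/trailing spaces
  const joined = transcripts.map((t) => t.trim()).join(" ");
  return joined.toLowerCase().includes(phrase.toLowerCase());
}

// Stand-ins for result[0].transcript and result[1].transcript
const transcripts = ["I'd like a peanut", "butter sandwich."];
const found = containsPhrase(transcripts, "peanut butter");
console.log(found); // true
```

Neither result contains "peanut butter" on its own; the match only appears
once the two are joined.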

But frankly, I think the complexity of writing a JavaScript algorithm that
knows how to insert proper whitespace, and that works across a wide variety
of international languages, far outweighs any minor simplification of scanning
for keywords by ignoring leading/trailing whitespace. I believe there will
be many applications that do use dictation to generate emails, documents,
product reviews, etc., so we must ensure that authoring a dictation app is
no more difficult than it needs to be.
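
To illustrate the trade-off, here is a minimal sketch assuming the recognizer
already embeds whatever whitespace is needed in each transcript. The helper
names buildTranscript and matchesCommand and the sample strings are
hypothetical; in a page the strings would come from the recognition results.

```javascript
// Dictation case: if the recognizer supplies the whitespace, the author only
// concatenates and needs no language-specific spacing rules.
function buildTranscript(parts) {
  return parts.join("");
}

// Keyword case: the extra cost is a trim() before comparing.
function matchesCommand(transcript, command) {
  return transcript.trim().toLowerCase() === command.toLowerCase();
}

// Hypothetical transcripts with recognizer-provided leading whitespace
const parts = ["I'd like a peanut", " butter sandwich."];
console.log(buildTranscript(parts)); // I'd like a peanut butter sandwich.
console.log(matchesCommand(" stop", "stop")); // true
```

The dictation path stays one line regardless of language, while the keyword
path pays only for a single trim().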

/Glen Shires


On Thu, Aug 30, 2012 at 4:04 PM, Satish S <satish@google.com> wrote:

> Stripping whitespace is something that almost every app that doesn't use
> the API for dictation would need. To me this looks like an optional
> feature, something which gets turned on based on a flag such as
> "SpeechRecognition.autoWhiteSpace" that the developer would set if they
> want it, and as such it could be added in a future revision of the API if
> we see developers asking for it.
>
> Cheers
> Satish
>
>
>
> On Thu, Aug 30, 2012 at 9:48 PM, Glen Shires <gshires@google.com> wrote:
>
>> Inserting whitespace is non-trivial, particularly when considering
>> punctuation and internationalization. Some punctuation is placed before the
>> whitespace, others after. Some languages don't use whitespace. I'd prefer
>> to avoid placing this burden on the JavaScript author.  Speech recognition
>> engines already contain this logic.
>>
>> Conversely, stripping leading and trailing whitespace is trivial, as is
>> writing a comparison routine that ignores whitespace.
>>
>>
>> On Thu, Aug 30, 2012 at 1:35 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>>
>>> I prefer Satish’s suggestion.  If the web author needs to concatenate,
>>> sandwiching in some whitespace seems like a trivial adjustment.
>>>
>>> *From:* Satish S [mailto:satish@google.com]
>>> *Sent:* Thursday, August 30, 2012 1:28 PM
>>> *To:* Glen Shires
>>> *Cc:* public-speech-api@w3.org
>>> *Subject:* Re: Concatenating transcript results
>>>
>>> We could also say the transcript should not include leading or trailing
>>> spaces, so the web app should always add whitespace if it needs to
>>> concatenate.  This would work better for apps that check the transcript
>>> against known words (e.g. command and control) instead of having to
>>> append/prepend whitespace to their string literals. Also, depending on the
>>> language of the recognized text, whitespace may not be appropriate (e.g. CJK
>>> languages don't use whitespace).
>>>
>>>
>>> Cheers
>>> Satish
>>>
>>> On Thu, Aug 30, 2012 at 6:11 PM, Glen Shires <gshires@google.com> wrote:
>>>
>>> If there's no disagreement by the end of the week I'll add it to the
>>> spec...
>>>
>>> On Wed, Aug 29, 2012 at 9:36 AM, Glen Shires <gshires@google.com> wrote:
>>>
>>> I propose adding the following sentence to the definition
>>> of SpeechRecognitionAlternative.transcript to make it clear that a
>>> JavaScript author can simply concatenate SpeechRecognitionResults without
>>> having to worry about where/when to add whitespace.
>>>
>>> "For continuous recognition, whitespace MUST be included in the
>>> transcript, including leading or trailing whitespace, as necessary such
>>> that concatenation of consecutive SpeechRecognitionResults produces a
>>> proper transcript of the session."
>>>
>>
>>
>
Received on Friday, 31 August 2012 00:52:21 UTC