- From: Satish S <satish@google.com>
- Date: Fri, 31 Aug 2012 18:22:56 +0100
- To: Glen Shires <gshires@google.com>
- Cc: "Young, Milan" <Milan.Young@nuance.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAHZf7RnAsFc8HEYJANyOEGKYhG-3+Tc68NpGsdGX8uBHjoaE+g@mail.gmail.com>
Glen and I talked about this later and I also looked at other speech recognition APIs where appending a white space in some form is the norm (either as flags or as a space character). With those in mind, Glen's suggestion of making the flag default to true makes sense, and in v1 we could leave out the flag.

So I am ok with the original proposal of appending a white space in the transcript for languages where it is applicable, and if we get developer feedback that a flag to turn it off is necessary, it can be added in a future revision of the spec proposal.

Cheers
Satish

On Fri, Aug 31, 2012 at 11:01 AM, Satish S <satish@google.com> wrote:

> Looking at it from another angle - if there was automatic binding to an
> HTML element and the spoken text was entered into the element by the
> browser, then adding spaces automatically is the right thing to do. The
> equivalent for this is the keyboard IME on mobile phones where tapping on a
> word in the suggestion bar enters the word and a space with it. But the
> events that get dispatched to JS should not contain spaces appended or
> prepended.
>
> Some web apps may want to also offer correction of a word/phrase based on
> the list of hypotheses in the results, so when the user taps/clicks on the
> phrase it may offer a drop down list of suggestions. If we add spaces
> before or after the phrase then the UI would include those in the highlight
> instead of just the text, so developers may end up stripping off the space
> to show a better UI. This feels like working against the framework and
> something we should avoid.
>
> Perhaps we could look at it post v1 of the spec based on developer
> feedback?
>
> Cheers
> Satish
>
> On Fri, Aug 31, 2012 at 1:51 AM, Glen Shires <gshires@google.com> wrote:
>
>> If this is an optional flag that we add in the future, I strongly believe
>> the default should be true. (That is, until we add this feature, proper
>> whitespace must be inserted by the speech recognizer.)
>>
>> If a user is searching for consecutive key-words such as "peanut
>> butter", there is no guarantee that they will be returned in the same
>> final result. For example:
>>
>> result[0].transcript = "I'd like a peanut"
>> result[1].transcript = "butter sandwich."
>>
>> While there are various algorithms that might be used to find the
>> consecutive key-words, perhaps the easiest is to concatenate the results
>> together with a space input between, and search for "peanut butter". So for
>> this use case, it would be simpler if the speech-recognizer had returned
>> the results with proper white-spacing.
>>
>> But frankly, I think the complexity of writing a JavaScript algorithm
>> that knows how to insert proper whitespaces - and works on a wide variety
>> of international languages - far outweighs any minor simplification of
>> scanning for keywords by ignoring leading/trailing whitespaces. I believe
>> there will be many applications that do use dictation to generate emails,
>> documents, product reviews, etc. So I believe we must ensure that authoring
>> a dictation app should not be more difficult than it needs to be.
>>
>> /Glen Shires
>>
>> On Thu, Aug 30, 2012 at 4:04 PM, Satish S <satish@google.com> wrote:
>>
>>> Stripping whitespace is something that almost every app that doesn't use
>>> the API for dictation would need. To me this looks like an optional
>>> feature, something which gets turned on based on a flag such as
>>> "SpeechRecognition.autoWhiteSpace" that the developer would set if they
>>> want it, and as such it could be added in a future revision of the API if
>>> we see developers asking for it.
>>>
>>> Cheers
>>> Satish
>>>
>>> On Thu, Aug 30, 2012 at 9:48 PM, Glen Shires <gshires@google.com> wrote:
>>>
>>>> Inserting whitespace is non-trivial, particularly when considering
>>>> punctuation and internationalization. Some punctuation is placed before the
>>>> whitespace, others after. Some languages don't use whitespace. I'd prefer
>>>> to avoid placing this burden on the JavaScript author. Speech recognition
>>>> engines already contain this logic.
>>>>
>>>> Conversely, stripping leading and trailing whitespace is trivial, as is
>>>> writing a comparison routine that ignores whitespace.
>>>>
>>>> On Thu, Aug 30, 2012 at 1:35 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>>>>
>>>>> I prefer Satish’s suggestion. If the web author needs to
>>>>> concatenate, sandwiching in some whitespace seems like a trivial adjustment.
>>>>>
>>>>> *From:* Satish S [mailto:satish@google.com]
>>>>> *Sent:* Thursday, August 30, 2012 1:28 PM
>>>>> *To:* Glen Shires
>>>>> *Cc:* public-speech-api@w3.org
>>>>> *Subject:* Re: Concatenating transcript results
>>>>>
>>>>> We could also say the transcript should not include leading or
>>>>> trailing spaces, so the web app should always use a whitespace if it needs
>>>>> to concatenate. This would work better for apps that check the transcript
>>>>> with known words (e.g. command and control) instead of having to
>>>>> append/prepend whitespaces to their string literals. Also depending on the
>>>>> language of the recognized text whitespace may not be appropriate (e.g. CJK
>>>>> don't use white spaces).
>>>>>
>>>>> Cheers
>>>>> Satish
>>>>>
>>>>> On Thu, Aug 30, 2012 at 6:11 PM, Glen Shires <gshires@google.com>
>>>>> wrote:
>>>>>
>>>>> If there's no disagreement by the end of the week I'll add it to the
>>>>> spec...
>>>>>
>>>>> On Wed, Aug 29, 2012 at 9:36 AM, Glen Shires <gshires@google.com>
>>>>> wrote:
>>>>>
>>>>> I propose adding the following sentence to the definition
>>>>> of SpeechRecognitionAlternative.transcript to make it clear that a
>>>>> JavaScript author can simply concatenate SpeechRecognitionResults without
>>>>> the author having to worry about where/when to add whitespace.
>>>>>
>>>>> "For continuous recognition, whitespace MUST be included in the
>>>>> transcript, including leading or trailing whitespace, as necessary such
>>>>> that concatenation of consecutive SpeechRecognitionResults produces a
>>>>> proper transcript of the session."
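
For reference, a minimal sketch of the two use cases weighed in this thread, assuming the recognizer supplies whitespace as the proposed sentence requires. The helper names (joinTranscripts, matchesCommand, containsPhrase) are hypothetical and not part of the API, and the leading space in the second transcript is an assumption about what a conforming recognizer might return for the "peanut butter" example.

```typescript
// Hypothetical helpers for illustration only; none of these are part of
// the Speech API proposal.

// Dictation case: concatenate final transcripts as-is, relying on the
// recognizer to have included any needed leading/trailing whitespace.
function joinTranscripts(transcripts: string[]): string {
  return transcripts.join("");
}

// Command-and-control case: compare against a known phrase while
// ignoring leading/trailing whitespace and letter case.
function matchesCommand(transcript: string, command: string): boolean {
  return transcript.trim().toLowerCase() === command.trim().toLowerCase();
}

// Keyword search across result boundaries: search the joined session
// text rather than each transcript on its own.
function containsPhrase(transcripts: string[], phrase: string): boolean {
  const session = joinTranscripts(transcripts).toLowerCase();
  return session.indexOf(phrase.toLowerCase()) !== -1;
}

// Assumed recognizer output: the second result carries a leading space.
const finals: string[] = ["I'd like a peanut", " butter sandwich."];

console.log(joinTranscripts(finals));                 // I'd like a peanut butter sandwich.
console.log(containsPhrase(finals, "peanut butter")); // true
console.log(matchesCommand(" stop ", "stop"));        // true
```

With whitespace supplied by the recognizer, the dictation path is a plain join and the command path needs only a trim, which is the trade-off the thread settles on for v1.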
Received on Friday, 31 August 2012 17:23:25 UTC