- From: Satish S <satish@google.com>
- Date: Sun, 2 Sep 2012 22:22:49 +0100
- To: Glen Shires <gshires@google.com>
- Cc: "Young, Milan" <Milan.Young@nuance.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAHZf7RmKAGmq_ZsvSJpbgM_ZZNT4AGaRqz+y96tZRxcsGqycRA@mail.gmail.com>
Looks good

Cheers
Satish

On Sat, Sep 1, 2012 at 3:48 PM, Glen Shires <gshires@google.com> wrote:

> Wonderful, it seems we're all in agreement. That a JavaScript author can
> simply concatenate SpeechRecognitionResults to create a proper transcript.
> That the author does NOT need to add additional whitespace. That this
> simple concatenation works for all languages, including compound words (no
> edge cases) and CJK. Also that this would continue to be the default
> behavior if we do choose to add a flag for alternative behavior in the
> future.
>
> To make this more clear, I've slightly re-worded my proposed text for the
> spec as follows. If there's no disagreement, I'll add this to the spec on
> Tuesday.
>
> "For continuous recognition, leading or trailing whitespace MUST be
> included where necessary such that concatenation of consecutive
> SpeechRecognitionResults produces a proper transcript of the session."
>
>
> On Fri, Aug 31, 2012 at 5:39 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>
>> I’m uncomfortable with language-specific behavior. That might become a
>> mess if one wanted to write a multi-lingual page.
>>
>> If we’re really intent upon doing this, Glen’s suggestion of a flag that
>> defaults to true seems like the best route.
>> That way English and other language users can easily disable the
>> behavior if they choose.
>>
>> Also, as Jerry pointed out in a fork of this thread, there are some
>> English edge cases where engine-driven whitespace may make sense.
>>
>> Thanks
>>
>> *From:* Satish S [mailto:satish@google.com]
>> *Sent:* Friday, August 31, 2012 10:23 AM
>> *To:* Glen Shires
>> *Cc:* Young, Milan; public-speech-api@w3.org
>> *Subject:* Re: Concatenating transcript results
>>
>> Glen and I talked about this later and I also looked at other speech
>> recognition APIs where appending a white space in some form is the norm
>> (either as flags or as a space character). With those in mind Glen's
>> suggestion of making the flag default to true makes sense and in v1 we
>> could leave out the flag. So I am ok with the original proposal of
>> appending a white space in the transcript for languages where it is
>> applicable and if we get developer feedback that a flag to turn it off is
>> necessary it can be added in a future revision of the spec proposal.
>>
>> Cheers
>> Satish
>>
>> On Fri, Aug 31, 2012 at 11:01 AM, Satish S <satish@google.com> wrote:
>>
>> Looking at it from another angle - if there was automatic binding to a
>> HTML element and the spoken text was entered into the element by the
>> browser, then adding spaces automatically is the right thing to do. The
>> equivalent for this is the keyboard IME on mobile phones where tapping on a
>> word in the suggestion bar enters the word and a space with it. But the
>> events that get dispatched to JS should not contain spaces appended or
>> prepended.
>>
>> Some web apps may want to also offer correction of a word/phrase based on
>> the list of hypotheses in the results, so when the user taps/clicks on the
>> phrase it may offer a drop down list of suggestions.
>> If we add spaces before or after the phrase then the UI would include
>> those in the highlight instead of just the text, so developers may end up
>> stripping off the space to show a better UI. This feels like working
>> against the framework and something we should avoid.
>>
>> Perhaps we could look at it post v1 of the spec based on developer
>> feedback?
>>
>> Cheers
>> Satish
>>
>> On Fri, Aug 31, 2012 at 1:51 AM, Glen Shires <gshires@google.com> wrote:
>>
>> If this is an optional flag that we add in the future, I strongly believe
>> the default should be true. (That is, until we add this feature, proper
>> whitespace must be inserted by the speech recognizer.)
>>
>> If a user is searching for consecutive key-words such as "peanut
>> butter", there is no guarantee that they will be returned in the same
>> final result. For example:
>>
>> result[0].transcript = "I'd like a peanut"
>> result[1].transcript = "butter sandwich."
>>
>> While there are various algorithms that might be used to find the
>> consecutive key-words, perhaps the easiest is to concatenate the results
>> together with a space input between, and search for "peanut butter". So for
>> this use case, it would be simpler if the speech-recognizer had returned
>> the results with proper white-spacing.
>>
>> But frankly, I think the complexity of writing a JavaScript algorithm
>> that knows how to insert proper whitespaces - and works on a wide variety
>> of international languages - far outweighs any minor simplification of
>> scanning for keywords by ignoring leading/trailing whitespaces. I believe
>> there will be many applications that do use dictation to generate emails,
>> documents, product reviews, etc.
>> So I believe we must ensure that authoring a dictation app should not be
>> more difficult than it needs to be.
>>
>> /Glen Shires
>>
>> On Thu, Aug 30, 2012 at 4:04 PM, Satish S <satish@google.com> wrote:
>>
>> Stripping whitespace is something that almost every app that doesn't use
>> the API for dictation would need. To me this looks like an optional
>> feature, something which gets turned on based on a flag such as
>> "SpeechRecognition.autoWhiteSpace" that the developer would set if they
>> want it, and as such it could be added in a future revision of the API if
>> we see developers asking for it.
>>
>> Cheers
>> Satish
>>
>> On Thu, Aug 30, 2012 at 9:48 PM, Glen Shires <gshires@google.com> wrote:
>>
>> Inserting whitespace is non-trivial, particularly when considering
>> punctuation and internationalization. Some punctuation is placed before the
>> whitespace, others after. Some languages don't use whitespace. I'd prefer
>> to avoid placing this burden on the JavaScript author. Speech recognition
>> engines already contain this logic.
>>
>> Conversely, stripping leading and trailing whitespace is trivial, as is
>> writing a comparison routine that ignores whitespace.
>>
>> On Thu, Aug 30, 2012 at 1:35 PM, Young, Milan <Milan.Young@nuance.com>
>> wrote:
>>
>> I prefer Satish’s suggestion. If the web author needs to concatenate,
>> sandwiching in some whitespace seems like a trivial adjustment.
>>
>> *From:* Satish S [mailto:satish@google.com]
>> *Sent:* Thursday, August 30, 2012 1:28 PM
>> *To:* Glen Shires
>> *Cc:* public-speech-api@w3.org
>> *Subject:* Re: Concatenating transcript results
>>
>> We could also say the transcript should not include leading or trailing
>> spaces, so the web app should always use a whitespace if it needs to
>> concatenate.
>> This would work better for apps that check the transcript
>> with known words (e.g. command and control) instead of having to
>> append/prepend whitespaces to their string literals. Also depending on the
>> language of the recognized text whitespace may not be appropriate (e.g. CJK
>> don't use white spaces).
>>
>> Cheers
>> Satish
>>
>> On Thu, Aug 30, 2012 at 6:11 PM, Glen Shires <gshires@google.com> wrote:
>>
>> If there's no disagreement by the end of the week I'll add it to the
>> spec...
>>
>> On Wed, Aug 29, 2012 at 9:36 AM, Glen Shires <gshires@google.com> wrote:
>>
>> I propose adding the following sentence to the definition
>> of SpeechRecognitionAlternative.transcript to make it clear that a
>> JavaScript author can simply concatenate SpeechRecognitionResults without
>> the author having to worry about where/when to add whitespace.
>>
>> "For continuous recognition, whitespace MUST be included in the
>> transcript, including leading or trailing whitespace, as necessary such
>> that concatenation of consecutive SpeechRecognitionResults produces a
>> proper transcript of the session."
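
Editorial illustration (not part of the original message): a minimal sketch of what the agreed wording would mean for a script author, assuming the recognizer has already placed any needed leading or trailing whitespace in each transcript. The helper names `joinTranscripts` and `matchesCommand` are hypothetical, and the plain `string[]` input stands in for the top `transcript` of each consecutive SpeechRecognitionResult rather than the real API objects.

```typescript
// Hypothetical helper: build the session transcript by plain concatenation.
// Assumes each transcript already carries whatever leading/trailing
// whitespace is needed, as the proposed spec text requires of the recognizer.
function joinTranscripts(transcripts: string[]): string {
  return transcripts.join("");
}

// Hypothetical helper: a command-and-control app can strip the whitespace
// before comparing against a known word, as discussed in the thread.
function matchesCommand(transcript: string, command: string): boolean {
  return transcript.trim().toLowerCase() === command.toLowerCase();
}

// Sample data modeled on the "peanut butter" example in the thread, with the
// leading space on the second result supplied by the (assumed) recognizer.
const sample: string[] = ["I'd like a peanut", " butter sandwich."];
const session: string = joinTranscripts(sample);

console.log(session);
console.log(session.indexOf("peanut butter") !== -1);
console.log(matchesCommand(" stop ", "stop"));
```

Under this assumption the keyword search across result boundaries needs no language-specific spacing logic in the page, while apps that only compare against known words pay the cost of a single `trim()` call.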
Received on Sunday, 2 September 2012 21:23:18 UTC