- From: Glen Shires <gshires@google.com>
- Date: Tue, 4 Sep 2012 10:59:39 -0700
- To: Satish S <satish@google.com>, "Young, Milan" <Milan.Young@nuance.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAEE5bcgBFKOEawLTcx9ccT5-AG5VJnYKFw2BPVVT1WOxZ0en7w@mail.gmail.com>
I've updated the spec with this change:
https://dvcs.w3.org/hg/speech-api/rev/5e222a16f2fb

As always, the current draft spec is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

/Glen Shires

On Sun, Sep 2, 2012 at 2:22 PM, Satish S <satish@google.com> wrote:

> Looks good
>
> Cheers
> Satish
>
> On Sat, Sep 1, 2012 at 3:48 PM, Glen Shires <gshires@google.com> wrote:
>
>> Wonderful, it seems we're all in agreement. That a JavaScript author can
>> simply concatenate SpeechRecognitionResults to create a proper transcript.
>> That the author does NOT need to add additional whitespace. That this
>> simple concatenation works for all languages, including compound words (no
>> edge cases) and CJK. Also that this would continue to be the default
>> behavior if we do choose to add a flag for alternative behavior in the
>> future.
>>
>> To make this more clear, I've slightly re-worded my proposed text for the
>> spec as follows. If there's no disagreement, I'll add this to the spec on
>> Tuesday.
>>
>> "For continuous recognition, leading or trailing whitespace MUST be
>> included where necessary such that concatenation of consecutive
>> SpeechRecognitionResults produces a proper transcript of the session."
>>
>> On Fri, Aug 31, 2012 at 5:39 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>>
>>> I’m uncomfortable with language-specific behavior. That might become
>>> a mess if one wanted to write a multi-lingual page.
>>>
>>> If we’re really intent upon doing this, Glen’s suggestion of a flag that
>>> defaults to true seems like the best route. That way English and other
>>> language users can easily disable the behavior if they choose.
>>>
>>> Also, as Jerry pointed out in a fork of this thread, there are some
>>> English edge cases where engine-driven whitespace may make sense.
>>>
>>> Thanks
>>>
>>> *From:* Satish S [mailto:satish@google.com]
>>> *Sent:* Friday, August 31, 2012 10:23 AM
>>> *To:* Glen Shires
>>> *Cc:* Young, Milan; public-speech-api@w3.org
>>> *Subject:* Re: Concatenating transcript results
>>>
>>> Glen and I talked about this later and I also looked at other speech
>>> recognition APIs where appending a white space in some form is the norm
>>> (either as flags or as a space character). With those in mind Glen's
>>> suggestion of making the flag default to true makes sense and in v1 we
>>> could leave out the flag. So I am ok with the original proposal of
>>> appending a white space in the transcript for languages where it is
>>> applicable and if we get developer feedback that a flag to turn it off is
>>> necessary it can be added in a future revision of the spec proposal.
>>>
>>> Cheers
>>> Satish
>>>
>>> On Fri, Aug 31, 2012 at 11:01 AM, Satish S <satish@google.com> wrote:
>>>
>>> Looking at it from another angle - if there was automatic binding to a
>>> HTML element and the spoken text was entered into the element by the
>>> browser, then adding spaces automatically is the right thing to do. The
>>> equivalent for this is the keyboard IME on mobile phones where tapping on a
>>> word in the suggestion bar enters the word and a space with it. But the
>>> events that get dispatched to JS should not contain spaces appended or
>>> prepended.
>>>
>>> Some web apps may want to also offer correction of a word/phrase based
>>> on the list of hypotheses in the results, so when the user taps/clicks on
>>> the phrase it may offer a drop down list of suggestions. If we add spaces
>>> before or after the phrase then the UI would include those in the highlight
>>> instead of just the text, so developers may end up stripping off the space
>>> to show a better UI. This feels like working against the framework and
>>> something we should avoid.
>>>
>>> Perhaps we could look at it post v1 of the spec based on developer
>>> feedback?
>>>
>>> Cheers
>>> Satish
>>>
>>> On Fri, Aug 31, 2012 at 1:51 AM, Glen Shires <gshires@google.com> wrote:
>>>
>>> If this is an optional flag that we add in the future, I strongly
>>> believe the default should be true. (That is, until we add this feature,
>>> proper whitespace must be inserted by the speech recognizer.)
>>>
>>> If a user is searching for consecutive key-words such as "peanut
>>> butter", there is no guarantee that they will be returned in the same
>>> final result. For example:
>>>
>>> result[0].transcript = "I'd like a peanut"
>>> result[1].transcript = "butter sandwich."
>>>
>>> While there are various algorithms that might be used to find the
>>> consecutive key-words, perhaps the easiest is to concatenate the results
>>> together with a space input between, and search for "peanut butter". So for
>>> this use case, it would be simpler if the speech-recognizer had returned
>>> the results with proper white-spacing.
>>>
>>> But frankly, I think the complexity of writing a JavaScript algorithm
>>> that knows how to insert proper whitespaces - and works on a wide variety
>>> of international languages, far outweighs any minor simplification of
>>> scanning for keywords by ignoring leading/trailing whitespaces. I believe
>>> there will be many applications that do use dictation to generate emails,
>>> documents, product reviews, etc. So I believe we must ensure that authoring
>>> a dictation app should not be more difficult than it needs to be.
>>>
>>> /Glen Shires
>>>
>>> On Thu, Aug 30, 2012 at 4:04 PM, Satish S <satish@google.com> wrote:
>>>
>>> Stripping whitespace is something that almost every app that doesn't use
>>> the API for dictation would need. To me this looks like an optional
>>> feature, something which gets turned on based on a flag such as
>>> "SpeechRecognition.autoWhiteSpace" that the developer would set if they
>>> want it, and as such it could be added in a future revision of the API if
>>> we see developers asking for it.
>>>
>>> Cheers
>>> Satish
>>>
>>> On Thu, Aug 30, 2012 at 9:48 PM, Glen Shires <gshires@google.com> wrote:
>>>
>>> Inserting whitespace is non-trivial, particularly when considering
>>> punctuation and internationalization. Some punctuation is placed before the
>>> whitespace, others after. Some languages don't use whitespace. I'd prefer
>>> to avoid placing this burden on the JavaScript author. Speech recognition
>>> engines already contain this logic.
>>>
>>> Conversely, stripping leading and trailing whitespace is trivial, as is
>>> writing a comparison routine that ignores whitespace.
>>>
>>> On Thu, Aug 30, 2012 at 1:35 PM, Young, Milan <Milan.Young@nuance.com>
>>> wrote:
>>>
>>> I prefer Satish’s suggestion. If the web author needs to concatenate,
>>> sandwiching in some whitespace seems like a trivial adjustment.
>>>
>>> *From:* Satish S [mailto:satish@google.com]
>>> *Sent:* Thursday, August 30, 2012 1:28 PM
>>> *To:* Glen Shires
>>> *Cc:* public-speech-api@w3.org
>>> *Subject:* Re: Concatenating transcript results
>>>
>>> We could also say the transcript should not include leading or trailing
>>> spaces, so the web app should always use a whitespace if it needs to
>>> concatenate. This would work better for apps that check the transcript
>>> with known words (e.g. command and control) instead of having to
>>> append/prepend whitespaces to their string literals. Also depending on the
>>> language of the recognized text whitespace may not be appropriate (e.g. CJK
>>> don't use white spaces).
>>>
>>> Cheers
>>> Satish
>>>
>>> On Thu, Aug 30, 2012 at 6:11 PM, Glen Shires <gshires@google.com> wrote:
>>>
>>> If there's no disagreement by the end of the week I'll add it to the
>>> spec...
>>>
>>> On Wed, Aug 29, 2012 at 9:36 AM, Glen Shires <gshires@google.com> wrote:
>>>
>>> I propose adding the following sentence to the definition
>>> of SpeechRecognitionAlternative.transcript to make it clear that a
>>> JavaScript author can simply concatenate SpeechRecognitionResults without
>>> the author having to worry about where/when to add whitespace.
>>>
>>> "For continuous recognition, whitespace MUST be included in the
>>> transcript, including leading or trailing whitespace, as necessary such
>>> that concatenation of consecutive SpeechRecognitionResults produces a
>>> proper transcript of the session."
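For illustration, here is a minimal JavaScript sketch of what the agreed wording means for an author. It is only a sketch under stated assumptions: the helper names buildTranscript, containsPhrase and matchesCommand are hypothetical, the sample data merely mimics the results[i][0].transcript shape instead of calling a real recognizer, and the leading space on the second transcript is an assumption about where an engine might place the required whitespace.

```javascript
// Sample data shaped like a list of results, each holding alternatives
// with a `transcript` string. This is NOT produced by a real recognizer.
// Assumption: the engine supplies the separating whitespace itself, here
// as a leading space on the second result.
const results = [
  [{ transcript: "I'd like a peanut" }],
  [{ transcript: " butter sandwich." }],
];

// Hypothetical helper: build the session transcript by plain
// concatenation of the top alternative of each result. No separator is
// inserted by the author.
function buildTranscript(resultList) {
  let text = "";
  for (let i = 0; i < resultList.length; i++) {
    text += resultList[i][0].transcript;
  }
  return text;
}

// Hypothetical helper: look for consecutive key-words that may span two
// results, by searching the concatenated transcript.
function containsPhrase(resultList, phrase) {
  return buildTranscript(resultList)
    .toLowerCase()
    .includes(phrase.toLowerCase());
}

// Hypothetical helper for command-and-control style matching: strip any
// leading or trailing whitespace before comparing with a known command.
function matchesCommand(transcript, command) {
  return transcript.trim().toLowerCase() === command.toLowerCase();
}

console.log(buildTranscript(results));
console.log(containsPhrase(results, "peanut butter"));
console.log(matchesCommand(" Stop ", "stop"));
```

The first two helpers reflect the dictation and key-word cases discussed in the thread; the third shows the trimming step that apps comparing against known words would apply.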
Received on Tuesday, 4 September 2012 18:00:52 UTC