- From: Bjorn Bringert <bringert@google.com>
- Date: Tue, 18 May 2010 09:27:45 +0100
On Mon, May 17, 2010 at 9:23 PM, Olli Pettay <Olli.Pettay at helsinki.fi> wrote:
> On 5/17/10 6:55 PM, Bjorn Bringert wrote:
>
>> (Looks like half of the first question is missing, so I'm guessing
>> here) If you are asking about when the web app loses focus (e.g. the
>> user switches to a different tab or away from the browser), I think
>> the recognition should be cancelled. I've added this to the spec.
>>
>
> Oh, where did the rest of the question go.
>
> I was going to ask about alert()s.
> What happens if alert() pops up while recognition is on?
> Which events should fire and when?

Hmm, good question. I think that either the recognition should be cancelled, like when the web app loses focus, or it should continue just as if there was no alert. Are there any browser implementation reasons to do one or the other?

>> The grammar specifies the set of utterances that the speech recognizer
>> should match against. The grammar may be annotated with SISR, which
>> will be used to populate the 'interpretation' field in ListenResult.
>
> I know what grammars are :)

Yeah, sorry about my silly reply there, I just wasn't sure exactly what you were asking.

> What I meant that it is not very well specified that the result is actually
> put to .value etc.

Yes, good point. The alternatives would be to use either the 'utterance' or the 'interpretation' value from the most likely recognition result. If the grammar does not contain semantics, those are identical, so it doesn't matter in that case. If the developer has added semantics to the grammar, the interpretation is probably more interesting than the utterance. So my conclusion is that it would make most sense to store the interpretation in @value. I've updated the spec with better definitions of @value and @results.

> And still, I'm still not quite sure what builtin:search actually
> is. What kind of grammar would that be? How is that different from
> builtin:dictation?

To be useful, those should probably be large statistical language models (e.g. n-gram models) trained on different corpora. So "builtin:dictation" might be trained on a corpus containing e-mails, SMS messages and news text, and "builtin:search" might be trained on query strings from a search engine. I've updated the spec to make "builtin:search" optional, mapping to "builtin:dictation" if not implemented.

The exact language matched by these models would be implementation dependent, and implementations may choose to be clever about them. For example by:

- Dynamic tweaking for different web apps based on the user's previous inputs and the text contained in the web app.
- Adding the names of all contacts from the user's address book to the dictation model.
- Weighting place names based on geographic proximity (in an implementation that has access to the user's location).

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
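[Editor's sketch] To make the @value / @results discussion above concrete, here is a minimal, hypothetical sketch of how a page might consume them. Only the field names taken from the thread (@value, @results, 'utterance', 'interpretation', "builtin:search") are grounded in the discussion; the markup, the event name, and the typings are assumptions for illustration, not the actual draft spec.

```typescript
// Hypothetical markup being assumed (not from the spec):
// <input type="text" id="q" speech grammar="builtin:search">

const input = document.getElementById("q") as HTMLInputElement & {
  // @results as discussed: an n-best list of ListenResult-like objects.
  results?: Array<{ utterance: string; interpretation: unknown }>;
};

// Assumed event name; the draft under discussion may define a different one.
input.addEventListener("change", () => {
  // Per the thread, @value would hold the interpretation of the most likely
  // result (identical to the utterance if the grammar carries no SISR
  // semantics).
  console.log("value (interpretation):", input.value);

  // The full n-best list, if exposed via @results.
  for (const r of input.results ?? []) {
    console.log(r.utterance, "->", r.interpretation);
  }
});
```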
Received on Tuesday, 18 May 2010 01:27:45 UTC