- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Thu, 23 Aug 2012 09:37:00 -0400
- To: "'Bjorn Bringert'" <bringert@google.com>, "'Glen Shires'" <gshires@google.com>
- Cc: "'Hans Wennborg'" <hwennborg@google.com>, "'Satish S'" <satish@google.com>, <public-speech-api@w3.org>
Comments inline.

> -----Original Message-----
> From: Bjorn Bringert [mailto:bringert@google.com]
> Sent: Thursday, August 23, 2012 6:09 AM
> To: Glen Shires
> Cc: Deborah Dahl; Hans Wennborg; Satish S; public-speech-api@w3.org
> Subject: Re: SpeechRecognitionAlternative.interpretation when
> interpretation can't be provided
>
> Here's what I think we should say:
>
> 1. Implementations are required to support SRGS (both XML and ABNF).
> Implementations may support other grammar formats. (Requiring SLM
> support is tricky, since as far as I'm aware, there is no widely
> accepted standard SLM format.)

I agree. I also agree that we shouldn't require SLM support. I think SLMs should just be considered a case of "other grammar formats".

>
> 2. Implementations are required to support SISR in SRGS.
> Implementations may support other tag formats in SRGS. Implementations
> may support any tag formats in other grammar formats, but none are
> required.
>

I agree.

> 3. Implementations must return a non-empty string in "transcription"
> when recognition is successful.
>

I agree, although we need to be explicit about what we mean by "successful". I think it means that a "result" event was fired, but possibly there are edge cases around "nomatch" that we should consider. Note that SRGS itself doesn't require anything at all to be returned, so we're adding a requirement in our spec that isn't actually in SRGS. That's not a problem; it's just something to be aware of.

> 4. If the grammar is SRGS with SISR tags, and there is a SISR tag for
> the grammar rule that matches the input, the "interpretation" field
> must contain the result of evaluating the SISR tag. In other cases,
> the "interpretation" field may contain a value, but is not required
> to.
>

I agree, except I think that the interpretation field should always contain a value, which is the same as the transcript if there is no other interpretation.
I think developers won't like having to add logic to check for null or undefined every time they request an interpretation.

>
> Open questions:
>
> A) What happens if the web app uses a grammar format that the
> implementation doesn't support?
>
> B) What happens if the web app uses a tag format that the
> implementation doesn't support?
>

Perhaps we could add a couple of error codes, e.g. "GRAMMAR_FORMAT_NOT_SUPPORTED" and "TAG_FORMAT_NOT_SUPPORTED".

>
> On Thu, Aug 23, 2012 at 8:47 AM, Glen Shires <gshires@google.com> wrote:
> > Debbie,
> > I agree with the need to support SLMs. This implies that, in some cases, the
> > author may not specify semantic information, and thus there would not be an
> > interpretation.
> >
> > Under what circumstances (except error conditions) do you envision that a
> > transcript would not be returned?
> >
> > /Glen Shires
> >
> >
> > On Wed, Aug 22, 2012 at 6:08 AM, Deborah Dahl
> > <dahl@conversational-technologies.com> wrote:
> >>
> >> Actually, Satish's comment made me think that we probably have a few other
> >> things to agree on before we decide what the default value of
> >> "interpretation" should be, because we haven't settled on a lot of issues
> >> about what is required and what is optional.
> >> Satish's argument is only relevant if we require SRGS/SISR for grammars and
> >> semantic interpretation, but we actually don't require either of those right
> >> now, so it doesn't matter what they do as far as the current spec goes.
> >> (Although it's worth noting that SRGS doesn't require anything to be
> >> returned at all, even the transcript
> >> http://www.w3.org/TR/speech-grammar/#S1.10).
> >> So I think we first need to decide and explicitly state in the spec ---
> >>
> >> 1. what we want to say about grammar formats (which are allowed/required, or
> >> is the grammar format open). It probably needs to be somewhat open because
> >> of SLM's.
> >> 2. what we want to say about semantic tag formats (are proprietary formats
> >> allowed, is SISR required or is the semantic tag format just whatever the
> >> grammar format uses)
> >> 3. is "transcript" required?
> >> 4. is "interpretation" required?
> >>
> >> Debbie
> >>
> >> > -----Original Message-----
> >> > From: Hans Wennborg [mailto:hwennborg@google.com]
> >> > Sent: Tuesday, August 21, 2012 12:50 PM
> >> > To: Glen Shires
> >> > Cc: Satish S; Deborah Dahl; Bjorn Bringert; public-speech-api@w3.org
> >> > Subject: Re: SpeechRecognitionAlternative.interpretation when
> >> > interpretation can't be provided
> >> >
> >> > Björn, Deborah, are you ok with this as well? I.e. that the spec
> >> > shouldn't mandate a "default" value for the interpretation attribute,
> >> > but rather return null when there is no interpretation?
> >> >
> >> > On Fri, Aug 17, 2012 at 6:32 PM, Glen Shires <gshires@google.com> wrote:
> >> > > I agree, return "null" (not "undefined") in such cases.
> >> > >
> >> > >
> >> > > On Fri, Aug 17, 2012 at 7:41 AM, Satish S <satish@google.com> wrote:
> >> > >>
> >> > >> > I may have missed something, but I don’t see in the spec where it says
> >> > >> > that “interpretation” is optional.
> >> > >>
> >> > >> Developers specify the interpretation value with SISR and if they don't
> >> > >> specify there is no 'default' interpretation available. In that sense it is
> >> > >> optional because grammars don't mandate it. So I think this API shouldn't
> >> > >> mandate providing a default value if the engine did not provide one, and
> >> > >> return null in such cases.
> >> > >>
> >> > >> Cheers
> >> > >> Satish
> >> > >>
> >> > >> On Fri, Aug 17, 2012 at 1:57 PM, Deborah Dahl
> >> > >> <dahl@conversational-technologies.com> wrote:
> >> > >>>
> >> > >>> I may have missed something, but I don’t see in the spec where it says
> >> > >>> that “interpretation” is optional.
> >> > >>>
> >> > >>> From: Satish S [mailto:satish@google.com]
> >> > >>> Sent: Thursday, August 16, 2012 7:38 PM
> >> > >>> To: Deborah Dahl
> >> > >>> Cc: Bjorn Bringert; Hans Wennborg; public-speech-api@w3.org
> >> > >>> Subject: Re: SpeechRecognitionAlternative.interpretation when
> >> > >>> interpretation can't be provided
> >> > >>>
> >> > >>> 'interpretation' is an optional attribute because engines are not
> >> > >>> required to provide an interpretation on their own (unlike 'transcript').
> >> > >>> As such I think it should return null when there isn't a value to be
> >> > >>> returned as that is the convention for optional attributes, not
> >> > >>> 'undefined' or a copy of some other attribute.
> >> > >>>
> >> > >>> If an engine chooses to return the same value for 'transcript' and
> >> > >>> 'interpretation' or do textnorm of the value and return in
> >> > >>> 'interpretation' that will be an implementation detail of the engine.
> >> > >>> But in the absence of any such value for 'interpretation' from the
> >> > >>> engine I think the UA should return null.
> >> > >>>
> >> > >>> Cheers
> >> > >>> Satish
> >> > >>>
> >> > >>> On Thu, Aug 16, 2012 at 2:52 PM, Deborah Dahl
> >> > >>> <dahl@conversational-technologies.com> wrote:
> >> > >>>
> >> > >>> That's a good point. There are lots of use cases where some simple
> >> > >>> normalization is extremely useful, as in your example, or collapsing all
> >> > >>> the ways that the user might say "yes" or "no". However, you could say
> >> > >>> that once the implementation has modified or normalized the transcript
> >> > >>> that means it has some kind of interpretation, so putting a normalized
> >> > >>> value in the interpretation slot should be fine.
> >> > >>> Nothing says that the "interpretation" has to be a particularly
> >> > >>> fine-grained interpretation, or one with a lot of structure.
> >> > >>>
> >> > >>> > -----Original Message-----
> >> > >>> > From: Bjorn Bringert [mailto:bringert@google.com]
> >> > >>> > Sent: Thursday, August 16, 2012 9:09 AM
> >> > >>> > To: Hans Wennborg
> >> > >>> > Cc: Conversational; public-speech-api@w3.org
> >> > >>> > Subject: Re: SpeechRecognitionAlternative.interpretation when
> >> > >>> > interpretation can't be provided
> >> > >>> >
> >> > >>> > I'm not sure that it has to be that strict in requiring that the value
> >> > >>> > is the same as the "transcript" attribute. For example, an engine
> >> > >>> > might return the words recognized in "transcript" and apply some extra
> >> > >>> > textnorm to the text that it returns in "interpretation", e.g.
> >> > >>> > converting digit words to digits ("three" -> "3"). Not sure if that's
> >> > >>> > useful though.
> >> > >>> >
> >> > >>> > On Thu, Aug 16, 2012 at 1:58 PM, Hans Wennborg
> >> > >>> > <hwennborg@google.com> wrote:
> >> > >>> > > Yes, the raw text is in the 'transcript' attribute.
> >> > >>> > >
> >> > >>> > > The description of 'interpretation' is currently: "The interpretation
> >> > >>> > > represents the semantic meaning from what the user said. This might be
> >> > >>> > > determined, for instance, through the SISR specification of semantics
> >> > >>> > > in a grammar."
> >> > >>> > >
> >> > >>> > > I propose that we change it to "The interpretation represents the
> >> > >>> > > semantic meaning from what the user said. This might be determined,
> >> > >>> > > for instance, through the SISR specification of semantics in a
> >> > >>> > > grammar. If no semantic meaning can be determined, the attribute must
> >> > >>> > > be a string with the same value as the 'transcript' attribute."
> >> > >>> > >
> >> > >>> > > Does that sound good to everyone? If there are no objections, I'll
> >> > >>> > > make the change to the draft next week.
> >> > >>> > >
> >> > >>> > > Thanks,
> >> > >>> > > Hans
> >> > >>> > >
> >> > >>> > > On Wed, Aug 15, 2012 at 5:29 PM, Conversational
> >> > >>> > > <dahl@conversational-technologies.com> wrote:
> >> > >>> > >> I can't check the spec right now, but I assume there's already an
> >> > >>> > >> attribute that currently is defined to contain the raw text. So I think
> >> > >>> > >> we could say that if there's no interpretation the value of the
> >> > >>> > >> interpretation attribute would be the same as the value of the "raw
> >> > >>> > >> string" attribute.
> >> > >>> > >>
> >> > >>> > >> Sent from my iPhone
> >> > >>> > >>
> >> > >>> > >> On Aug 15, 2012, at 9:57 AM, Hans Wennborg <hwennborg@google.com>
> >> > >>> > >> wrote:
> >> > >>> > >>
> >> > >>> > >>> OK, that would work I suppose.
> >> > >>> > >>>
> >> > >>> > >>> What would the spec text look like? Something like "[...] If no
> >> > >>> > >>> semantic meaning can be determined, the attribute will [be] a string
> >> > >>> > >>> representing the raw words that the user spoke."?
> >> > >>> > >>>
> >> > >>> > >>> On Wed, Aug 15, 2012 at 2:24 PM, Bjorn Bringert
> >> > >>> > >>> <bringert@google.com> wrote:
> >> > >>> > >>>> Yeah, that would be my preference too.
> >> > >>> > >>>>
> >> > >>> > >>>> On Wed, Aug 15, 2012 at 2:19 PM, Conversational
> >> > >>> > >>>> <dahl@conversational-technologies.com> wrote:
> >> > >>> > >>>>> If there isn't an interpretation I think it would make the most
> >> > >>> > >>>>> sense for the attribute to contain the literal string result. I
> >> > >>> > >>>>> believe this is what happens in VoiceXML.
> >> > >>> > >>>>>
> >> > >>> > >>>>>> My question is: for implementations that cannot provide an
> >> > >>> > >>>>>> interpretation, what should the attribute's value be? null?
> >> > >>> > >>>>>> undefined?
> >> > >>> >
> >> > >>> > --
> >> > >>> > Bjorn Bringert
> >> > >>> > Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> >> > >>> > Palace Road, London, SW1W 9TQ
> >> > >>> > Registered in England Number: 3977902
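The two positions debated in this thread (return null when there is no interpretation, versus fall back to the transcript) can be sketched in TypeScript. This is a minimal illustration under stated assumptions: the `Interpretation`, `EngineHypothesis` and `Alternative` types and all function names are hypothetical and are not the Web Speech API's own interface definitions; the engine output is assumed to carry a SISR result only when a SISR tag matched the input.

```typescript
// Hypothetical types for illustration only; not the Web Speech API's own
// interface definitions. SISR results are ECMAScript values, so the
// interpretation may be a string, number, boolean or object.
type Interpretation = string | number | boolean | object;

// Hypothetical engine output for one recognition hypothesis.
interface EngineHypothesis {
  words: string;               // the recognized words
  sisrResult?: Interpretation; // set only when a SISR tag matched the input
}

// Hypothetical shape of the attribute pair discussed in the thread.
interface Alternative {
  transcript: string;
  interpretation: Interpretation | null;
}

// Position 1 (Satish, Glen): no default; null when the engine has none.
function toAlternativeNullDefault(h: EngineHypothesis): Alternative {
  return {
    transcript: h.words,
    // Compare against undefined rather than using `||`, so falsy SISR
    // results such as 0 or "" are preserved.
    interpretation: h.sisrResult !== undefined ? h.sisrResult : null,
  };
}

// Position 2 (Deborah; Hans's proposed draft text): fall back to the transcript.
function toAlternativeTranscriptDefault(h: EngineHypothesis): Alternative {
  return {
    transcript: h.words,
    interpretation: h.sisrResult !== undefined ? h.sisrResult : h.words,
  };
}

// Under position 1, application code needs a check like this before using
// the interpretation; this is the extra logic Deborah expects developers
// to dislike.
function meaningOf(alt: Alternative): Interpretation {
  return alt.interpretation ?? alt.transcript;
}

// Example: "three" recognized with no SISR tag, then with a tag yielding 3.
const untagged: EngineHypothesis = { words: "three" };
const tagged: EngineHypothesis = { words: "three", sisrResult: 3 };
console.log(toAlternativeNullDefault(untagged).interpretation);       // null
console.log(toAlternativeTranscriptDefault(untagged).interpretation); // three
console.log(meaningOf(toAlternativeNullDefault(tagged)));             // 3
```

The sketch does not pick a winner; it only shows that the choice decides whether the null check lives in the user agent (position 2) or in every application that reads the attribute (position 1).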
Received on Thursday, 23 August 2012 13:37:45 UTC