RE: SpeechRecognitionAlternative.interpretation when interpretation can't be provided from Deborah Dahl on 2012-08-23 (public-speech-api@w3.org from August 2012)

From: Deborah Dahl <dahl@conversational-technologies.com>
Date: Thu, 23 Aug 2012 09:37:00 -0400
To: "'Bjorn Bringert'" <bringert@google.com>, "'Glen Shires'" <gshires@google.com>
Cc: "'Hans Wennborg'" <hwennborg@google.com>, "'Satish S'" <satish@google.com>, <public-speech-api@w3.org>
Message-ID: <00c601cd8134$601a9610$204fc230$@conversational-technologies.com>
Comments inline.

> -----Original Message-----
> From: Bjorn Bringert [mailto:bringert@google.com]
> Sent: Thursday, August 23, 2012 6:09 AM
> To: Glen Shires
> Cc: Deborah Dahl; Hans Wennborg; Satish S; public-speech-api@w3.org
> Subject: Re: SpeechRecognitionAlternative.interpretation when
> interpretation can't be provided
> 
> Here's what I think we should say:
> 
> 1. Implementations are required to support SRGS (both XML and ABNF).
> Implementations may support other grammar formats. (Requiring SLM
> support is tricky, since as far as I'm aware, there is no widely
> accepted standard SLM format.)

I agree. I also agree that we shouldn't require SLM support. I think SLMs should just be treated as one case of "other grammar formats".
> 
> 2. Implementations are required to support SISR in SRGS.
> Implementations may support other tag formats in SRGS. Implementations
> may support any tag formats in other grammar formats, but none are
> required.
> 

I agree. 
> 3. Implementations must return a non-empty string in "transcription"
> when recognition is successful.
> 
I agree, although we need to be explicit about what we mean by "successful".
I think it means that a "result" event was fired, but there may be edge cases around "nomatch" that we should consider.
Note that SRGS itself doesn't require anything at all to be returned, so we're adding a requirement in our spec that isn't actually in SRGS. That's not a problem; it's just something to be aware of.

> 4. If the grammar is SRGS with SISR tags, and there is a SISR tag for
> the grammar rule that matches the input, the "interpretation" field
> must contain the result of evaluating the SISR tag. In other cases,
> the "interpretation" field may contain a value, but is not required
> to.
> 
I agree, except I think that the interpretation field should always contain a value: when there is no other interpretation, it should be the same as the transcript.
I think developers won't like having to add logic to check for null or undefined every time they request an interpretation.
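To make the developer-side cost concrete, here is a rough sketch in TypeScript. The `Alternative` type and both helper functions are hypothetical names for illustration only, not spec text; the sketch just assumes an alternative carries a "transcript" string and an "interpretation" that may be null.

```typescript
// Hypothetical shape of a recognition alternative (an assumption, not spec text).
interface Alternative {
  transcript: string;
  interpretation: unknown; // null or undefined models "no interpretation available"
}

// The check every web app would have to repeat if interpretation may be null.
function interpretationOrTranscript(alt: Alternative): unknown {
  return alt.interpretation !== null && alt.interpretation !== undefined
    ? alt.interpretation
    : alt.transcript;
}

// What an implementation could do once instead: always populate the field,
// copying the transcript when there is no other interpretation.
function fillInterpretation(alt: Alternative): Alternative {
  return { ...alt, interpretation: interpretationOrTranscript(alt) };
}
```

With something like `fillInterpretation` applied on the implementation side, application code could read "interpretation" directly without a null test.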
> 
> Open questions:
> 
> A) What happens if the web app uses a grammar format that the
> implementation doesn't support?
> 
> B) What happens if the web app uses a tag format that the
> implementation doesn't support?
> 
Perhaps we could add a couple of error codes, e.g. "GRAMMAR_FORMAT_NOT_SUPPORTED" and "TAG_FORMAT_NOT_SUPPORTED".
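As a rough illustration of how those two codes might be used, here is a sketch in TypeScript. Everything in it is hypothetical: the `GrammarRequest` type, the `checkGrammar` function, and the supported-format lists are assumptions made for the example, and only the two error code names come from the suggestion above.

```typescript
// The two error codes suggested above; everything else here is hypothetical.
type GrammarErrorCode =
  | "GRAMMAR_FORMAT_NOT_SUPPORTED"
  | "TAG_FORMAT_NOT_SUPPORTED";

// Hypothetical description of the grammar a web app supplies.
interface GrammarRequest {
  grammarType: string;      // media type of the grammar
  tagFormat: string | null; // semantic tag format, or null if there are no tags
}

// Assumed support lists for an implementation that handles only SRGS
// (XML and ABNF media types) with SISR tags.
const SUPPORTED_GRAMMAR_TYPES = ["application/srgs+xml", "application/srgs"];
const SUPPORTED_TAG_FORMATS = ["semantics/1.0"];

// Returns the error code to report, or null if the grammar is acceptable.
function checkGrammar(req: GrammarRequest): GrammarErrorCode | null {
  if (!SUPPORTED_GRAMMAR_TYPES.includes(req.grammarType)) {
    return "GRAMMAR_FORMAT_NOT_SUPPORTED";
  }
  if (req.tagFormat !== null && !SUPPORTED_TAG_FORMATS.includes(req.tagFormat)) {
    return "TAG_FORMAT_NOT_SUPPORTED";
  }
  return null;
}
```

The grammar format is checked first here, since a tag format only makes sense once the grammar itself can be read; that ordering is a design choice of the sketch, not something we have agreed on.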
> 
> On Thu, Aug 23, 2012 at 8:47 AM, Glen Shires <gshires@google.com> wrote:
> > Debbie,
> > I agree with the need to support SLMs. This implies that, in some cases, the
> > author may not specify semantic information, and thus there would not be an
> > interpretation.
> >
> > Under what circumstances (except error conditions) do you envision that a
> > transcript would not be returned?
> >
> > /Glen Shires
> >
> >
> > On Wed, Aug 22, 2012 at 6:08 AM, Deborah Dahl
> > <dahl@conversational-technologies.com> wrote:
> >>
> >> Actually, Satish's comment made me think that we probably have a few other
> >> things to agree on before we decide what the default value of
> >> "interpretation" should be, because we haven't settled on a lot of issues
> >> about what is required and what is optional.
> >> Satish's argument is only relevant if we require SRGS/SISR for grammars and
> >> semantic interpretation, but we actually don't require either of those right
> >> now, so it doesn't matter what they do as far as the current spec goes.
> >> (Although it's worth noting that SRGS doesn't require anything to be
> >> returned at all, even the transcript
> >> http://www.w3.org/TR/speech-grammar/#S1.10).
> >> So I think we first need to decide and explicitly state in the spec ---
> >>
> >> 1. what we want to say about grammar formats (which are allowed/required, or
> >> is the grammar format open). It probably needs to be somewhat open because
> >> of SLMs.
> >> 2. what we want to say about semantic tag formats (are proprietary formats
> >> allowed, is SISR required or is the semantic tag format just whatever the
> >> grammar format uses)
> >> 3. is "transcript" required?
> >> 4. is "interpretation" required?
> >>
> >> Debbie
> >>
> >> > -----Original Message-----
> >> > From: Hans Wennborg [mailto:hwennborg@google.com]
> >> > Sent: Tuesday, August 21, 2012 12:50 PM
> >> > To: Glen Shires
> >> > Cc: Satish S; Deborah Dahl; Bjorn Bringert; public-speech-api@w3.org
> >> > Subject: Re: SpeechRecognitionAlternative.interpretation when
> >> > interpretation can't be provided
> >> >
> >> > Björn, Deborah, are you ok with this as well? I.e. that the spec
> >> > shouldn't mandate a "default" value for the interpretation attribute,
> >> > but rather return null when there is no interpretation?
> >> >
> >> > On Fri, Aug 17, 2012 at 6:32 PM, Glen Shires <gshires@google.com> wrote:
> >> > > I agree, return "null" (not "undefined") in such cases.
> >> > >
> >> > >
> >> > > On Fri, Aug 17, 2012 at 7:41 AM, Satish S <satish@google.com> wrote:
> >> > >>
> >> > >> > I may have missed something, but I don’t see in the spec where it says
> >> > >> > that “interpretation” is optional.
> >> > >>
> >> > >> Developers specify the interpretation value with SISR and if they don't
> >> > >> specify there is no 'default' interpretation available. In that sense it is
> >> > >> optional because grammars don't mandate it. So I think this API shouldn't
> >> > >> mandate providing a default value if the engine did not provide one, and
> >> > >> return null in such cases.
> >>
> >>
> >>
> >> > >>
> >> > >> Cheers
> >> > >> Satish
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Fri, Aug 17, 2012 at 1:57 PM, Deborah Dahl
> >> > >> <dahl@conversational-technologies.com> wrote:
> >> > >>>
> >> > >>> I may have missed something, but I don’t see in the spec where it says
> >> > >>> that “interpretation” is optional.
> >> > >>>
> >> > >>> From: Satish S [mailto:satish@google.com]
> >> > >>> Sent: Thursday, August 16, 2012 7:38 PM
> >> > >>> To: Deborah Dahl
> >> > >>> Cc: Bjorn Bringert; Hans Wennborg; public-speech-api@w3.org
> >> > >>>
> >> > >>>
> >> > >>> Subject: Re: SpeechRecognitionAlternative.interpretation when
> >> > >>> interpretation can't be provided
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> 'interpretation' is an optional attribute because engines are not
> >> > >>> required to provide an interpretation on their own (unlike 'transcript'). As
> >> > >>> such I think it should return null when there isn't a value to be returned
> >> > >>> as that is the convention for optional attributes, not 'undefined' or a copy
> >> > >>> of some other attribute.
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> If an engine chooses to return the same value for 'transcript' and
> >> > >>> 'interpretation' or do textnorm of the value and return in 'interpretation'
> >> > >>> that will be an implementation detail of the engine. But in the absence of
> >> > >>> any such value for 'interpretation' from the engine I think the UA should
> >> > >>> return null.
> >> > >>>
> >> > >>>
> >> > >>> Cheers
> >> > >>> Satish
> >> > >>>
> >> > >>> On Thu, Aug 16, 2012 at 2:52 PM, Deborah Dahl
> >> > >>> <dahl@conversational-technologies.com> wrote:
> >> > >>>
> >> > >>> That's a good point. There are lots of use cases where some simple
> >> > >>> normalization is extremely useful, as in your example, or collapsing all the
> >> > >>> ways that the user might say "yes" or "no". However, you could say that once
> >> > >>> the implementation has modified or normalized the transcript that means it
> >> > >>> has some kind of interpretation, so putting a normalized value in the
> >> > >>> interpretation slot should be fine. Nothing says that the "interpretation"
> >> > >>> has to be a particularly fine-grained interpretation, or one with a lot of
> >> > >>> structure.
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> > -----Original Message-----
> >> > >>> > From: Bjorn Bringert [mailto:bringert@google.com]
> >> > >>> > Sent: Thursday, August 16, 2012 9:09 AM
> >> > >>> > To: Hans Wennborg
> >> > >>> > Cc: Conversational; public-speech-api@w3.org
> >> > >>> > Subject: Re: SpeechRecognitionAlternative.interpretation when
> >> > >>> > interpretation can't be provided
> >> > >>> >
> >> > >>> > I'm not sure that it has to be that strict in requiring that the value
> >> > >>> > is the same as the "transcript" attribute. For example, an engine
> >> > >>> > might return the words recognized in "transcript" and apply some extra
> >> > >>> > textnorm to the text that it returns in "interpretation", e.g.
> >> > >>> > converting digit words to digits ("three" -> "3"). Not sure if that's
> >> > >>> > useful though.
> >> > >>> >
> >> > >>> > On Thu, Aug 16, 2012 at 1:58 PM, Hans Wennborg
> >> > >>> > <hwennborg@google.com> wrote:
> >> > >>> > > Yes, the raw text is in the 'transcript' attribute.
> >> > >>> > >
> >> > >>> > > The description of 'interpretation' is currently: "The interpretation
> >> > >>> > > represents the semantic meaning from what the user said. This might be
> >> > >>> > > determined, for instance, through the SISR specification of semantics
> >> > >>> > > in a grammar."
> >> > >>> > >
> >> > >>> > > I propose that we change it to "The interpretation represents the
> >> > >>> > > semantic meaning from what the user said. This might be determined,
> >> > >>> > > for instance, through the SISR specification of semantics in a
> >> > >>> > > grammar. If no semantic meaning can be determined, the attribute must
> >> > >>> > > be a string with the same value as the 'transcript' attribute."
> >> > >>> > >
> >> > >>> > > Does that sound good to everyone? If there are no objections, I'll
> >> > >>> > > make the change to the draft next week.
> >> > >>> > >
> >> > >>> > > Thanks,
> >> > >>> > > Hans
> >> > >>> > >
> >> > >>> > > On Wed, Aug 15, 2012 at 5:29 PM, Conversational
> >> > >>> > > <dahl@conversational-technologies.com> wrote:
> >> > >>> > >> I can't check the spec right now, but I assume there's already an
> >> > >>> > >> attribute that currently is defined to contain the raw text. So I think we
> >> > >>> > >> could say that if there's no interpretation the value of the interpretation
> >> > >>> > >> attribute would be the same as the value of the "raw string" attribute,
> >> > >>> > >>
> >> > >>> > >> Sent from my iPhone
> >> > >>> > >>
> >> > >>> > >> On Aug 15, 2012, at 9:57 AM, Hans Wennborg <hwennborg@google.com> wrote:
> >> > >>> > >>
> >> > >>> > >>> OK, that would work I suppose.
> >> > >>> > >>>
> >> > >>> > >>> What would the spec text look like? Something like "[...] If no
> >> > >>> > >>> semantic meaning can be determined, the attribute will be a string
> >> > >>> > >>> representing the raw words that the user spoke."?
> >> > >>> > >>>
> >> > >>> > >>> On Wed, Aug 15, 2012 at 2:24 PM, Bjorn Bringert <bringert@google.com> wrote:
> >> > >>> > >>>> Yeah, that would be my preference too.
> >> > >>> > >>>>
> >> > >>> > >>>> On Wed, Aug 15, 2012 at 2:19 PM, Conversational
> >> > >>> > >>>> <dahl@conversational-technologies.com> wrote:
> >> > >>> > >>>>> If there isn't an interpretation I think it would make the most
> >> > >>> > >>>>> sense for the attribute to contain the literal string result. I believe
> >> > >>> > >>>>> this is what happens in VoiceXML.
> >> > >>> > >>>>>
> >> > >>> > >>>>>> My question is: for implementations that cannot provide an
> >> > >>> > >>>>>> interpretation, what should the attribute's value be? null? undefined?
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> > --
> >> > >>> > Bjorn Bringert
> >> > >>> > Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> >> > >>> > Palace Road, London, SW1W 9TQ
> >> > >>> > Registered in England Number: 3977902
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>
> >> > >>
> >> > >
> >>
> >
> 
> 
> 
> --
> Bjorn Bringert
> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> Palace Road, London, SW1W 9TQ
> Registered in England Number: 3977902
Received on Thursday, 23 August 2012 13:37:45 UTC