Re: SpeechRecognitionAlternative.interpretation when interpretation can't be provided from Bjorn Bringert on 2012-08-23 (public-speech-api@w3.org from August 2012)

From: Bjorn Bringert <bringert@google.com>
Date: Thu, 23 Aug 2012 11:08:31 +0100
To: Glen Shires <gshires@google.com>
Cc: Deborah Dahl <dahl@conversational-technologies.com>, Hans Wennborg <hwennborg@google.com>, Satish S <satish@google.com>, public-speech-api@w3.org
Message-ID: <CAJtyJaVbW8SQA1=0VOOfsn3V7e=uJ9FBahypjzAKM3_iLxV0Vg@mail.gmail.com>

Here's what I think we should say:

1. Implementations are required to support SRGS (both XML and ABNF).
Implementations may support other grammar formats. (Requiring SLM
support is tricky, since as far as I'm aware, there is no widely
accepted standard SLM format.)

2. Implementations are required to support SISR in SRGS.
Implementations may support other tag formats in SRGS. Implementations
may support any tag formats in other grammar formats, but none are
required.

3. Implementations must return a non-empty string in "transcript"
when recognition is successful.

4. If the grammar is SRGS with SISR tags, and there is a SISR tag for
the grammar rule that matches the input, the "interpretation" field
must contain the result of evaluating the SISR tag. In other cases,
the "interpretation" field may contain a value, but is not required
to.
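
To make points 3 and 4 concrete, here is a rough sketch in TypeScript of
how a UA might fill in a result. All type and function names below
(GrammarFormat, RecognitionMatch, toAlternative, and so on) are
hypothetical and are not taken from the draft. Returning null when there
is no value follows the suggestion made earlier in this thread and is
also an assumption, not settled spec text.

```typescript
// Hypothetical names throughout; none of these come from the draft spec.
type GrammarFormat = "srgs-xml" | "srgs-abnf" | "other";

interface RecognitionMatch {
  grammarFormat: GrammarFormat;
  // The words the recognizer heard.
  transcript: string;
  // Result of evaluating the SISR tag on the matched rule, if there was one.
  sisrResult?: unknown;
  // Optional engine-specific value, e.g. a normalized transcript.
  engineInterpretation?: unknown;
}

interface Alternative {
  transcript: string;
  // null when no interpretation is available (assumption from this thread).
  interpretation: unknown;
}

function toAlternative(match: RecognitionMatch): Alternative {
  // Point 3: a successful recognition carries a non-empty transcript.
  if (match.transcript.length === 0) {
    throw new Error("successful recognition requires a non-empty transcript");
  }

  const isSrgs =
    match.grammarFormat === "srgs-xml" || match.grammarFormat === "srgs-abnf";

  // Point 4: SRGS grammar with a SISR tag on the matched rule.
  if (isSrgs && match.sisrResult !== undefined) {
    return { transcript: match.transcript, interpretation: match.sisrResult };
  }

  // Other cases: a value is allowed but not required.
  return {
    transcript: match.transcript,
    interpretation: match.engineInterpretation ?? null,
  };
}
```

With this sketch, an SRGS match whose SISR tag evaluates to 3 yields
that value as the interpretation, while a match from any other grammar
format with no engine-supplied value yields null.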


Open questions:

A) What happens if the web app uses a grammar format that the
implementation doesn't support?

B) What happens if the web app uses a tag format that the
implementation doesn't support?


On Thu, Aug 23, 2012 at 8:47 AM, Glen Shires <gshires@google.com> wrote:
> Debbie,
> I agree with the need to support SLMs. This implies that, in some cases, the
> author may not specify semantic information, and thus there would not be an
> interpretation.
>
> Under what circumstances (except error conditions) do you envision that a
> transcript would not be returned?
>
> /Glen Shires
>
>
> On Wed, Aug 22, 2012 at 6:08 AM, Deborah Dahl
> <dahl@conversational-technologies.com> wrote:
>>
>> Actually, Satish's comment made me think that we probably have a few other
>> things to agree on before we decide what the default value of
>> "interpretation" should be, because we haven't settled on a lot of issues
>> about what is required and what is optional.
>> Satish's argument is only relevant if we require SRGS/SISR for grammars and
>> semantic interpretation, but we actually don't require either of those right
>> now, so it doesn't matter what they do as far as the current spec goes.
>> (Although it's worth noting that SRGS doesn't require anything to be
>> returned at all, even the transcript:
>> http://www.w3.org/TR/speech-grammar/#S1.10).
>> So I think we first need to decide and explicitly state in the spec:
>>
>> 1. what we want to say about grammar formats (which are allowed/required, or
>> is the grammar format open). It probably needs to be somewhat open because
>> of SLMs.
>> 2. what we want to say about semantic tag formats (are proprietary formats
>> allowed, is SISR required, or is the semantic tag format just whatever the
>> grammar format uses)
>> 3. is "transcript" required?
>> 4. is "interpretation" required?
>>
>> Debbie
>>
>> > -----Original Message-----
>> > From: Hans Wennborg [mailto:hwennborg@google.com]
>> > Sent: Tuesday, August 21, 2012 12:50 PM
>> > To: Glen Shires
>> > Cc: Satish S; Deborah Dahl; Bjorn Bringert; public-speech-api@w3.org
>> > Subject: Re: SpeechRecognitionAlternative.interpretation when
>> > interpretation can't be provided
>> >
>> > Björn, Deborah, are you ok with this as well? I.e. that the spec
>> > shouldn't mandate a "default" value for the interpretation attribute,
>> > but rather return null when there is no interpretation?
>> >
>> > On Fri, Aug 17, 2012 at 6:32 PM, Glen Shires <gshires@google.com> wrote:
>> > > I agree, return "null" (not "undefined") in such cases.
>> > >
>> > >
>> > > On Fri, Aug 17, 2012 at 7:41 AM, Satish S <satish@google.com> wrote:
>> > >>
>> > >> > I may have missed something, but I don’t see in the spec where it says
>> > >> > that “interpretation” is optional.
>> > >>
>> > >> Developers specify the interpretation value with SISR, and if they don't
>> > >> specify one there is no 'default' interpretation available. In that sense
>> > >> it is optional because grammars don't mandate it. So I think this API
>> > >> shouldn't mandate providing a default value if the engine did not provide
>> > >> one, and should return null in such cases.
>> > >>
>> > >> Cheers
>> > >> Satish
>> > >>
>> > >>
>> > >>
>> > >> On Fri, Aug 17, 2012 at 1:57 PM, Deborah Dahl
>> > >> <dahl@conversational-technologies.com> wrote:
>> > >>>
>> > >>> I may have missed something, but I don’t see in the spec where it
>> > >>> says
>> > >>> that “interpretation” is optional.
>> > >>>
>> > >>> From: Satish S [mailto:satish@google.com]
>> > >>> Sent: Thursday, August 16, 2012 7:38 PM
>> > >>> To: Deborah Dahl
>> > >>> Cc: Bjorn Bringert; Hans Wennborg; public-speech-api@w3.org
>> > >>>
>> > >>>
>> > >>> Subject: Re: SpeechRecognitionAlternative.interpretation when
>> > >>> interpretation can't be provided
>> > >>>
>> > >>>
>> > >>>
>> > >>> 'interpretation' is an optional attribute because engines are not
>> > >>> required to provide an interpretation on their own (unlike 'transcript').
>> > >>> As such I think it should return null when there isn't a value to be
>> > >>> returned, as that is the convention for optional attributes, not
>> > >>> 'undefined' or a copy of some other attribute.
>> > >>>
>> > >>>
>> > >>>
>> > >>> If an engine chooses to return the same value for 'transcript' and
>> > >>> 'interpretation', or to apply textnorm to the value and return the result
>> > >>> in 'interpretation', that will be an implementation detail of the engine.
>> > >>> But in the absence of any such value for 'interpretation' from the
>> > >>> engine, I think the UA should return null.
>> > >>>
>> > >>>
>> > >>> Cheers
>> > >>> Satish
>> > >>>
>> > >>> On Thu, Aug 16, 2012 at 2:52 PM, Deborah Dahl
>> > >>> <dahl@conversational-technologies.com> wrote:
>> > >>>
>> > >>> That's a good point. There are lots of use cases where some simple
>> > >>> normalization is extremely useful, as in your example, or collapsing all
>> > >>> the ways that the user might say "yes" or "no". However, you could say
>> > >>> that once the implementation has modified or normalized the transcript,
>> > >>> that means it has some kind of interpretation, so putting a normalized
>> > >>> value in the interpretation slot should be fine. Nothing says that the
>> > >>> "interpretation" has to be a particularly fine-grained interpretation, or
>> > >>> one with a lot of structure.
>> > >>>
>> > >>>
>> > >>>
>> > >>> > -----Original Message-----
>> > >>> > From: Bjorn Bringert [mailto:bringert@google.com]
>> > >>> > Sent: Thursday, August 16, 2012 9:09 AM
>> > >>> > To: Hans Wennborg
>> > >>> > Cc: Conversational; public-speech-api@w3.org
>> > >>> > Subject: Re: SpeechRecognitionAlternative.interpretation when
>> > >>> > interpretation can't be provided
>> > >>> >
>> > >>> > I'm not sure that it has to be that strict in requiring that the value
>> > >>> > is the same as the "transcript" attribute. For example, an engine
>> > >>> > might return the words recognized in "transcript" and apply some extra
>> > >>> > textnorm to the text that it returns in "interpretation", e.g.
>> > >>> > converting digit words to digits ("three" -> "3"). Not sure if that's
>> > >>> > useful though.
>> > >>> >
>> > >>> > On Thu, Aug 16, 2012 at 1:58 PM, Hans Wennborg
>> > >>> > <hwennborg@google.com> wrote:
>> > >>> > > Yes, the raw text is in the 'transcript' attribute.
>> > >>> > >
>> > >>> > > The description of 'interpretation' is currently: "The interpretation
>> > >>> > > represents the semantic meaning from what the user said. This might be
>> > >>> > > determined, for instance, through the SISR specification of semantics
>> > >>> > > in a grammar."
>> > >>> > >
>> > >>> > > I propose that we change it to "The interpretation represents the
>> > >>> > > semantic meaning from what the user said. This might be determined,
>> > >>> > > for instance, through the SISR specification of semantics in a
>> > >>> > > grammar. If no semantic meaning can be determined, the attribute must
>> > >>> > > be a string with the same value as the 'transcript' attribute."
>> > >>> > >
>> > >>> > > Does that sound good to everyone? If there are no objections, I'll
>> > >>> > > make the change to the draft next week.
>> > >>> > >
>> > >>> > > Thanks,
>> > >>> > > Hans
>> > >>> > >
>> > >>> > > On Wed, Aug 15, 2012 at 5:29 PM, Conversational
>> > >>> > > <dahl@conversational-technologies.com> wrote:
>> > >>> > >> I can't check the spec right now, but I assume there's already an
>> > >>> > >> attribute that currently is defined to contain the raw text. So I
>> > >>> > >> think we could say that if there's no interpretation the value of the
>> > >>> > >> interpretation attribute would be the same as the value of the "raw
>> > >>> > >> string" attribute.
>> > >>> > >>
>> > >>> > >> Sent from my iPhone
>> > >>> > >>
>> > >>> > >> On Aug 15, 2012, at 9:57 AM, Hans Wennborg <hwennborg@google.com> wrote:
>> > >>> > >>
>> > >>> > >>> OK, that would work I suppose.
>> > >>> > >>>
>> > >>> > >>> What would the spec text look like? Something like "[...] If no
>> > >>> > >>> semantic meaning can be determined, the attribute will be a string
>> > >>> > >>> representing the raw words that the user spoke."?
>> > >>> > >>>
>> > >>> > >>> On Wed, Aug 15, 2012 at 2:24 PM, Bjorn Bringert <bringert@google.com> wrote:
>> > >>> > >>>> Yeah, that would be my preference too.
>> > >>> > >>>>
>> > >>> > >>>> On Wed, Aug 15, 2012 at 2:19 PM, Conversational
>> > >>> > >>>> <dahl@conversational-technologies.com> wrote:
>> > >>> > >>>>> If there isn't an interpretation I think it would make the most
>> > >>> > >>>>> sense for the attribute to contain the literal string result. I
>> > >>> > >>>>> believe this is what happens in VoiceXML.
>> > >>> > >>>>>
>> > >>> > >>>>>> My question is: for implementations that cannot provide an
>> > >>> > >>>>>> interpretation, what should the attribute's value be? null?
>> > >>> > >>>>>> undefined?
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > --
>> > >>> > Bjorn Bringert
>> > >>> > Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>> > >>> > Palace Road, London, SW1W 9TQ
>> > >>> > Registered in England Number: 3977902
>> > >>>
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >
>>
>



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Thursday, 23 August 2012 10:09:00 UTC