Re: SpeechRecognitionAlternative.interpretation when interpretation can't be provided from Hans Wennborg on 2012-08-17 (public-speech-api@w3.org from August 2012)

From: Hans Wennborg <hwennborg@google.com>
Date: Fri, 17 Aug 2012 15:24:17 +0100
To: Deborah Dahl <dahl@conversational-technologies.com>
Cc: Satish S <satish@google.com>, Bjorn Bringert <bringert@google.com>, public-speech-api@w3.org
Message-ID: <CAB8jPhei8Ckffx_=Uo7srqBjAyzCz+6PiJ0xKSuZpNyzPoRoig@mail.gmail.com>
The spec currently doesn't say that 'interpretation' is optional, but I think it should.

I also think that Glen and Satish have provided good arguments for
making the value be 'undefined' if no interpretation can be provided.

Thanks,
Hans
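The options debated in this thread (null, undefined, or a copy of 'transcript') mostly change what page script has to check. Below is a minimal TypeScript sketch of a reader that tolerates all three. It assumes a hypothetical `AlternativeLike` type that mirrors only the two attributes discussed here and is not the real DOM interface.

```typescript
// Hypothetical stand-in for SpeechRecognitionAlternative (not the DOM type).
// `interpretation` is deliberately loose: the thread had not settled whether
// the "no interpretation" case should be null, undefined, or a copy of
// `transcript`.
interface AlternativeLike {
  transcript: string;
  interpretation?: unknown;
}

// Returns the engine's interpretation when one was provided, otherwise the
// raw transcript. The loose `== null` check matches both null and undefined.
function interpretationOrTranscript(alt: AlternativeLike): unknown {
  return alt.interpretation == null ? alt.transcript : alt.interpretation;
}

// True only when the engine supplied an interpretation that differs from the
// transcript. Under the "copy of transcript" proposal, a page cannot tell
// "no interpretation" apart from one that happens to equal the transcript.
function hasDistinctInterpretation(alt: AlternativeLike): boolean {
  return alt.interpretation != null && alt.interpretation !== alt.transcript;
}

const asNull: AlternativeLike = { transcript: "three", interpretation: null };
const asUndefined: AlternativeLike = { transcript: "three" };
const asCopy: AlternativeLike = { transcript: "three", interpretation: "three" };
const asTextnorm: AlternativeLike = { transcript: "three", interpretation: "3" };

console.log(interpretationOrTranscript(asNull));      // three
console.log(interpretationOrTranscript(asUndefined)); // three
console.log(interpretationOrTranscript(asTextnorm));  // 3
console.log(hasDistinctInterpretation(asCopy));       // false
console.log(hasDistinctInterpretation(asTextnorm));   // true
```

With null or undefined, a page can detect the absence directly. With the copy-of-transcript option, the absence is invisible, which is the trade-off Satish raises.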

On Fri, Aug 17, 2012 at 1:57 PM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:
> I may have missed something, but I don’t see in the spec where it says that
> “interpretation” is optional.
>
> From: Satish S [mailto:satish@google.com]
> Sent: Thursday, August 16, 2012 7:38 PM
> To: Deborah Dahl
> Cc: Bjorn Bringert; Hans Wennborg; public-speech-api@w3.org
> Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
> can't be provided
>
> 'interpretation' is an optional attribute because engines are not required
> to provide an interpretation on their own (unlike 'transcript'). As such, I
> think it should return null when there isn't a value to be returned, as that
> is the convention for optional attributes, not 'undefined' or a copy of some
> other attribute.
>
> If an engine chooses to return the same value for 'transcript' and
> 'interpretation', or to apply textnorm to the value and return that in
> 'interpretation', that will be an implementation detail of the engine. But
> in the absence of any such value for 'interpretation' from the engine, I
> think the UA should return null.
>
> Cheers
> Satish
>
> On Thu, Aug 16, 2012 at 2:52 PM, Deborah Dahl
> <dahl@conversational-technologies.com> wrote:
>
> That's a good point. There are lots of use cases where some simple
> normalization is extremely useful, as in your example, or collapsing all the
> ways that the user might say "yes" or "no". However, you could say that once
> the implementation has modified or normalized the transcript that means it
> has some kind of interpretation, so putting a normalized value in the
> interpretation slot should be fine. Nothing says that the "interpretation"
> has to be a particularly fine-grained interpretation, or one with a lot of
> structure.
>
>> -----Original Message-----
>> From: Bjorn Bringert [mailto:bringert@google.com]
>> Sent: Thursday, August 16, 2012 9:09 AM
>> To: Hans Wennborg
>> Cc: Conversational; public-speech-api@w3.org
>> Subject: Re: SpeechRecognitionAlternative.interpretation when
>> interpretation can't be provided
>>
>> I'm not sure that it has to be that strict in requiring that the value
>> is the same as the "transcript" attribute. For example, an engine
>> might return the words recognized in "transcript" and apply some extra
>> textnorm to the text that it returns in "interpretation", e.g.
>> converting digit words to digits ("three" -> "3"). Not sure if that's
>> useful though.
>>
>> On Thu, Aug 16, 2012 at 1:58 PM, Hans Wennborg
>> <hwennborg@google.com> wrote:
>> > Yes, the raw text is in the 'transcript' attribute.
>> >
>> > The description of 'interpretation' is currently: "The interpretation
>> > represents the semantic meaning from what the user said. This might be
>> > determined, for instance, through the SISR specification of semantics
>> > in a grammar."
>> >
>> > I propose that we change it to "The interpretation represents the
>> > semantic meaning from what the user said. This might be determined,
>> > for instance, through the SISR specification of semantics in a
>> > grammar. If no semantic meaning can be determined, the attribute must
>> > be a string with the same value as the 'transcript' attribute."
>> >
>> > Does that sound good to everyone? If there are no objections, I'll
>> > make the change to the draft next week.
>> >
>> > Thanks,
>> > Hans
>> >
>> > On Wed, Aug 15, 2012 at 5:29 PM, Conversational
>> > <dahl@conversational-technologies.com> wrote:
>> >> I can't check the spec right now, but I assume there's already an
>> >> attribute that currently is defined to contain the raw text. So I think
>> >> we could say that if there's no interpretation, the value of the
>> >> interpretation attribute would be the same as the value of the "raw
>> >> string" attribute.
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On Aug 15, 2012, at 9:57 AM, Hans Wennborg <hwennborg@google.com>
>> >> wrote:
>> >>
>> >>> OK, that would work I suppose.
>> >>>
>> >>> What would the spec text look like? Something like "[...] If no
>> >>> semantic meaning can be determined, the attribute will be a string
>> >>> representing the raw words that the user spoke."?
>> >>>
>> >>> On Wed, Aug 15, 2012 at 2:24 PM, Bjorn Bringert
>> >>> <bringert@google.com> wrote:
>> >>>> Yeah, that would be my preference too.
>> >>>>
>> >>>> On Wed, Aug 15, 2012 at 2:19 PM, Conversational
>> >>>> <dahl@conversational-technologies.com> wrote:
>> >>>>> If there isn't an interpretation I think it would make the most
>> >>>>> sense for the attribute to contain the literal string result. I
>> >>>>> believe this is what happens in VoiceXML.
>> >>>>>
>> >>>>>> My question is: for implementations that cannot provide an
>> >>>>>> interpretation, what should the attribute's value be? null?
>> >>>>>> undefined?
>>
>> --
>> Bjorn Bringert
>
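Bjorn's textnorm example (digit words to digits) and Deborah's collapsing of the ways a user might say "yes" or "no" are both light normalizations that an engine could place in 'interpretation' while leaving the raw words in 'transcript'. The sketch below illustrates this under stated assumptions. The word lists and the `normalize` helper are hypothetical and are not taken from any engine. The helper returns null when no rule applies, following the null convention Satish argues for.

```typescript
// Hypothetical, illustrative word lists; real textnorm rules would be much
// larger and language-specific. A Map avoids inherited Object.prototype keys.
const DIGIT_WORDS = new Map<string, string>([
  ["zero", "0"], ["one", "1"], ["two", "2"], ["three", "3"], ["four", "4"],
  ["five", "5"], ["six", "6"], ["seven", "7"], ["eight", "8"], ["nine", "9"],
]);
const YES_WORDS = new Set(["yes", "yeah", "yep", "sure"]);
const NO_WORDS = new Set(["no", "nope", "nah"]);

// Returns a normalized value suitable for `interpretation`, or null when no
// rule applied. The caller keeps the raw words in `transcript` unchanged.
function normalize(transcript: string): string | null {
  const text = transcript.trim().toLowerCase();

  // Collapse the different ways of saying "yes" or "no" into one value.
  if (YES_WORDS.has(text)) return "yes";
  if (NO_WORDS.has(text)) return "no";

  // Convert digit words to digits, e.g. "three" -> "3".
  const words = text.split(/\s+/);
  const mapped = words.map((word) => DIGIT_WORDS.get(word) ?? word);
  const changed = mapped.some((word, i) => word !== words[i]);
  return changed ? mapped.join(" ") : null;
}

console.log(normalize("three"));              // 3
console.log(normalize("Yeah"));               // yes
console.log(normalize("call four five six")); // call 4 5 6
console.log(normalize("hello world"));        // null
```

Under Hans's earlier proposal, the "hello world" case would instead yield the transcript string itself rather than null.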
Received on Friday, 17 August 2012 14:25:05 UTC