RE: SpeechRecognitionAlternative.interpretation when interpretation can't be provided

Having a more specific error like “TAG_FORMAT_NOT_SUPPORTED” would be more
informative, but I think using BAD_GRAMMAR is ok. If so, the text should
probably say something like "There was an error in the speech recognition
grammar or semantic tags, or the grammar format or tag format is
unsupported." 

 

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Thursday, September 13, 2012 12:40 PM
To: Deborah Dahl
Cc: Jim Barnett; Hans Wennborg; Satish S; Bjorn Bringert;
public-speech-api@w3.org
Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

The spec already defines SpeechRecognitionError BAD_GRAMMAR.  I propose we
use this same error for bad tag formats, since they're so related (and in
fact there may be some edge-cases in which it's not clear whether the error
is parsed as a grammar error or a semantic tag error.)

 

The current definition in the spec for BAD_GRAMMAR is:

"There was an error in the speech recognition grammar."

 

I propose changing this to:

"There was an error in the speech recognition grammar or semantic tags."
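
A minimal sketch of how an author might react to the combined error, assuming the error arrives as an object with a numeric code field. The ErrorCode table, its numeric values, and the helper name describeRecognitionError are hypothetical and only illustrate the proposal; they are not taken from the spec.

```javascript
// Hypothetical stand-in for the draft's error constants; the numeric
// values are assumptions made for this sketch.
var ErrorCode = { BAD_GRAMMAR: 1, OTHER: 2 };

// Maps an error object to a message for the author's log.
function describeRecognitionError(error) {
  if (error.code === ErrorCode.BAD_GRAMMAR) {
    // Under the proposed wording, one code covers both grammar errors
    // and semantic tag errors, so the author cannot tell them apart here.
    return "Problem with the grammar or its semantic tags";
  }
  return "Recognition error (code " + error.code + ")";
}
```

The point of the sketch is that the author needs only one branch for both kinds of failure, which matches the edge cases where it is unclear which kind occurred.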

 

/Glen Shires

 

 

On Thu, Sep 13, 2012 at 8:53 AM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

The example of the author supplying semantics that the recognizer can’t
interpret I think is Bjorn’s “open question B” in his email --

http://lists.w3.org/Archives/Public/public-speech-api/2012Aug/0071.html

 

I proposed that this situation should raise an error in this email, but I
don’t think there’s been any other discussion, so we should discuss this at
some point.

http://lists.w3.org/Archives/Public/public-speech-api/2012Aug/0072.html

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Wednesday, September 12, 2012 5:58 PM


To: Deborah Dahl
Cc: Jim Barnett; Hans Wennborg; Satish S; Bjorn Bringert;
public-speech-api@w3.org
Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

> ...any use cases where the interpretation is of interest to the developer
> and it’s not known whether the interpretation is an object or a string.
> What would be an example of that?

 

An example is: if the author supplies semantics that the recognizer can't
interpret, then the recognizer might return a normalized result.

 

> I also think that the third use case would be very rare, since it would
> involve asking the user to make a decision about whether they want a
> normalized or non-normalized version of the result, and it’s not clear when
> the user would actually be interested in making that kind of choice.

 

If the user is shown alternatives, one option might be normalized. I
provided an example of this, where the non-normalized might be preferred by
the user.

 

   transcript: "Like I've done one million times before."

   normalized: "Like I've done 1,000,000 times before."

 

I understand that this may be a rare use case, but regardless of that, I
still don't know of any use case in which returning a copy of the transcript
is preferable to null. 

 

I'd prefer that we put the specific behavior in the spec, but if all we can
agree on at this point is: “The group is currently discussing options for
the value of the interpretation attribute when no interpretation has been
returned by the recognizer. Current options are ‘null’ or a copy of the
transcript.”, then I will agree to that.

 

I too would like to hear others' opinions.

/Glen Shires

On Wed, Sep 12, 2012 at 2:16 PM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

I’m not sure I can think of any use cases where the interpretation is of
interest to the developer and it’s not known whether the interpretation is
an object or a string. What would be an example of that? I also think that
the third use case would be very rare, since it would involve asking the
user to make a decision about whether they want a normalized or
non-normalized version of the result, and it’s not clear when the user would
actually be interested in making that kind of choice.

I think it would be good at this point to get some other opinions about
this. 

Also, in the interest of moving forward, I think it’s perfectly fine to have
language in the spec that just says “The group is currently discussing
options for the value of the interpretation attribute when no interpretation
has been returned by the recognizer. Current options are ‘null’ or a copy of
the transcript.” This may also serve to encourage external comments from
developers who have an opinion about this. 

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Wednesday, September 12, 2012 4:21 PM


To: Deborah Dahl
Cc: Jim Barnett; Hans Wennborg; Satish S; Bjorn Bringert;
public-speech-api@w3.org
Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

I disagree with the code [1] for this use case. Since the interpretation may
be a non-string object, good defensive coding practice is:

 

if (typeof(interpretation) == "string") {

  document.write(interpretation);

} else {

  document.write(transcript);

}

 

Thus, for this use case it doesn't matter. The code is identical for either
definition of what the interpretation attribute returns when there is no
interpretation. (That is, whether interpretation is defined to return null
or to return a copy of transcript.)

 

In contrast, [2] shows a use case where it does matter: the code is simpler
and less error-prone if the interpretation attribute returns null when
there is no interpretation.

 

Below is a third use case where it also matters. Since interpretation may
return a normalized string, an author may wish to show both the normalized
string and the transcript string to the user, and let them choose which one
to use.  For example:

 

   interpretation: "Like I've done 1,000,000 times before."

   transcript: "Like I've done one million times before."

 

(The author might also add transcript alternatives to this choice list, but
I'll omit that to keep the example simple.)

 

For the option where interpretation returns a copy of transcript when there
is no interpretation:

 

var choices = [];  // must be an array before push() is called

if (typeof(interpretation) == "string" && interpretation != transcript) {

  choices.push(interpretation);

}

choices.push(transcript);

if (choices.length > 1) {

  AskUserToDisambiguate(choices);

}

 

 

For the option where interpretation returns a null when there is no
interpretation:

 

var choices = [];  // must be an array before push() is called

if (typeof(interpretation) == "string") {

  choices.push(interpretation);

}

choices.push(transcript);

if (choices.length > 1) {

  AskUserToDisambiguate(choices);

}

 

 

So there are clearly use cases in which returning null allows for simpler and
less error-prone code, whereas it's not clear to me there is any use case in
which returning a copy of the transcript simplifies the code. Together,
these use cases cover all the scenarios:

 

- where there is an interpretation that contains a complex object

- where there is an interpretation that contains a string, and

- where there is no interpretation.
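
The three scenarios above can be sketched in one function, assuming the proposed rule that interpretation is null when none is available. The function name displayText is hypothetical, and the depart/arrive slots are borrowed from the example used elsewhere in this thread.

```javascript
// Returns the text an author might display for one result.
// Assumes interpretation is null (not undefined, not a transcript copy)
// when the recognizer provides no interpretation.
function displayText(transcript, interpretation) {
  if (interpretation === null) {
    return transcript;        // no interpretation: fall back to transcript
  }
  if (typeof interpretation === "string") {
    return interpretation;    // string, e.g. a normalized transcript
  }
  // Complex object; the slot names are this thread's example only.
  return "Depart " + interpretation.depart +
      " and arrive in " + interpretation.arrive;
}
```

For example, displayText("Like I've done one million times before.", null) returns the transcript unchanged, while passing a normalized string returns that string.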

 

So, I continue to propose adding one additional sentence.

 

    "If no interpretation is available, this attribute MUST return null."

 

If there's no disagreement, I will add this sentence to the spec on Friday.

 

(Please note, this is very different than the reasoning behind requiring the
emma attribute to never be null. "emma" is always of type "Document" and
always returns a valid emma document, not simply a copy of some other
attribute.  Here, "interpretation" is an attribute of type "any", so it must
always be type-checked.)

 

/Glen Shires

 

[1] http://lists.w3.org/Archives/Public/public-speech-api/2012Aug/0108.html

[2] http://lists.w3.org/Archives/Public/public-speech-api/2012Aug/0107.html

 

On Wed, Sep 12, 2012 at 6:44 AM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

I would still prefer that the interpretation slot always be filled, at least
by the transcript if there’s nothing better. I think that the use case I
described in 

http://lists.w3.org/Archives/Public/public-speech-api/2012Aug/0108.html is
going to be pretty common and in that case being able to rely on something
other than null being in the interpretation field is very convenient. On the
other hand, if the application really depends on the availability of a more
complex interpretation object, the developer is going to have to make sure
that a specific speech service that can provide that kind of interpretation
is used. In that case, I don’t see how there can be a transcript without an
interpretation. 

On a related topic, I think we should also include some of the points that
Bjorn made about support for grammars and semantic tagging as discussed in
this thread --
http://lists.w3.org/Archives/Public/public-speech-api/2012Aug/0071.html.

 

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Tuesday, September 11, 2012 8:33 PM
To: Deborah Dahl; Jim Barnett; Hans Wennborg; Satish S; Bjorn Bringert;
public-speech-api@w3.org


Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

The current definition of interpretation in the spec is:

 

    "The interpretation represents the semantic meaning from what the user
said. This might be determined, for instance, through the SISR specification
of semantics in a grammar."

 

I propose adding an additional sentence at the end.

 

    "If no interpretation is available, this attribute MUST return null."

 

My reasoning (based on this lengthy thread):

*	If an SISR / etc interpretation is available, the UA must return it.
*	If an alternative string interpretation is available, such as a
normalization, the UA may return it.
*	If there's no more information available than in the transcript,
then "null" provides a very simple way for the author to check for this
condition. The author avoids a clumsy conditional (typeof(interpretation) !=
"string") and the author can easily distinguish between the case when the
interpretation returns a normalization string as opposed to if it had just
copied the transcript verbatim.
*	"null" is more commonly used than "undefined" in these
circumstances.

If there's no disagreement, I will add this sentence to the spec on
Thursday.

/Glen Shires

 

 

On Tue, Sep 4, 2012 at 11:04 AM, Glen Shires <gshires@google.com> wrote:

I've updated the spec with this change (moved interpretation and emma
attributes to SpeechRecognitionEvent):

https://dvcs.w3.org/hg/speech-api/rev/48a58e558fcc

 

As always, the current draft spec is at:

http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

 

/Glen Shires

 

On Thu, Aug 30, 2012 at 10:07 AM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

Thanks for the clarification, that makes sense.  When each new version of
the emma document arrives in a SpeechRecognitionEvent, the author can just
repopulate all the earlier form fields, as well as the newest one, with the
data from the most recent emma version.

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Thursday, August 30, 2012 12:45 PM


To: Deborah Dahl
Cc: Jim Barnett; Hans Wennborg; Satish S; Bjorn Bringert;
public-speech-api@w3.org
Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

Debbie,

In my proposal, the single emma document is updated with each new
SpeechRecognitionEvent. Therefore, in continuous = true mode, the emma
document is populated in "real time" as the user speaks each field, without
waiting for the user to finish speaking. A JavaScript author could use this
to populate a form in "real time".

 

 

Also, I now realize that the SpeechRecognitionEvent.transcript is not useful
in continuous = false mode because only one final result is returned, and
thus SpeechRecognitionEvent.results[0].transcript always contains the same
string (no concatenation needed).  I also don't see it as very useful in
continuous = true mode because if an author is using this mode, it's
presumably because he wants to show continuous final results (and perhaps
interim as well). Since the author is already writing code to concatenate
results to display them "real-time", there's little or no savings with this
new attribute.  So I now retract that portion of my proposal.

 

So to clarify, here's my proposed changes to the spec. If there's no
disagreement by the end of the week I'll add it to the spec...

 

 

Delete SpeechRecognitionAlternative.interpretation

 

Delete SpeechRecognitionResult.emma

 

Add interpretation and emma attributes to SpeechRecognitionEvent.
Specifically:

 

    interface SpeechRecognitionEvent : Event {

        readonly attribute short resultIndex;

        readonly attribute SpeechRecognitionResultList results;

        readonly attribute any interpretation;

        readonly attribute Document emma;

    };

 

I do not propose to change the definitions of interpretation and emma at
this time (because there is on-going discussion), but rather to simply move
their current definitions to the new heading: "5.1.8 Speech Recognition
Event".

 

/Glen Shires

 

 

On Thu, Aug 30, 2012 at 8:36 AM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

Hi Glen,

I agree that a single cumulative emma document is preferable to multiple
emma documents in general, although I think that there might be use cases
where it would be convenient to have both.  For example, you want to
populate a form in real time as the user speaks each field, without waiting
for the user to finish speaking. After the result is final the application
could send the cumulative result to the server, but seeing the interim
results would be helpful feedback to the user.

Debbie

From: Glen Shires [mailto:gshires@google.com] 
Sent: Wednesday, August 29, 2012 2:57 PM
To: Deborah Dahl
Cc: Jim Barnett; Hans Wennborg; Satish S; Bjorn Bringert;
public-speech-api@w3.org


Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

I believe the same is true for emma, a single, cumulative emma document is
preferable to multiple emma documents. 

 

I propose the following changes to the spec:

 

Delete SpeechRecognitionAlternative.interpretation

 

Delete SpeechRecognitionResult.emma

 

Add interpretation and emma attributes to SpeechRecognitionEvent.
Specifically:

 

    interface SpeechRecognitionEvent : Event {

        readonly attribute short resultIndex;

        readonly attribute SpeechRecognitionResultList results;

        readonly attribute DOMString transcript;

        readonly attribute any interpretation;

        readonly attribute Document emma;

    };

 

I do not propose to change the definitions of interpretation and emma at
this time (because there is on-going discussion), but rather to simply move
their current definitions to the new heading: "5.1.8 Speech Recognition
Event".

 

I also propose adding transcript attribute to SpeechRecognitionEvent (but
also retaining SpeechRecognitionAlternative.transcript). This provides a
simple option for JavaScript authors to get at the full, cumulative
transcript.  I propose the definition under "5.1.8 Speech Recognition Event"
be:

 

transcript

The transcript string represents the raw words that the user spoke. This is
a concatenation of the first (highest confidence) alternative of all final
SpeechRecognitionAlternative.transcript strings.
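
A rough sketch of what this definition would compute, written as author-side code. The result list is modelled here as a plain array whose entries carry a final flag and expose their highest-confidence alternative at index 0; that shape, the function name cumulativeTranscript, and the single-space separator are assumptions, since the proposed definition does not say how the pieces are joined.

```javascript
// Concatenates the top alternative of every final result.
// `results` is a stand-in for SpeechRecognitionResultList: an array of
// objects with a boolean `final` and the best alternative at index 0.
function cumulativeTranscript(results) {
  var parts = [];
  for (var i = 0; i < results.length; i++) {
    var result = results[i];
    if (result.final) {
      parts.push(result[0].transcript);  // skip interim results
    }
  }
  return parts.join(" ");  // separator is an assumption
}
```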

 

/Glen Shires 

 

 

On Wed, Aug 29, 2012 at 10:30 AM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

I agree with having a single interpretation that represents the cumulative
interpretation of the utterance so far. 

I think an example of what Jim is talking about, when the interpretation
wouldn’t be final even if the transcript is, might be the utterance “from
Chicago … Midway”. Maybe the grammar has a default of “Chicago O’Hare”, and
returns “from: ORD”, because most people don’t bother to say “O’Hare”, but
then it hears “Midway” and changes the interpretation to “from: MDW”.
However, “from Chicago” is still the transcript. 

Also the problem that Glen points out is bad enough with two slots, but it
gets even worse as the number of slots gets bigger. For example, you might
have a pizza-ordering utterance with five or six ingredients (“I want a
large pizza with mushrooms…pepperoni…onions…olives…anchovies”). It would be
very cumbersome to have to go back through all the results to fill in the
slots separately.

 

From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com] 
Sent: Wednesday, August 29, 2012 12:37 PM
To: Glen Shires; Deborah Dahl


Cc: Hans Wennborg; Satish S; Bjorn Bringert; public-speech-api@w3.org

Subject: RE: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

I agree with the idea of having a single interpretation.  There is no
guarantee that the different parts of the string have independent
interpretations.  For example, even if the transcription “from New York” is
final, its interpretation may not be, since it may depend on the remaining
parts of the utterance (that depends on how complicated the grammar is, of
course).

 

-          Jim

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Wednesday, August 29, 2012 11:44 AM
To: Deborah Dahl
Cc: Hans Wennborg; Satish S; Bjorn Bringert; public-speech-api@w3.org
Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

How should interpretation work with continuous speech?

 

Specifically, as each portion becomes final (each SpeechRecognitionResult
with final=true), the corresponding alternative(s) for transcription and
interpretation become final.

 

It's easy for the JavaScript author to handle the consecutive list of
transcription strings - simply concatenate them.

 

However, if the interpretation returns a semantic structure (such as the
depart/arrive example), it's unclear to me how they should be returned.  For
example, if the first final result was "from New York" and the second "to
San Francisco", then:

 

After the first final result, the list is:

 

event.results[0].item[0].transcription = "from New York"

event.results[0].item[0].interpretation = {

  depart: "New York",

  arrive: null

};

 

After the second final result, the list is:

 

event.results[0].item[0].transcription = "from New York"

event.results[0].item[0].interpretation = {

  depart: "New York",

  arrive: null

};

 

event.results[1].item[0].transcription = "to San Francisco"

event.results[1].item[0].interpretation = {

  depart: null,

  arrive: "San Francisco"

};

 

If so, this makes using the interpretation structure very messy for the
author because he needs to loop through all the results to find each
interpretation slot that he needs.
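
To make the cost concrete, here is a sketch of the loop an author would have to write under the per-result design. Results are modelled as an array of arrays of alternatives (results[i][0] is the top alternative), which is an assumption made to keep the sketch self-contained; mergeSlots is a hypothetical helper name.

```javascript
// Collects the non-null slots from the top alternative of every result
// into one object. Later results overwrite earlier ones for a given slot.
function mergeSlots(results) {
  var merged = {};
  for (var i = 0; i < results.length; i++) {
    var interp = results[i][0].interpretation;
    if (interp === null || typeof interp !== "object") {
      continue;  // nothing structured to merge from this result
    }
    Object.keys(interp).forEach(function (slot) {
      if (interp[slot] !== null) {
        merged[slot] = interp[slot];
      }
    });
  }
  return merged;
}
```

With the two results shown above, mergeSlots yields an object with depart set to "New York" and arrive set to "San Francisco", which is what a single cumulative interpretation would hand to the author directly.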

 

I suggest that we instead consider changing the spec to provide a single
interpretation that always represents the most current interpretation.

 

After the first final result, the list is:

 

event.results[0].item[0].transcription = "from New York"

event.interpretation = {

  depart: "New York",

  arrive: null

};

 

After the second final result, the list is:

 

event.results[0].item[0].transcription = "from New York"

event.results[1].item[0].transcription = "to San Francisco"

event.interpretation = {

  depart: "New York",

  arrive: "San Francisco"

};

 

This not only makes it simple for the author to process the interpretation,
it also solves the problem that the interpretation may not be available at
the same point in time that the transcription becomes final.  If alternative
interpretations are important, then it's easy to add them to the
interpretation structure that is returned, and this format is far easier for
the author to process than multiple
SpeechRecognitionAlternative.interpretations.  For example:

 

event.interpretation = {

  depart: ["New York", "Newark"],

  arrive: ["San Francisco", "San Bernardino"],

};
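
A small sketch of how an author might consume such a structure. It assumes each slot's array lists alternatives best-first, which the example above suggests but does not state; topChoices is a hypothetical helper name.

```javascript
// Picks the first alternative for each slot. Slots that hold a single
// value instead of an array are passed through unchanged.
function topChoices(interpretation) {
  var top = {};
  Object.keys(interpretation).forEach(function (slot) {
    var value = interpretation[slot];
    top[slot] = Array.isArray(value) ? value[0] : value;
  });
  return top;
}
```

The remaining array entries stay available if the author wants to offer them to the user as a disambiguation list.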

 

/Glen Shires

 

On Wed, Aug 29, 2012 at 7:07 AM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

I don’t think there’s a big difference in complexity in this use case, but
here’s another one, that I think might be more common.

Suppose the application is something like search or composing email, and the
transcript alone would serve the application's purposes. However, some
implementations might also provide useful normalizations like converting
text numbers to digits or capitalization that would make the dictated text
look more like written language, and this normalization fills the
"interpretation slot". If the developer can count on the "interpretation"
slot being filled by the transcript if there's nothing better, then the
developer only has to ask for the interpretation. 

e.g. 

document.write(interpretation)

 

vs. 

if(interpretation)

                document.write(interpretation)

else

                document.write(transcript)

 

which I think is simpler. The developer doesn’t have to worry about type
checking because in this application the “interpretation” will always be a
string.

From: Glen Shires [mailto:gshires@google.com] 
Sent: Tuesday, August 28, 2012 10:44 PM
To: Deborah Dahl


Cc: Hans Wennborg; Satish S; Bjorn Bringert; public-speech-api@w3.org
Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

Debbie,

Looking at this from the viewpoint of what is easier for the JavaScript
author, I believe:

 

SpeechRecognitionAlternative.transcript must return a string (even if an
empty string). Thus, an author wishing to use the transcript doesn't need to
perform any type checking.

 

SpeechRecognitionAlternative.interpretation must be null if no
interpretation is provided.  This simplifies the required conditional by
eliminating type checking.  For example:

 

transcript = "from New York to San Francisco";

 

interpretation = {

  depart: "New York",

  arrive: "San Francisco"

};

 

if (interpretation) {  // true for an interpretation object, false for null
  document.write("Depart " + interpretation.depart + " and arrive in " +
      interpretation.arrive);
} else {
  document.write(transcript);
}

 

 

Whereas, if the interpretation contains the transcript string when no
interpretation is present, the condition would have to be:

 

if (typeof(interpretation) != "string")

 

Which is more complex, and more prone to errors (e.g. if you misspell
"string").

 

/Glen Shires

 

 

On Thu, Aug 23, 2012 at 6:37 AM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

Hi Glen,

In the case of an SLM, if there’s a classification, I think the
classification would be the interpretation. If the SLM is just used to
improve dictation results, without classification, then the interpretation
would be whatever we say it is – either the transcript, null, or undefined. 

My point about stating that the “transcript” attribute is required or
optional wasn’t whether or not there was a use case where it would be
desirable not to return a transcript. My point was that the spec needs to be
explicit about the optional/required status of every feature. It’s fine to
postpone that decision if there’s any controversy, but if we all agree we
might as well add it to the spec. 

I can’t think of any cases where it would be bad to return a transcript,
although I can think of use cases where the developer wouldn’t choose to do
anything with the transcript (like multi-slot form filling – all the end
user really needs to see is the correctly filled slots). 

Debbie

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Thursday, August 23, 2012 3:48 AM
To: Deborah Dahl
Cc: Hans Wennborg; Satish S; Bjorn Bringert; public-speech-api@w3.org


Subject: Re: SpeechRecognitionAlternative.interpretation when interpretation
can't be provided

 

Debbie,

I agree with the need to support SLMs. This implies that, in some cases, the
author may not specify semantic information, and thus there would not be an
interpretation.

 

Under what circumstances (except error conditions) do you envision that a
transcript would not be returned?

 

/Glen Shires

 

On Wed, Aug 22, 2012 at 6:08 AM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

Actually, Satish's comment made me think that we probably have a few other
things to agree on before we decide what the default value of
"interpretation" should be, because we haven't settled on a lot of issues
about what is required and what is optional.
Satish's argument is only relevant if we require SRGS/SISR for grammars and
semantic interpretation, but we actually don't require either of those right
now, so it doesn't matter what they do as far as the current spec goes.
(Although it's worth noting that  SRGS doesn't require anything to be
returned at all, even the transcript
http://www.w3.org/TR/speech-grammar/#S1.10).
So I think we first need to decide and explicitly state in the spec ---

1. what we want to say about grammar formats (which are allowed/required, or
is the grammar format open). It probably needs to be somewhat open because
of SLM's.
2. what we want to say about semantic tag formats (are proprietary formats
allowed, is SISR required or is the semantic tag format just whatever the
grammar format uses)
3. is "transcript" required?
4. is "interpretation" required?

Debbie


> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Tuesday, August 21, 2012 12:50 PM
> To: Glen Shires
> Cc: Satish S; Deborah Dahl; Bjorn Bringert; public-speech-api@w3.org
> Subject: Re: SpeechRecognitionAlternative.interpretation when
> interpretation can't be provided
>
> Björn, Deborah, are you ok with this as well? I.e. that the spec
> shouldn't mandate a "default" value for the interpretation attribute,
> but rather return null when there is no interpretation?
>
> On Fri, Aug 17, 2012 at 6:32 PM, Glen Shires <gshires@google.com> wrote:
> > I agree, return "null" (not "undefined") in such cases.
> >
> >
> > On Fri, Aug 17, 2012 at 7:41 AM, Satish S <satish@google.com> wrote:
> >>
> >> > I may have missed something, but I don’t see in the spec where it
> >> > says that “interpretation” is optional.
> >>
> >> Developers specify the interpretation value with SISR and if they don't
> >> specify there is no 'default' interpretation available. In that sense
> >> it is optional because grammars don't mandate it. So I think this API
> >> shouldn't mandate providing a default value if the engine did not
> >> provide one, and return null in such cases.



> >>
> >> Cheers
> >> Satish
> >>
> >>
> >>
> >> On Fri, Aug 17, 2012 at 1:57 PM, Deborah Dahl
> >> <dahl@conversational-technologies.com> wrote:
> >>>
> >>> I may have missed something, but I don’t see in the spec where it says
> >>> that “interpretation” is optional.
> >>>
> >>> From: Satish S [mailto:satish@google.com]
> >>> Sent: Thursday, August 16, 2012 7:38 PM
> >>> To: Deborah Dahl
> >>> Cc: Bjorn Bringert; Hans Wennborg; public-speech-api@w3.org
> >>>
> >>>
> >>> Subject: Re: SpeechRecognitionAlternative.interpretation when
> >>> interpretation can't be provided
> >>>
> >>>
> >>>
> >>> 'interpretation' is an optional attribute because engines are not
> >>> required to provide an interpretation on their own (unlike
> >>> 'transcript'). As such I think it should return null when there isn't
> >>> a value to be returned as that is the convention for optional
> >>> attributes, not 'undefined' or a copy of some other attribute.
> >>>
> >>>
> >>>
> >>> If an engine chooses to return the same value for 'transcript' and
> >>> 'interpretation' or do textnorm of the value and return in
> >>> 'interpretation' that will be an implementation detail of the engine.
> >>> But in the absence of any such value for 'interpretation' from the
> >>> engine I think the UA should return null.
> >>>
> >>>
> >>> Cheers
> >>> Satish
> >>>
> >>> On Thu, Aug 16, 2012 at 2:52 PM, Deborah Dahl
> >>> <dahl@conversational-technologies.com> wrote:
> >>>
> >>> That's a good point. There are lots of use cases where some simple
> >>> normalization is extremely useful, as in your example, or collapsing
> >>> all the ways that the user might say "yes" or "no". However, you could
> >>> say that once the implementation has modified or normalized the
> >>> transcript that means it has some kind of interpretation, so putting a
> >>> normalized value in the interpretation slot should be fine. Nothing
> >>> says that the "interpretation" has to be a particularly fine-grained
> >>> interpretation, or one with a lot of structure.
> >>>
> >>>
> >>>
> >>> > -----Original Message-----
> >>> > From: Bjorn Bringert [mailto:bringert@google.com]
> >>> > Sent: Thursday, August 16, 2012 9:09 AM
> >>> > To: Hans Wennborg
> >>> > Cc: Conversational; public-speech-api@w3.org
> >>> > Subject: Re: SpeechRecognitionAlternative.interpretation when
> >>> > interpretation can't be provided
> >>> >
> >>> > I'm not sure that it has to be that strict in requiring that the
> >>> > value is the same as the "transcript" attribute. For example, an
> >>> > engine might return the words recognized in "transcript" and apply
> >>> > some extra textnorm to the text that it returns in "interpretation",
> >>> > e.g. converting digit words to digits ("three" -> "3"). Not sure if
> >>> > that's useful though.
> >>> >
> >>> > On Thu, Aug 16, 2012 at 1:58 PM, Hans Wennborg
> >>> > <hwennborg@google.com> wrote:
> >>> > > Yes, the raw text is in the 'transcript' attribute.
> >>> > >
> >>> > > The description of 'interpretation' is currently: "The
> >>> > > interpretation represents the semantic meaning from what the user
> >>> > > said. This might be determined, for instance, through the SISR
> >>> > > specification of semantics in a grammar."
> >>> > >
> >>> > > I propose that we change it to "The interpretation represents the
> >>> > > semantic meaning from what the user said. This might be
> >>> > > determined, for instance, through the SISR specification of
> >>> > > semantics in a grammar. If no semantic meaning can be determined,
> >>> > > the attribute must be a string with the same value as the
> >>> > > 'transcript' attribute."
> >>> > >
> >>> > > Does that sound good to everyone? If there are no objections, I'll
> >>> > > make the change to the draft next week.
> >>> > >
> >>> > > Thanks,
> >>> > > Hans
> >>> > >
> >>> > > On Wed, Aug 15, 2012 at 5:29 PM, Conversational
> >>> > > <dahl@conversational-technologies.com> wrote:
> >>> > >> I can't check the spec right now, but I assume there's already an
> >>> > >> attribute that currently is defined to contain the raw text. So I
> >>> > >> think we could say that if there's no interpretation the value of
> >>> > >> the interpretation attribute would be the same as the value of
> >>> > >> the "raw string" attribute.
> >>> > >> Sent from my iPhone
> >>> > >>
> >>> > >> On Aug 15, 2012, at 9:57 AM, Hans Wennborg <hwennborg@google.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >>> OK, that would work I suppose.
> >>> > >>>
> >>> > >>> What would the spec text look like? Something like "[...] If no
> >>> > >>> semantic meaning can be determined, the attribute will be a string
> >>> > >>> representing the raw words that the user spoke."?
> >>> > >>>
> >>> > >>> On Wed, Aug 15, 2012 at 2:24 PM, Bjorn Bringert
> >>> > >>> <bringert@google.com> wrote:
> >>> > >>>> Yeah, that would be my preference too.
> >>> > >>>>
> >>> > >>>> On Wed, Aug 15, 2012 at 2:19 PM, Conversational
> >>> > >>>> <dahl@conversational-technologies.com> wrote:
> >>> > >>>>> If there isn't an interpretation I think it would make the
> >>> > >>>>> most sense for the attribute to contain the literal string
> >>> > >>>>> result. I believe this is what happens in VoiceXML.
> >>> > >>>>>
> >>> > >>>>>> My question is: for implementations that cannot provide an
> >>> > >>>>>> interpretation, what should the attribute's value be? null?
> >>> > >>>>>> undefined?
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Bjorn Bringert
> >>> > Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> >>> > Palace Road, London, SW1W 9TQ
> >>> > Registered in England Number: 3977902
> >>>
> >>>
> >>>
> >>
> >>
> >

Received on Thursday, 13 September 2012 17:37:09 UTC