Re: R27. Grammars, TTS, media composition, and recognition results should all use standard formats from Dave Burke on 2010-10-26 (public-xg-htmlspeech@w3.org from October 2010)

From: Dave Burke <daveburke@google.com>
Date: Tue, 26 Oct 2010 22:48:19 +0100
To: Michael Bodell <mbodell@microsoft.com>
Cc: Bjorn Bringert <bringert@google.com>, Dan Burnett <dburnett@voxeo.com>, Deborah Dahl <dahl@conversational-technologies.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <AANLkTim7=w77J7whBfQbE0swZTogVddn8LAkuXo+PTQP@mail.gmail.com>

It seems convoluted to force developers to understand EMMA when we
could have a simpler JavaScript object. What does EMMA buy the typical Web
developer?
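
For illustration only, here is a minimal sketch of what a plain JavaScript
result object might look like for the EMMA example quoted below. The shape
and the names `result`, `interpretations`, `confidence`, `utterance`, `slots`
and `topInterpretation` are assumptions made up for this sketch; nothing in
this thread or in any spec defines them.

```javascript
// Hypothetical result shape; all field names are illustrative assumptions.
// The values are copied from the two interpretations in the EMMA example.
const result = {
  interpretations: [
    {
      confidence: 0.75,
      utterance: "flights from boston to denver",
      slots: { origin: "Boston", destination: "Denver" }
    },
    {
      confidence: 0.68,
      utterance: "flights from austin to denver",
      slots: { origin: "Austin", destination: "Denver" }
    }
  ]
};

// Return the highest-confidence interpretation, or null if there are none.
function topInterpretation(res) {
  return res.interpretations.reduce(
    (best, cur) =>
      best === null || cur.confidence > best.confidence ? cur : best,
    null
  );
}

const top = topInterpretation(result);
console.log(top.utterance, top.slots.origin, top.slots.destination);
```

With a shape like this, getting the top utterance is a single property
access and needs no XML namespace handling, though it would carry less of
the metadata (timestamps, medium, mode) than the EMMA document does.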

Dave

On Tue, Oct 26, 2010 at 10:43 PM, Michael Bodell <mbodell@microsoft.com> wrote:

> Here's the first EMMA example from the specification:
>
> <emma:emma version="1.0"
>    xmlns:emma="http://www.w3.org/2003/04/emma"
>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>    xsi:schemaLocation="http://www.w3.org/2003/04/emma
>     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
>    xmlns="http://www.example.com/example">
>  <emma:one-of id="r1" emma:start="1087995961542" emma:end="1087995963542"
>     emma:medium="acoustic" emma:mode="voice">
>    <emma:interpretation id="int1" emma:confidence="0.75"
>    emma:tokens="flights from boston to denver">
>      <origin>Boston</origin>
>      <destination>Denver</destination>
>    </emma:interpretation>
>
>    <emma:interpretation id="int2" emma:confidence="0.68"
>    emma:tokens="flights from austin to denver">
>      <origin>Austin</origin>
>      <destination>Denver</destination>
>    </emma:interpretation>
>  </emma:one-of>
> </emma:emma>
>
> Using something like xpath it is very simple to do something like
> '//interpretation[@confidence > 0.6][1]' or '//interpretation/origin'.
>
> Using DOM one could easily do something like getElementById("int1") and
> inspect that element, or else getElementsByTagName("emma:interpretation").
>
> If you had a more E4X approach you could imagine
> result["one-of"].interpretation[0] would give you the first result.
>
> The JSON representation of content might be:
> ({'one-of':{interpretation:[{origin:"Boston", destination:"Denver"},
> {origin:"Austin", destination:"Denver"}]}}).
>
> In addition, depending on how the recognition is defined there might be one
> or more default bindings of recognition results to input elements in HTML
> such that scripting isn't needed for the "common tasks" but the scripting is
> there for the more advanced tasks.
>
> -----Original Message-----
> From: Bjorn Bringert [mailto:bringert@google.com]
> Sent: Monday, October 25, 2010 5:43 AM
> To: Dan Burnett
> Cc: Michael Bodell; Deborah Dahl; public-xg-htmlspeech@w3.org
> Subject: Re: R27. Grammars, TTS, media composition, and recognition results
> should all use standard formats
>
> I haven't used EMMA, but it looks like it could be a bit complex for a
> script to simply get the top utterance or interpretation out. Are there any
> shorthands or DOM methods for this? Any Hello World examples to show the
> basic usage?
>
> /Bjorn
>
> On Mon, Oct 25, 2010 at 1:38 PM, Dan Burnett <dburnett@voxeo.com> wrote:
> > +1
> > On Oct 22, 2010, at 2:57 PM, Michael Bodell wrote:
> >
> >> I agree that SRGS, SISR, EMMA, and SSML seem like the obvious W3C
> >> standard formats that we should use.
> >>
> >> -----Original Message-----
> >> From: public-xg-htmlspeech-request@w3.org
> >> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Deborah
> >> Dahl
> >> Sent: Friday, October 22, 2010 6:39 AM
> >> To: 'Bjorn Bringert'; 'Dan Burnett'
> >> Cc: public-xg-htmlspeech@w3.org
> >> Subject: RE: R27. Grammars, TTS, media composition, and recognition
> >> results should all use standard formats
> >>
> >> For recognition results, EMMA
> >> http://www.w3.org/TR/2009/REC-emma-20090210/
> >> is a much more recent and more complete standard than NLSML. EMMA has
> >> a very rich set of capabilities, but most of them are optional, so
> >> that using it doesn't have to be complex. Quite a few recognizers
> >> support it. I think one of the most valuable aspects of EMMA is that
> >> as applications eventually start finding that they need more and more
> >> information about the recognition result, much of that more advanced
> >> information has already been worked out and standardized in EMMA.
> >>
> >>> -----Original Message-----
> >>> From: public-xg-htmlspeech-request@w3.org
> >>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn
> >>> Bringert
> >>> Sent: Friday, October 22, 2010 7:01 AM
> >>> To: Dan Burnett
> >>> Cc: public-xg-htmlspeech@w3.org
> >>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
> >>> results should all use standard formats
> >>>
> >>> For grammars, SRGS + SISR seems like the obvious choice.
> >>>
> >>> For TTS, SSML seems like the obvious choice.
> >>>
> >>> I'm not exactly sure what is meant by media composition here. Is it using
> >>> TTS output together with other media? Is there a use case for this?
> >>> And is there anything we need to specify here at all?
> >>>
> >>> For recognition results, there is NLSML, but as far as I can tell,
> >>> that hasn't been widely adopted. Also, it seems like it could be a
> >>> bit complex for web applications to process.
> >>>
> >>> /Bjorn
> >>>
> >>> On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett <dburnett@voxeo.com> wrote:
> >>>>
> >>>> Group,
> >>>>
> >>>> This is the second of the requirements to discuss and prioritize
> >>>> based on our ranking approach [1].
> >>>>
> >>>> This email is the beginning of a thread for questions, discussion,
> >>>> and opinions regarding our first draft of Requirement 27 [2].
> >>>>
> >>>> After our discussion and any modifications to the requirement, our
> >>>> goal is to prioritize this requirement as either "Should Address"
> >>>> or "For Future Consideration".
> >>>>
> >>>> -- dan
> >>>>
> >>>> [1]
> >>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
> >>>>
> >>>> [2]
> >>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r27
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bjorn Bringert
> >>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> >>> Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
> >>
> >>
> >>
> >>
> >
> >
>
>
>
> --
> Bjorn Bringert
> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace
> Road, London, SW1W 9TQ Registered in England Number: 3977902
>
>
Received on Tuesday, 26 October 2010 21:48:51 UTC