Re: R27. Grammars, TTS, media composition, and recognition results should all use standard formats from Raj(Openstream) on 2010-10-27 (public-xg-htmlspeech@w3.org from October 2010)

From: Raj(Openstream) <raj@openstream.com>
Date: Wed, 27 Oct 2010 10:14:28 -0400
To: "Dave Burke" <daveburke@google.com>
Cc: "Michael Bodell" <mbodell@microsoft.com>, "Bjorn Bringert" <bringert@google.com>, "Dan Burnett" <dburnett@voxeo.com>, "Deborah Dahl" <dahl@conversational-technologies.com>, <public-xg-htmlspeech@w3.org>
Message-ID: <DC7C104943444A359FADB3CFC532C402@orion>
I tend to agree about XML processing latency, if mobile device capabilities and network bandwidth
were to remain as they are. But I have yet to see that what EMMA provides to applications is any more
complex than what a JavaScript solution would provide for a similar purpose.

Raj
  ----- Original Message ----- 
  From: Dave Burke 
  To: Raj(Openstream) 
  Cc: Michael Bodell ; Bjorn Bringert ; Dan Burnett ; Deborah Dahl ; public-xg-htmlspeech@w3.org 
  Sent: Wednesday, October 27, 2010 10:03 AM
  Subject: Re: R27. Grammars, TTS, media composition, and recognition results should all use standard formats


  The feedback from our front-end developers is different. We care about code efficiency and compactness because we want to minimise latency (especially when running webapps over mobile networks and/or on slower processors). Involving verbose XML and DOM APIs adds overhead in processing and bandwidth, as well as additional code complexity. I've yet to see any argument in this discussion for why introducing XML would be a good thing for Webapp developers.


  Dave


  On Wed, Oct 27, 2010 at 2:57 PM, Raj(Openstream) <raj@openstream.com> wrote:

    From our developers' experience, they don't seem to find JavaScript any simpler than using
    EMMA... and all of them, needless to say, are Web developers to begin with.

    Raj
      ----- Original Message ----- 
      From: Dave Burke 
      To: Michael Bodell 
      Cc: Bjorn Bringert ; Dan Burnett ; Deborah Dahl ; public-xg-htmlspeech@w3.org 
      Sent: Tuesday, October 26, 2010 5:48 PM
      Subject: Re: R27. Grammars, TTS, media composition, and recognition results should all use standard formats


      Seems convoluted to force developers to have to understand EMMA when we could have a simpler JavaScript object. What does EMMA buy the typical Web developer? 


      Dave


      On Tue, Oct 26, 2010 at 10:43 PM, Michael Bodell <mbodell@microsoft.com> wrote:

        Here's the first EMMA example from the specification:

        <emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.w3.org/2003/04/emma
            http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
           xmlns="http://www.example.com/example">
         <emma:one-of id="r1" emma:start="1087995961542" emma:end="1087995963542"
            emma:medium="acoustic" emma:mode="voice">
           <emma:interpretation id="int1" emma:confidence="0.75"
           emma:tokens="flights from boston to denver">
             <origin>Boston</origin>
             <destination>Denver</destination>
           </emma:interpretation>

           <emma:interpretation id="int2" emma:confidence="0.68"
           emma:tokens="flights from austin to denver">
             <origin>Austin</origin>
             <destination>Denver</destination>
           </emma:interpretation>
         </emma:one-of>
        </emma:emma>

        Using something like XPath it is very simple to do something like '//emma:interpretation[@emma:confidence > 0.6][1]' or '//emma:interpretation/ex:origin' (with the emma: and ex: prefixes bound to the EMMA and example namespaces).
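A minimal sketch of those two queries, using Python's standard xml.etree.ElementTree as a stand-in for browser-side XPath. ElementTree supports only a subset of XPath (no numeric comparisons inside predicates), so the confidence filter is done in code. The helper names first_confident and all_origins are hypothetical, and the document is a trimmed copy of the EMMA example (schemaLocation and timestamps omitted). Note that interpretation and confidence live in the EMMA namespace, while origin lives in the application's default namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first EMMA example (schemaLocation and timestamps omitted).
EMMA_DOC = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
        emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""

EMMA_NS = "http://www.w3.org/2003/04/emma"
APP_NS = "http://www.example.com/example"  # default namespace of the payload
NS = {"emma": EMMA_NS, "app": APP_NS}
CONFIDENCE = f"{{{EMMA_NS}}}confidence"  # emma:confidence in Clark notation

def first_confident(root, threshold):
    """Roughly //emma:interpretation[@emma:confidence > threshold][1]:
    the first interpretation in document order above the threshold."""
    for interp in root.iterfind(".//emma:interpretation", NS):
        if float(interp.get(CONFIDENCE, "0")) > threshold:
            return interp
    return None

def all_origins(root):
    """Equivalent of //emma:interpretation/app:origin."""
    return [o.text for o in root.iterfind(".//emma:interpretation/app:origin", NS)]

root = ET.fromstring(EMMA_DOC)
print(first_confident(root, 0.6).get("id"))  # int1
print(all_origins(root))                     # ['Boston', 'Austin']
```

Forgetting the namespaces is the usual stumbling block here: an unprefixed '//interpretation' matches nothing in this document.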

        Using DOM one could easily do something like getElementById("int1") and inspect that element, or else getElementsByTagNameNS("http://www.w3.org/2003/04/emma", "interpretation").
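A rough DOM-style equivalent, sketched with Python's xml.dom.minidom in place of a browser DOM. minidom treats an attribute as an ID only when it has been declared as one (for example in a DTD), so this sketch does not rely on getElementById and instead matches the plain id attribute with a hypothetical helper, element_by_id. The document is again a trimmed copy of the EMMA example.

```python
from xml.dom import minidom

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Trimmed copy of the EMMA example: only the parts this sketch reads.
EMMA_DOC = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="flights from boston to denver">
      <origin>Boston</origin><destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
        emma:tokens="flights from austin to denver">
      <origin>Austin</origin><destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""

def element_by_id(doc, ident):
    """Hypothetical stand-in for getElementById: scan all elements and
    compare the unprefixed id attribute, which needs no ID declaration."""
    for node in doc.getElementsByTagName("*"):
        if node.getAttribute("id") == ident:
            return node
    return None

doc = minidom.parseString(EMMA_DOC)

# Namespace-aware lookup, mirroring the DOM's getElementsByTagNameNS.
interps = doc.getElementsByTagNameNS(EMMA_NS, "interpretation")
print(len(interps))  # 2

int1 = element_by_id(doc, "int1")
print(int1.getAttributeNS(EMMA_NS, "tokens"))  # flights from boston to denver
```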

        With a more E4X-style approach, you could imagine result["one-of"].interpretation[0] giving you the first result.

        The JSON representation of the content might be: ({'one-of':{interpretation:[{origin:"Boston", destination:"Denver"}, {origin:"Austin", destination:"Denver"}]}}).
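For comparison, a minimal sketch of reading that structure. The literal in the message is a JavaScript object literal (single quotes, unquoted keys) rather than strict JSON, so here it is rewritten as strict JSON and parsed with Python's json module as a stand-in for JSON.parse. As in the message, confidence and token fields are omitted. Because "one-of" contains a hyphen, it needs bracket access in JavaScript as well.

```python
import json

# The suggested JSON rendering of the EMMA example, written as strict JSON.
RESULT_JSON = """
{"one-of": {"interpretation": [
  {"origin": "Boston", "destination": "Denver"},
  {"origin": "Austin", "destination": "Denver"}
]}}
"""

result = json.loads(RESULT_JSON)

# First interpretation, analogous to result["one-of"].interpretation[0].
top = result["one-of"]["interpretation"][0]
print(top["origin"], "->", top["destination"])  # Boston -> Denver

# All origins, analogous to the //interpretation/origin query.
origins = [i["origin"] for i in result["one-of"]["interpretation"]]
print(origins)  # ['Boston', 'Austin']
```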

        In addition, depending on how recognition is defined, there might be one or more default bindings of recognition results to input elements in HTML, so that scripting isn't needed for the "common tasks" but is still available for more advanced tasks.


        -----Original Message-----
        From: Bjorn Bringert [mailto:bringert@google.com]
        Sent: Monday, October 25, 2010 5:43 AM
        To: Dan Burnett
        Cc: Michael Bodell; Deborah Dahl; public-xg-htmlspeech@w3.org
        Subject: Re: R27. Grammars, TTS, media composition, and recognition results should all use standard formats

        I haven't used EMMA, but it looks like it could be a bit complex for a script to simply get the top utterance or interpretation out. Are there any shorthands or DOM methods for this? Any Hello World examples to show the basic usage?
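A possible "Hello World" for this question: a minimal sketch, in Python with xml.etree.ElementTree standing in for browser scripting, that returns the tokens and confidence of the top interpretation. Two assumptions belong to this sketch, not to the EMMA specification: "top" means the highest emma:confidence, and a missing emma:confidence is treated as 0.0. The function name top_result is hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for names in the EMMA namespace.
EMMA = "{http://www.w3.org/2003/04/emma}"

def top_result(emma_xml):
    """Return (tokens, confidence) for the highest-confidence
    emma:interpretation, or None if there is no interpretation."""
    root = ET.fromstring(emma_xml)
    interps = list(root.iter(EMMA + "interpretation"))
    if not interps:
        return None

    # Assumption of this sketch: a missing emma:confidence counts as 0.0.
    def confidence(el):
        return float(el.get(EMMA + "confidence", "0.0"))

    best = max(interps, key=confidence)
    return best.get(EMMA + "tokens"), confidence(best)

HELLO = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="flights from boston to denver">
      <origin>Boston</origin><destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
        emma:tokens="flights from austin to denver">
      <origin>Austin</origin><destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""

print(top_result(HELLO))  # ('flights from boston to denver', 0.75)
```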

        /Bjorn

        On Mon, Oct 25, 2010 at 1:38 PM, Dan Burnett <dburnett@voxeo.com> wrote:
        > +1
        > On Oct 22, 2010, at 2:57 PM, Michael Bodell wrote:
        >
        >> I agree that SRGS, SISR, EMMA, and SSML seem like the obvious W3C
        >> standard formats that we should use.
        >>
        >> -----Original Message-----
        >> From: public-xg-htmlspeech-request@w3.org
        >> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Deborah
        >> Dahl
        >> Sent: Friday, October 22, 2010 6:39 AM
        >> To: 'Bjorn Bringert'; 'Dan Burnett'
        >> Cc: public-xg-htmlspeech@w3.org
        >> Subject: RE: R27. Grammars, TTS, media composition, and recognition
        >> results should all use standard formats
        >>
        >> For recognition results, EMMA
        >> http://www.w3.org/TR/2009/REC-emma-20090210/
        >> is a much more recent and more complete standard than NLSML. EMMA has
        >> a very rich set of capabilities, but most of them are optional, so
        >> that using it doesn't have to be complex. Quite a few recognizers
        >> support it. I think one of the most valuable aspects of EMMA is that
        >> as applications eventually start finding that they need more and more
        >> information about the recognition result, much of that more advanced
        >> information has already been worked out and standardized in EMMA.
        >>
        >>> -----Original Message-----
        >>> From: public-xg-htmlspeech-request@w3.org
        >>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn
        >>> Bringert
        >>> Sent: Friday, October 22, 2010 7:01 AM
        >>> To: Dan Burnett
        >>> Cc: public-xg-htmlspeech@w3.org
        >>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
        >>> results should all use standard formats
        >>>
        >>> For grammars, SRGS + SISR seems like the obvious choice.
        >>>
        >>> For TTS, SSML seems like the obvious choice.
        >>>
        >>> I'm not sure exactly what is meant by media composition here. Is it using
        >>> TTS output together with other media? Is there a use case for this?
        >>> And is there anything we need to specify here at all?
        >>>
        >>> For recognition results, there is NLSML, but as far as I can tell,
        >>> that hasn't been widely adopted. Also, it seems like it could be a
        >>> bit complex for web applications to process.
        >>>
        >>> /Bjorn
        >>>
        >>> On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett <dburnett@voxeo.com> wrote:
        >>>>
        >>>> Group,
        >>>>
        >>>> This is the second of the requirements to discuss and prioritize
        >>>> based on our ranking approach [1].
        >>>>
        >>>> This email is the beginning of a thread for questions, discussion,
        >>>> and opinions regarding our first draft of Requirement 27 [2].
        >>>>
        >>>> After our discussion and any modifications to the requirement, our
        >>>> goal is to prioritize this requirement as either "Should Address"
        >>>> or "For Future Consideration".
        >>>>
        >>>> -- dan
        >>>>
        >>>> [1]
        >>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
        >>>>
        >>>> [2]
        >>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r27
        >>>>
        >>>>
        >>>
        >>>
        >>>
        >>> --
        >>> Bjorn Bringert
        >>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
        >>> Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
        >>
        >>
        >>
        >>
        >
        >



Received on Wednesday, 27 October 2010 14:14:59 UTC