- From: Dan Burnett <dburnett@voxeo.com>
- Date: Thu, 28 Oct 2010 20:40:52 -0400
- To: Olli@pettay.fi
- Cc: Bjorn Bringert <bringert@google.com>, "Raj(Openstream)" <raj@openstream.com>, Dave Burke <daveburke@google.com>, Michael Bodell <mbodell@microsoft.com>, Deborah Dahl <dahl@conversational-technologies.com>, public-xg-htmlspeech@w3.org
Because a) we are operating at a requirements level currently, b) we essentially have agreement on SRGS, SISR, and SSML, and c) we are beginning to agree on a direction for the recognition results, I propose we split this requirement into two:

27a. Grammars, TTS, and media composition should all use standard formats such as SRGS, SISR, and SSML.

27b. Recognition results should be based upon a standard such as EMMA but be in an easy-to-process format such as JSON.

I suspect this will simplify our determination of which requirements we can list as "Should Address".

Thoughts? Objections?

-- dan

On Oct 27, 2010, at 1:13 PM, Olli Pettay wrote:

> On 10/27/2010 05:09 PM, Bjorn Bringert wrote:
>> What's the simplest code (e.g. in JavaScript + DOM) needed to extract
>> the text of the best utterance from any EMMA document that a
>> recognizer might return? Michael's code works for the given example,
>> but not for an arbitrary EMMA document.
>
> Actually, the code might not work, *if* I read the EMMA spec correctly,
> since it uses getElementById and the id attribute on emma:interpretation
> is not defined to be an ID.
> (Though that would just be a spec bug.)
>
>> I understand that many apps want to do more complex things, but I
>> would like the API that we end up with to satisfy both parts of
>> "Simple things should be easy and complex things should be possible".
>
> Totally agree with this.
>
> I wonder if we could specify some *small* subset of the features we need
> from EMMA and expose those as JSON or some other JS-friendly object in
> the first version of the upcoming API.
> Then, in v2, support for full EMMA could be added.
> And in the meantime the MMI WG could perhaps develop a JSON version of
> the result format.
>
> I'm hoping we can come up with a reasonably small and simple API as
> version 1 and then do more in the next revisions.
> Something similar to what is happening with Web Notifications.
>
> -Olli
>
>> /Bjorn
>>
>> On Wed, Oct 27, 2010 at 2:57 PM, Raj (Openstream) <raj@openstream.com> wrote:
>>> From our developers' experience, they don't seem to find JavaScript
>>> any simpler than using EMMA... and all of them, needless to say, are
>>> Web developers to begin with.
>>>
>>> Raj
>>>
>>> ----- Original Message -----
>>> From: Dave Burke
>>> To: Michael Bodell
>>> Cc: Bjorn Bringert; Dan Burnett; Deborah Dahl; public-xg-htmlspeech@w3.org
>>> Sent: Tuesday, October 26, 2010 5:48 PM
>>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
>>> results should all use standard formats
>>>
>>> Seems convoluted to force developers to have to understand EMMA when
>>> we could have a simpler JavaScript object. What does EMMA buy the
>>> typical Web developer?
>>>
>>> Dave
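As a rough illustration of the kind of "simplest code" Bjorn asks about, a namespace-aware DOM sketch might look something like the following. It is not a proposal from the thread; the helper name and the emmaXml variable are made up for illustration, and it avoids getElementById for the reason Olli gives, picking the highest-confidence emma:interpretation instead:

  // Sketch only: pick the highest-confidence interpretation from an
  // arbitrary EMMA result string, using namespace-aware DOM calls
  // rather than getElementById (whose use Olli questions above).
  var EMMA_NS = "http://www.w3.org/2003/04/emma";

  function bestInterpretation(emmaXml) {
    var doc = new DOMParser().parseFromString(emmaXml, "application/xml");
    var interps = doc.getElementsByTagNameNS(EMMA_NS, "interpretation");
    var best = null, bestConf = -1;
    for (var i = 0; i < interps.length; i++) {
      var conf = parseFloat(interps[i].getAttributeNS(EMMA_NS, "confidence")) || 0;
      if (conf > bestConf) { bestConf = conf; best = interps[i]; }
    }
    return best; // null if the document contains no interpretations
  }

  // For the EMMA sample quoted in Michael's message below,
  // bestInterpretation(xml).getAttributeNS(EMMA_NS, "tokens")
  // would return "flights from boston to denver".

Even this minimal version runs to a dozen lines, which is part of what motivates the "simpler JavaScript object" and JSON-subset suggestions above.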
>>> On Tue, Oct 26, 2010 at 10:43 PM, Michael Bodell <mbodell@microsoft.com> wrote:
>>>>
>>>> Here's the first EMMA example from the specification:
>>>>
>>>> <emma:emma version="1.0"
>>>>     xmlns:emma="http://www.w3.org/2003/04/emma"
>>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>     xsi:schemaLocation="http://www.w3.org/2003/04/emma
>>>>       http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
>>>>     xmlns="http://www.example.com/example">
>>>>   <emma:one-of id="r1" emma:start="1087995961542" emma:end="1087995963542"
>>>>       emma:medium="acoustic" emma:mode="voice">
>>>>     <emma:interpretation id="int1" emma:confidence="0.75"
>>>>         emma:tokens="flights from boston to denver">
>>>>       <origin>Boston</origin>
>>>>       <destination>Denver</destination>
>>>>     </emma:interpretation>
>>>>
>>>>     <emma:interpretation id="int2" emma:confidence="0.68"
>>>>         emma:tokens="flights from austin to denver">
>>>>       <origin>Austin</origin>
>>>>       <destination>Denver</destination>
>>>>     </emma:interpretation>
>>>>   </emma:one-of>
>>>> </emma:emma>
>>>>
>>>> Using something like XPath it is very simple to do something like
>>>> '//interpretation[@confidence > 0.6][1]' or '//interpretation/origin'.
>>>>
>>>> Using DOM one could easily do something like getElementById("int1") and
>>>> inspect that element, or else getElementsByTagName("interpretation").
>>>>
>>>> If you had a more E4X approach you could imagine
>>>> result["one-of"].interpretation[0] would give you the first result.
>>>>
>>>> The JSON representation of the content might be:
>>>> ({'one-of':{interpretation:[{origin:"Boston", destination:"Denver"},
>>>> {origin:"Austin", destination:"Denver"}]}}).
>>>>
>>>> In addition, depending on how the recognition is defined, there might be
>>>> one or more default bindings of recognition results to input elements in
>>>> HTML, such that scripting isn't needed for the "common tasks" but
>>>> scripting is there for the more advanced tasks.
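One caveat on the XPath shorthand: because the sample puts emma:interpretation and <origin> in namespaces, a browser XPath call needs a namespace resolver. A sketch of how Michael's idea might look with document.evaluate follows; the emma: and ex: prefixes and the emmaXml variable are chosen here for illustration and are not part of any proposal in the thread:

  // Sketch only: run Michael's XPath idea through document.evaluate,
  // with explicit namespace handling for the sample above.
  // emmaXml is assumed to hold the serialized EMMA result.
  var doc = new DOMParser().parseFromString(emmaXml, "application/xml");

  function nsResolver(prefix) {
    return {
      emma: "http://www.w3.org/2003/04/emma",
      ex:   "http://www.example.com/example"  // default namespace of <origin>
    }[prefix] || null;
  }

  // First interpretation with confidence above 0.6 (int1 in the sample):
  var top = doc.evaluate(
      "(//emma:interpretation[@emma:confidence > 0.6])[1]",
      doc, nsResolver, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;

  // Its <origin> value ("Boston" in the sample):
  var origin = doc.evaluate(
      "string(ex:origin)", top, nsResolver,
      XPathResult.STRING_TYPE, null).stringValue;

With the JSON rendering Michael gives just above, the same lookup collapses to result['one-of'].interpretation[0].origin, which is roughly the trade-off the thread is weighing.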
>>>> -----Original Message-----
>>>> From: Bjorn Bringert [mailto:bringert@google.com]
>>>> Sent: Monday, October 25, 2010 5:43 AM
>>>> To: Dan Burnett
>>>> Cc: Michael Bodell; Deborah Dahl; public-xg-htmlspeech@w3.org
>>>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
>>>> results should all use standard formats
>>>>
>>>> I haven't used EMMA, but it looks like it could be a bit complex for a
>>>> script to simply get the top utterance or interpretation out. Are there
>>>> any shorthands or DOM methods for this? Any Hello World examples to show
>>>> the basic usage?
>>>>
>>>> /Bjorn
>>>>
>>>> On Mon, Oct 25, 2010 at 1:38 PM, Dan Burnett <dburnett@voxeo.com> wrote:
>>>>> +1
>>>>>
>>>>> On Oct 22, 2010, at 2:57 PM, Michael Bodell wrote:
>>>>>
>>>>>> I agree that SRGS, SISR, EMMA, and SSML seem like the obvious W3C
>>>>>> standard formats that we should use.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: public-xg-htmlspeech-request@w3.org
>>>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Deborah Dahl
>>>>>> Sent: Friday, October 22, 2010 6:39 AM
>>>>>> To: 'Bjorn Bringert'; 'Dan Burnett'
>>>>>> Cc: public-xg-htmlspeech@w3.org
>>>>>> Subject: RE: R27. Grammars, TTS, media composition, and recognition
>>>>>> results should all use standard formats
>>>>>>
>>>>>> For recognition results, EMMA
>>>>>> (http://www.w3.org/TR/2009/REC-emma-20090210/) is a much more recent
>>>>>> and more complete standard than NLSML. EMMA has a very rich set of
>>>>>> capabilities, but most of them are optional, so using it doesn't have
>>>>>> to be complex. Quite a few recognizers support it. I think one of the
>>>>>> most valuable aspects of EMMA is that as applications eventually start
>>>>>> finding that they need more and more information about the recognition
>>>>>> result, much of that more advanced information has already been worked
>>>>>> out and standardized in EMMA.
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: public-xg-htmlspeech-request@w3.org
>>>>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
>>>>>>> Sent: Friday, October 22, 2010 7:01 AM
>>>>>>> To: Dan Burnett
>>>>>>> Cc: public-xg-htmlspeech@w3.org
>>>>>>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
>>>>>>> results should all use standard formats
>>>>>>>
>>>>>>> For grammars, SRGS + SISR seems like the obvious choice.
>>>>>>>
>>>>>>> For TTS, SSML seems like the obvious choice.
>>>>>>>
>>>>>>> I'm not exactly sure what is meant by media composition here. Is it
>>>>>>> using TTS output together with other media? Is there a use case for
>>>>>>> this? And is there anything we need to specify here at all?
>>>>>>>
>>>>>>> For recognition results, there is NLSML, but as far as I can tell,
>>>>>>> that hasn't been widely adopted. Also, it seems like it could be a
>>>>>>> bit complex for web applications to process.
>>>>>>>
>>>>>>> /Bjorn
>>>>>>>
>>>>>>> On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett <dburnett@voxeo.com> wrote:
>>>>>>>>
>>>>>>>> Group,
>>>>>>>>
>>>>>>>> This is the second of the requirements to discuss and prioritize
>>>>>>>> based on our ranking approach [1].
>>>>>>>>
>>>>>>>> This email is the beginning of a thread for questions, discussion,
>>>>>>>> and opinions regarding our first draft of Requirement 27 [2].
>>>>>>>>
>>>>>>>> After our discussion and any modifications to the requirement, our
>>>>>>>> goal is to prioritize this requirement as either "Should Address"
>>>>>>>> or "For Future Consideration".
>>>>>>>>
>>>>>>>> -- dan
>>>>>>>>
>>>>>>>> [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
>>>>>>>>
>>>>>>>> [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r27
>>>>>>>
>>>>>>> --
>>>>>>> Bjorn Bringert
>>>>>>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>>>>>>> Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
>>>>
>>>> --
>>>> Bjorn Bringert
>>>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>>>> Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
Received on Friday, 29 October 2010 00:41:27 UTC