- From: Bjorn Bringert <bringert@google.com>
- Date: Wed, 27 Oct 2010 15:09:43 +0100
- To: "Raj(Openstream)" <raj@openstream.com>
- Cc: Dave Burke <daveburke@google.com>, Michael Bodell <mbodell@microsoft.com>, Dan Burnett <dburnett@voxeo.com>, Deborah Dahl <dahl@conversational-technologies.com>, public-xg-htmlspeech@w3.org
What's the simplest code (e.g. in JavaScript + DOM) needed to extract
the text of the best utterance from any EMMA document that a recognizer
might return? Michael's code works for the given example, but not for an
arbitrary EMMA document. I understand that many apps want to do more
complex things, but I would like the API that we end up with to satisfy
both parts of "Simple things should be easy and complex things should be
possible".

/Bjorn
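A minimal sketch of the kind of code in question, assuming the recognizer
result has already been parsed into an XML Document (e.g. with DOMParser)
and ignoring corner cases such as no-input results; the helper name
bestUtterance is purely illustrative:

  var EMMA_NS = "http://www.w3.org/2003/04/emma";

  // Return the emma:tokens (or, failing that, the text content) of the
  // interpretation with the highest emma:confidence anywhere in the
  // document, or null if there is no interpretation at all.
  function bestUtterance(emmaDoc) {
    var interps = emmaDoc.getElementsByTagNameNS(EMMA_NS, "interpretation");
    var best = null, bestConf = -1;
    for (var i = 0; i < interps.length; i++) {
      // emma:confidence is optional; treat a missing value as 0.
      var conf = parseFloat(interps[i].getAttributeNS(EMMA_NS, "confidence")) || 0;
      if (conf > bestConf) {
        bestConf = conf;
        best = interps[i];
      }
    }
    if (!best) return null;
    // emma:tokens is also optional; fall back to the element's text content.
    return best.getAttributeNS(EMMA_NS, "tokens") || best.textContent;
  }

Against the EMMA example quoted further down in this thread, this would
return "flights from boston to denver".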
On Wed, Oct 27, 2010 at 2:57 PM, Raj(Openstream) <raj@openstream.com> wrote:
> From our developers' experience, they don't seem to find JavaScript any
> simpler than using EMMA... and all of them, needless to say, are Web
> developers to begin with.
>
> Raj
>
> ----- Original Message -----
> From: Dave Burke
> To: Michael Bodell
> Cc: Bjorn Bringert; Dan Burnett; Deborah Dahl; public-xg-htmlspeech@w3.org
> Sent: Tuesday, October 26, 2010 5:48 PM
> Subject: Re: R27. Grammars, TTS, media composition, and recognition
> results should all use standard formats
>
> Seems convoluted to force developers to have to understand EMMA when we
> could have a simpler JavaScript object. What does EMMA buy the typical
> Web developer?
>
> Dave
>
> On Tue, Oct 26, 2010 at 10:43 PM, Michael Bodell <mbodell@microsoft.com> wrote:
>>
>> Here's the first EMMA example from the specification:
>>
>> <emma:emma version="1.0"
>>     xmlns:emma="http://www.w3.org/2003/04/emma"
>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>     xsi:schemaLocation="http://www.w3.org/2003/04/emma
>>       http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
>>     xmlns="http://www.example.com/example">
>>   <emma:one-of id="r1" emma:start="1087995961542" emma:end="1087995963542"
>>       emma:medium="acoustic" emma:mode="voice">
>>     <emma:interpretation id="int1" emma:confidence="0.75"
>>         emma:tokens="flights from boston to denver">
>>       <origin>Boston</origin>
>>       <destination>Denver</destination>
>>     </emma:interpretation>
>>
>>     <emma:interpretation id="int2" emma:confidence="0.68"
>>         emma:tokens="flights from austin to denver">
>>       <origin>Austin</origin>
>>       <destination>Denver</destination>
>>     </emma:interpretation>
>>   </emma:one-of>
>> </emma:emma>
>>
>> Using something like XPath it is very simple to do something like
>> '//interpretation[@confidence > 0.6][1]' or '//interpretation/origin'.
>>
>> Using the DOM one could easily do something like getElementById("int1")
>> and inspect that element, or else getElementsByName("interpretation").
>>
>> If you had a more E4X approach you could imagine
>> result["one-of"].interpretation[0] would give you the first result.
>>
>> The JSON representation of the content might be:
>> ({'one-of':{interpretation:[{origin:"Boston", destination:"Denver"},
>> {origin:"Austin", destination:"Denver"}]}}).
>>
>> In addition, depending on how the recognition is defined, there might be
>> one or more default bindings of recognition results to input elements in
>> HTML, such that scripting isn't needed for the "common tasks" but
>> scripting is there for the more advanced tasks.
>>
>> -----Original Message-----
>> From: Bjorn Bringert [mailto:bringert@google.com]
>> Sent: Monday, October 25, 2010 5:43 AM
>> To: Dan Burnett
>> Cc: Michael Bodell; Deborah Dahl; public-xg-htmlspeech@w3.org
>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
>> results should all use standard formats
>>
>> I haven't used EMMA, but it looks like it could be a bit complex for a
>> script to simply get the top utterance or interpretation out. Are there
>> any shorthands or DOM methods for this? Any Hello World examples to show
>> the basic usage?
>>
>> /Bjorn
>>
>> On Mon, Oct 25, 2010 at 1:38 PM, Dan Burnett <dburnett@voxeo.com> wrote:
>> > +1
>> >
>> > On Oct 22, 2010, at 2:57 PM, Michael Bodell wrote:
>> >
>> >> I agree that SRGS, SISR, EMMA, and SSML seem like the obvious W3C
>> >> standard formats that we should use.
>> >>
>> >> -----Original Message-----
>> >> From: public-xg-htmlspeech-request@w3.org
>> >> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Deborah Dahl
>> >> Sent: Friday, October 22, 2010 6:39 AM
>> >> To: 'Bjorn Bringert'; 'Dan Burnett'
>> >> Cc: public-xg-htmlspeech@w3.org
>> >> Subject: RE: R27. Grammars, TTS, media composition, and recognition
>> >> results should all use standard formats
>> >>
>> >> For recognition results, EMMA
>> >> (http://www.w3.org/TR/2009/REC-emma-20090210/) is a much more recent
>> >> and more complete standard than NLSML. EMMA has a very rich set of
>> >> capabilities, but most of them are optional, so that using it doesn't
>> >> have to be complex. Quite a few recognizers support it. I think one of
>> >> the most valuable aspects of EMMA is that as applications eventually
>> >> start finding that they need more and more information about the
>> >> recognition result, much of that more advanced information has already
>> >> been worked out and standardized in EMMA.
>> >>
>> >>> -----Original Message-----
>> >>> From: public-xg-htmlspeech-request@w3.org
>> >>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn
>> >>> Bringert
>> >>> Sent: Friday, October 22, 2010 7:01 AM
>> >>> To: Dan Burnett
>> >>> Cc: public-xg-htmlspeech@w3.org
>> >>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
>> >>> results should all use standard formats
>> >>>
>> >>> For grammars, SRGS + SISR seems like the obvious choice.
>> >>>
>> >>> For TTS, SSML seems like the obvious choice.
>> >>>
>> >>> I'm not exactly sure what is meant by media composition here. Is it
>> >>> using TTS output together with other media? Is there a use case for
>> >>> this? And is there anything we need to specify here at all?
>> >>>
>> >>> For recognition results, there is NLSML, but as far as I can tell,
>> >>> that hasn't been widely adopted. Also, it seems like it could be a
>> >>> bit complex for web applications to process.
>> >>>
>> >>> /Bjorn
>> >>>
>> >>> On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett <dburnett@voxeo.com> wrote:
>> >>>>
>> >>>> Group,
>> >>>>
>> >>>> This is the second of the requirements to discuss and prioritize
>> >>>> based on our ranking approach [1].
>> >>>>
>> >>>> This email is the beginning of a thread for questions, discussion,
>> >>>> and opinions regarding our first draft of Requirement 27 [2].
>> >>>>
>> >>>> After our discussion and any modifications to the requirement, our
>> >>>> goal is to prioritize this requirement as either "Should Address"
>> >>>> or "For Future Consideration".
>> >>>>
>> >>>> -- dan
>> >>>>
>> >>>> [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
>> >>>>
>> >>>> [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r27
>> >>>
>> >>> --
>> >>> Bjorn Bringert
>> >>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>> >>> Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
>>
>> --
>> Bjorn Bringert
>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace
>> Road, London, SW1W 9TQ Registered in England Number: 3977902
>

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace
Road, London, SW1W 9TQ Registered in England Number: 3977902
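For reference, the XPath route Michael describes could look roughly like
this in browser JavaScript. This is only a sketch: it assumes DOM Level 3
XPath support (document.evaluate), and it namespace-qualifies the
expression, since the EMMA elements and annotation attributes live in the
EMMA namespace rather than in no namespace:

  var EMMA_NS = "http://www.w3.org/2003/04/emma";

  // Map the "emma" prefix used in the XPath expression to the EMMA namespace.
  function emmaResolver(prefix) {
    return prefix === "emma" ? EMMA_NS : null;
  }

  // First interpretation (in document order) whose confidence exceeds 0.6.
  var interp = emmaDoc.evaluate(
      "//emma:interpretation[@emma:confidence > 0.6]",
      emmaDoc, emmaResolver, XPathResult.FIRST_ORDERED_NODE_TYPE, null)
    .singleNodeValue;

  if (interp) {
    // For the example document quoted above this is
    // "flights from boston to denver".
    var tokens = interp.getAttributeNS(EMMA_NS, "tokens");
  }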
Received on Wednesday, 27 October 2010 14:10:45 UTC