- From: Bjorn Bringert <bringert@google.com>
- Date: Fri, 29 Oct 2010 09:36:48 +0100
- To: Dan Burnett <dburnett@voxeo.com>
- Cc: Olli@pettay.fi, "Raj(Openstream)" <raj@openstream.com>, Dave Burke <daveburke@google.com>, Michael Bodell <mbodell@microsoft.com>, Deborah Dahl <dahl@conversational-technologies.com>, public-xg-htmlspeech@w3.org
Sounds like a good idea. How about going a bit further:

27a. Speech recognition grammars should use standard formats such as
SRGS and SISR.
27b. TTS should use standard formats such as SSML.
27c. Recognition results should be based upon a standard such as EMMA
but be in an easy-to-process format such as JSON.

- This puts recognition and synthesis in separate requirements, since
we might end up with separate specs for them.
- This drops media composition. There have been no proposed use cases
that need it, and no existing standard for it has been proposed on
this thread.

/Bjorn

On Fri, Oct 29, 2010 at 1:40 AM, Dan Burnett <dburnett@voxeo.com> wrote:
> Because
> a) we are operating at a requirements level currently,
> b) we essentially have agreement on SRGS, SISR, and SSML, and
> c) we are beginning to agree on a direction for the recognition results,
>
> I propose we split this requirement into two:
>
> 27a. Grammars, TTS, and media composition should all use standard formats
> such as SRGS, SISR, and SSML.
> 27b. Recognition results should be based upon a standard such as EMMA but
> be in an easy-to-process format such as JSON.
>
> I suspect this will simplify our determination of which requirements we can
> list as "Should Address".
> Thoughts? Objections?
>
> -- dan
>
> On Oct 27, 2010, at 1:13 PM, Olli Pettay wrote:
>
>> On 10/27/2010 05:09 PM, Bjorn Bringert wrote:
>>>
>>> What's the simplest code (e.g. in JavaScript + DOM) needed to extract
>>> the text of the best utterance from any EMMA document that a
>>> recognizer might return? Michael's code works for the given example,
>>> but not for an arbitrary EMMA document.
>>
>> Actually, the code might not work, *if* I read EMMA spec correctly,
>> since it uses getElementById and id attribute is not defined to be ID
>> in emma:interpretation.
>> (Though, that would be just a spec bug)
>>
>>> I understand that many apps want to do more complex things, but I
>>> would like the API that we end up with to satisfy both parts of
>>> "Simple things should be easy and complex things should be possible".
>>
>> Totally agree with this.
>>
>> I wonder if we could specify some *small* subset of features we need
>> from EMMA and expose those as a JSON or some other JS friendly object
>> in the first version of the upcoming API.
>> Then in v2, support for full EMMA could be added.
>> And in the meantime the MMI WG could perhaps develop a JSON version
>> of the result format.
>>
>> I'm hoping we could come up with some reasonably small and simple API as
>> version 1 and then do more in the next revisions.
>> Something similar to what is happening with Web Notifications.
>>
>> -Olli
>>
>>>
>>> /Bjorn
>>>
>>> On Wed, Oct 27, 2010 at 2:57 PM, Raj(Openstream) <raj@openstream.com>
>>> wrote:
>>>>
>>>> From our developers' experience, they don't seem to find JavaScript any
>>>> simpler than using EMMA... and all of them, needless to say, are Web
>>>> developers to begin with.
>>>>
>>>> Raj
>>>>
>>>> ----- Original Message -----
>>>> From: Dave Burke
>>>> To: Michael Bodell
>>>> Cc: Bjorn Bringert ; Dan Burnett ; Deborah Dahl ;
>>>> public-xg-htmlspeech@w3.org
>>>> Sent: Tuesday, October 26, 2010 5:48 PM
>>>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
>>>> results should all use standard formats
>>>> Seems convoluted to force developers to have to understand EMMA when we
>>>> could have a simpler JavaScript object. What does EMMA buy the typical
>>>> Web developer?
>>>> Dave
>>>>
>>>> On Tue, Oct 26, 2010 at 10:43 PM, Michael Bodell <mbodell@microsoft.com>
>>>> wrote:
>>>>>
>>>>> Here's the first EMMA example from the specification:
>>>>>
>>>>> <emma:emma version="1.0"
>>>>>     xmlns:emma="http://www.w3.org/2003/04/emma"
>>>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>>     xsi:schemaLocation="http://www.w3.org/2003/04/emma
>>>>>     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
>>>>>     xmlns="http://www.example.com/example">
>>>>>   <emma:one-of id="r1" emma:start="1087995961542"
>>>>>       emma:end="1087995963542"
>>>>>       emma:medium="acoustic" emma:mode="voice">
>>>>>     <emma:interpretation id="int1" emma:confidence="0.75"
>>>>>         emma:tokens="flights from boston to denver">
>>>>>       <origin>Boston</origin>
>>>>>       <destination>Denver</destination>
>>>>>     </emma:interpretation>
>>>>>
>>>>>     <emma:interpretation id="int2" emma:confidence="0.68"
>>>>>         emma:tokens="flights from austin to denver">
>>>>>       <origin>Austin</origin>
>>>>>       <destination>Denver</destination>
>>>>>     </emma:interpretation>
>>>>>   </emma:one-of>
>>>>> </emma:emma>
>>>>>
>>>>> Using something like xpath it is very simple to do something like
>>>>> '//interpretation[@confidence> 0.6][1]' or '//interpretation/origin'.
>>>>>
>>>>> Using DOM one could easily do something like getElementById("int1")
>>>>> and inspect that element or else
>>>>> getElementsByTagName("emma:interpretation").
>>>>>
>>>>> If you had a more E4X approach you could imagine
>>>>> result["one-of"].interpretation[0] would give you the first result.
>>>>>
>>>>> The JSON representation of content might be:
>>>>> ({'one-of':{interpretation:[{origin:"Boston", destination:"Denver"},
>>>>> {origin:"Austin", destination:"Denver"}]}}).
>>>>>
>>>>> In addition, depending on how the recognition is defined there might be
>>>>> one or more default bindings of recognition results to input elements
>>>>> in HTML such that scripting isn't needed for the "common tasks" but the
>>>>> scripting is there for the more advanced tasks.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Bjorn Bringert [mailto:bringert@google.com]
>>>>> Sent: Monday, October 25, 2010 5:43 AM
>>>>> To: Dan Burnett
>>>>> Cc: Michael Bodell; Deborah Dahl; public-xg-htmlspeech@w3.org
>>>>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
>>>>> results should all use standard formats
>>>>>
>>>>> I haven't used EMMA, but it looks like it could be a bit complex for a
>>>>> script to simply get the top utterance or interpretation out. Are there
>>>>> any shorthands or DOM methods for this? Any Hello World examples to show
>>>>> the basic usage?
>>>>>
>>>>> /Bjorn
>>>>>
>>>>> On Mon, Oct 25, 2010 at 1:38 PM, Dan Burnett <dburnett@voxeo.com>
>>>>> wrote:
>>>>>>
>>>>>> +1
>>>>>> On Oct 22, 2010, at 2:57 PM, Michael Bodell wrote:
>>>>>>
>>>>>>> I agree that SRGS, SISR, EMMA, and SSML seem like the obvious W3C
>>>>>>> standard formats that we should use.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: public-xg-htmlspeech-request@w3.org
>>>>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Deborah
>>>>>>> Dahl
>>>>>>> Sent: Friday, October 22, 2010 6:39 AM
>>>>>>> To: 'Bjorn Bringert'; 'Dan Burnett'
>>>>>>> Cc: public-xg-htmlspeech@w3.org
>>>>>>> Subject: RE: R27. Grammars, TTS, media composition, and recognition
>>>>>>> results should all use standard formats
>>>>>>>
>>>>>>> For recognition results, EMMA
>>>>>>> http://www.w3.org/TR/2009/REC-emma-20090210/
>>>>>>> is a much more recent and more complete standard than NLSML. EMMA has
>>>>>>> a very rich set of capabilities, but most of them are optional, so
>>>>>>> that using it doesn't have to be complex.
>>>>>>> Quite a few recognizers support it. I think one of the most valuable
>>>>>>> aspects of EMMA is that as applications eventually start finding that
>>>>>>> they need more and more information about the recognition result, much
>>>>>>> of that more advanced information has already been worked out and
>>>>>>> standardized in EMMA.
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: public-xg-htmlspeech-request@w3.org
>>>>>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn
>>>>>>>> Bringert
>>>>>>>> Sent: Friday, October 22, 2010 7:01 AM
>>>>>>>> To: Dan Burnett
>>>>>>>> Cc: public-xg-htmlspeech@w3.org
>>>>>>>> Subject: Re: R27. Grammars, TTS, media composition, and recognition
>>>>>>>> results should all use standard formats
>>>>>>>>
>>>>>>>> For grammars, SRGS + SISR seems like the obvious choice.
>>>>>>>>
>>>>>>>> For TTS, SSML seems like the obvious choice.
>>>>>>>>
>>>>>>>> I'm not exactly sure what is meant by media composition here. Is it
>>>>>>>> using TTS output together with other media? Is there a use case for
>>>>>>>> this? And is there anything we need to specify here at all?
>>>>>>>>
>>>>>>>> For recognition results, there is NLSML, but as far as I can tell,
>>>>>>>> that hasn't been widely adopted. Also, it seems like it could be a
>>>>>>>> bit complex for web applications to process.
>>>>>>>>
>>>>>>>> /Bjorn
>>>>>>>>
>>>>>>>> On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett <dburnett@voxeo.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Group,
>>>>>>>>>
>>>>>>>>> This is the second of the requirements to discuss and prioritize
>>>>>>>>> based on our ranking approach [1].
>>>>>>>>>
>>>>>>>>> This email is the beginning of a thread for questions, discussion,
>>>>>>>>> and opinions regarding our first draft of Requirement 27 [2].
>>>>>>>>>
>>>>>>>>> After our discussion and any modifications to the requirement, our
>>>>>>>>> goal is to prioritize this requirement as either "Should Address"
>>>>>>>>> or "For Future Consideration".
>>>>>>>>>
>>>>>>>>> -- dan
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
>>>>>>>>>
>>>>>>>>> [2]
>>>>>>>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r27
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Bjorn Bringert
>>>>>>>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>>>>>>>> Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
>>>>>>>
>>>>>
>>>>> --
>>>>> Bjorn Bringert
>>>>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>>>>> Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
>>>>>

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
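
To make the 27c idea concrete, here is a minimal sketch of what "get the best utterance" could look like if results were exposed as a plain JavaScript object. It is only an illustration under stated assumptions: the object shape follows the JSON sketch quoted in this thread, but the "confidence" and "tokens" fields and the bestInterpretation helper are hypothetical and not part of EMMA or of any proposed API.

```javascript
// Hypothetical JSON rendering of the EMMA example quoted in this thread.
// Assumption: "confidence" and "tokens" mirror the emma:confidence and
// emma:tokens attributes, which the JSON sketch in the thread left out.
const result = {
  "one-of": {
    interpretation: [
      { confidence: 0.75, tokens: "flights from boston to denver",
        origin: "Boston", destination: "Denver" },
      { confidence: 0.68, tokens: "flights from austin to denver",
        origin: "Austin", destination: "Denver" }
    ]
  }
};

// Hypothetical helper: return the interpretation with the highest
// confidence, or null if the result holds none. It does not assume the
// recognizer has already sorted its output.
function bestInterpretation(res) {
  const oneOf = res && res["one-of"];
  const list = (oneOf && oneOf.interpretation) || [];
  let best = null;
  for (const item of list) {
    if (best === null || item.confidence > best.confidence) {
      best = item;
    }
  }
  return best;
}

const best = bestInterpretation(result);
console.log(best ? best.tokens : "(no interpretation)");
```

Scanning for the maximum confidence rather than taking interpretation[0] is a deliberate choice here, since nothing quoted in this thread says whether a recognizer returns its results in sorted order.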
Received on Friday, 29 October 2010 08:37:19 UTC