- From: Young, Milan <Milan.Young@nuance.com>
- Date: Tue, 14 Jun 2011 18:00:42 -0700
- To: Robert Brown <Robert.Brown@microsoft.com>, "Satish Sampath (Google)" <satish@google.com>, <gshires@google.com>, "Marc Schroeder (DFKI)" <marc.schroeder@dfki.de>, "Patrick Ehlen (AT&T)" <pehlen@attinteractive.com>, "JOHNSTON, MICHAEL J (MICHAEL J)" <johnston@research.att.com>
- CC: "Dan Burnett (Voxeo)" <dburnett@voxeo.com>, HTML Speech XG <public-xg-htmlspeech@w3.org>
- Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD0B9B15C8@SUN-EXCH01.nuance.com>
Glad to hear that we are converging. Follow-up comments:

* Regarding cookies, I thought we might use the MRCP headers to at least transport information about the URL the page is executing within. Perhaps I've misunderstood, but giving that information to the SS doesn't seem like a security breach. Of course if Michael can figure out a way to push all the cookies, then that's even better.

* Regarding NOTIFY, my intention was that the server could send this event at any time while the session is live. It wouldn't need to wait for a client request to be "in-progress". Maybe you already understood that, but your use of "in-progress" made me unsure.

* I was thinking that it would be convenient to select the set of NOTIFYs at runtime (e.g. SET-PARAMS) rather than always at session startup. In my proposal, the "a=resource:notify" was only an instruction that the webapp was capable of dealing with the general concept of a NOTIFY rather than a particular class. But I suppose that if we can agree that the browser never filters NOTIFYs we can have it both ways.

* Curious to know your thoughts on RECORD.

Thanks

________________________________
From: Robert Brown [mailto:Robert.Brown@microsoft.com]
Sent: Tuesday, June 14, 2011 4:33 PM
To: Young, Milan; Satish Sampath (Google); gshires@google.com; Marc Schroeder (DFKI); Patrick Ehlen (AT&T); JOHNSTON, MICHAEL J (MICHAEL J)
Cc: Dan Burnett (Voxeo); HTML Speech XG
Subject: RE: Control portion of SS protocol

Thanks Milan, this is a nice tight list.

A couple of minor tweaks to make the method list consistent with MRCP2-24 (I assume 24 is the latest version?):

- GET/SET-PARAMS are now listed as generic methods.
- RECOGNITION-START-TIMERS has been renamed START-INPUT-TIMERS.

I agree on the response & event list. In addition, reco results would default to EMMA rather than NLSML.

I generally agree on using the same list of headers.
When you said "except verification" I assume you mean those unique headers listed under the speaker verification feature?

The other thing I think we should remove is the cookie headers. I recall we had a discussion on cookies at the F2F, and a number of us felt that it was inappropriate to give the service transitive use of the UA's cookies, and brainstormed an alternative mechanism. Michael Bodell volunteered to make a proposal.

I like the NOTIFY event. Services could send it while processing any in-progress request. We may want to introduce a mechanism for clients to only subscribe to certain events. For example, all the Microsoft TTS engines can produce viseme events (e.g. http://dict.bing.com.cn/#%3Ahome, and click on the orange TV icon), but most apps wouldn't want to receive them. This may be as simple as introducing a "subscribe" header that lists the custom events you want to receive.

From: Young, Milan [mailto:Milan.Young@nuance.com]
Sent: Friday, June 10, 2011 11:46 AM
To: Robert Brown; Satish Sampath (Google); gshires@google.com; Marc Schroeder (DFKI); Patrick Ehlen (AT&T); JOHNSTON, MICHAEL J (MICHAEL J)
Cc: Dan Burnett (Voxeo); HTML Speech XG
Subject: Control portion of SS protocol

Robert's draft referenced a few placeholder control methods and headers that were "inspired from MRCP". This is a start at making these sections more concrete.

One notable omission is handling of continuous recognition results and corrections. I will follow up on this section early next week.

---------------------------

Client Requests

For the contents of 'recognition-method', I suggest we use the following as defined by MRCP v2:

SET-PARAMS
GET-PARAMS
DEFINE-GRAMMAR
RECOGNIZE
RECOGNITION-START-TIMERS
STOP
INTERPRET

... and for 'synthesizer-method':

SET-PARAMS
GET-PARAMS
SPEAK
STOP
PAUSE
RESUME
BARGE-IN-OCCURRED
CONTROL
DEFINE-LEXICON

I suggest we also add a recorder resource (this probably needs discussion in the API group).
Although there are other ways to pass recorded audio from client to server, doing it within the protocol has some nice advantages:

* Consistent use of server-based endpointing and channel adaptation.
* Shares the headers with the other control messages (e.g. timeouts, cookies, and channel-identifier).
* Same network paths.

'recorder-method' would be defined as per MRCP v2 using the following methods:

RECORD
STOP
START-INPUT-TIMERS

Server Responses

Server request state should be exactly as defined by MRCP v2:

COMPLETE
IN-PROGRESS
PENDING

For 'recognizer-event', I suggest we use the following as defined by MRCP:

START-OF-INPUT
RECOGNITION-COMPLETE
INTERPRETATION-COMPLETE

... and for 'synthesizer-event':

SPEECH-MARKER
SPEAK-COMPLETE

... and for 'recorder-event':

START-OF-INPUT
RECORD-COMPLETE

Headers

I suggest that we use all the headers defined by MRCP v2 except those that are specific to verification. Specifically, this means:

* Generic (see http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-24#section-6.2)
* Synthesizer (see http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-24#section-8.4)
* Recognizer (see http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-24#section-9.4)
* Recorder (see http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-24#section-10.4)

Appropriate use of these headers is defined as per the MRCP v2 spec in the context of a specific method or response referenced by this specification.

Server Notifications

Within MRCP v2, the server may only send a message in response to a client-driven request. Client polling via GET-PARAMS is the only option for "pushing" a message from the server to the client.

It's unclear whether server push through the HTML Speech protocol and API is required functionality. These messages could, for example, be accomplished outside the specification using a separate WebSocket connection. On the other hand, frameworks like MMI hinge on the ability for the server to proactively send state updates to the client.
If this is found to be convenient, then we may choose to add to our list of 'event-names' with a 'notification-event'. This new event would use a status code of '200', and a request state of 'NOTIFY'. The value of the 'Channel-Identifier' header would use a new resource type called 'notification'. For example:

    html-speech/1.0 92 323340 200 NOTIFY
    Channel-Identifier: 817@notification
    Content-Length: 36
    Content-Type: text/xml

    <?xml version="1.0"?>
    <foo>bar</foo>

A couple of notes:

* If the [body] was detected as being XML or JSON, it would be nice if the client browser could automatically reflect the data as a DOM or ECMAScript object. But I don't know much about that sort of technology, so would need someone else to comment.

* The client would request notifications using the SDP-like setup protocol that Robert is working on. Something like 'a=resource:notification'.

* The client browser would not interpret any headers in the notification other than those required to parse the message (i.e. 'Content-Length', 'Content-Type', and 'Content-Encoding').

* The request-id, Channel-Identifier, and other headers would be bundled up along with the body and handed to the webapp. It would be up to the application to decide the meaning of such headers in the context of the notification.
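
To make the proposed notification format concrete, here is a minimal Python sketch of how a client might split the example message into its start line, headers, and body before handing everything to the webapp. It is only an illustration under several assumptions not stated in the proposal: that lines end in CRLF as in MRCP v2, that a blank line separates headers from body, and that the start-line fields are version, message length, request-id, status code, and request state in that order. The names `Notification` and `parse_notification` are hypothetical, and the message-length field is read but not validated.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    version: str
    message_length: int
    request_id: str
    status_code: int
    request_state: str
    headers: dict
    body: bytes


def parse_notification(raw: bytes) -> Notification:
    # Assumption: headers and body are separated by an empty CRLF line.
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("missing blank line between headers and body")
    lines = head.decode("ascii").split("\r\n")
    # Assumed start-line layout, taken from the example in the proposal.
    version, length, request_id, status, state = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Only Content-Length is interpreted here; every other header is
    # passed through untouched for the application to interpret.
    body = rest[: int(headers.get("Content-Length", len(rest)))]
    return Notification(version, int(length), request_id,
                        int(status), state, headers, body)


EXAMPLE = (
    b"html-speech/1.0 92 323340 200 NOTIFY\r\n"
    b"Channel-Identifier: 817@notification\r\n"
    b"Content-Length: 36\r\n"
    b"Content-Type: text/xml\r\n"
    b"\r\n"
    b'<?xml version="1.0"?>\n<foo>bar</foo>'
)

note = parse_notification(EXAMPLE)
```

In this sketch the parser never inspects 'Channel-Identifier' or the request-id beyond storing them, which mirrors the idea that their meaning is left to the application.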
Received on Wednesday, 15 June 2011 01:01:45 UTC