- From: Young, Milan <Milan.Young@nuance.com>
- Date: Wed, 3 Aug 2011 12:18:20 -0700
- To: Robert Brown <Robert.Brown@microsoft.com>, HTML Speech XG <public-xg-htmlspeech@w3.org>
- Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD0C4F84EA@SUN-EXCH01.nuance.com>
Hello Robert,

Section 4.1
- MRCP v2 states that parameters set in requests take precedence over those set in SET-PARAMS (see section 6.1.1). Perhaps that's worth restating here as a courtesy to readers.

Section 4.3
- typo 'paramaters' -> 'parameters'

Section 5
- I'm assuming that SET-PARAMS and GET-PARAMS are also available in the initial idle state. If this was implicit in your diagram, might want to make that more obvious.
- What do you think about adding a header to LISTEN (and INTERPRET) that serves as an implicit SET-GRAMMAR? This would avoid the extra roundtrip that the SET-GRAMMAR adds over traditional MRCP while still retaining runtime flexibility. I suspect this is a common enough path that the optimization would be worthwhile.
- I wonder if it would be better to model INTERPRET as a blocking call that results in an INTERPRETATION-COMPLETE response. Certainly an easier programming model.

Section 6.3
- Do we really want to be doing all of these date/time format translations? I suspect what the developer really wants to know is how many milliseconds from the start of the stream these events occurred. They can always translate that back to wall clock time if they want, but I suspect that will be less common.

I'm assuming that we're having a call this week. Speak with you then.

From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Robert Brown
Sent: Tuesday, August 02, 2011 4:23 PM
To: HTML Speech XG
Subject: Protocol Draft 4

Here's the 4th draft, incorporating most of the feedback and addressing many of the open issues in the 3rd draft and the requirements doc. There are still a number of open questions, but we're getting closer. I'll compile a list of open issues for us to work through and send these out in another mail.
Here's the change list for the 4th draft:

Changes Since Draft 3

In addition to minor overall editing to aid readability, the following changes were incorporated in response to feedback on the third draft:

2. Definitions
- Clarified synthesizer description.

3.1 Session Establishment
- Clarified that service parameters may be specified in the query string, but may be overridden using messages in the html-speech/1.0 websockets protocol once the websockets session has been established.
- Clarified that advanced scenarios involving multiple engines of the same resource type, or using the same input audio stream for consumption by different types of vendor-specific resources, are out of scope.

3.2 Signaling
- Changed the request-ID definition to match SRGS: 1-10 decimal digits.

3.3 Media Transmission
- Removed "skip" message.
- Added "start of stream" message, which removes the purpose of the START-MEDIA-STREAM request on the Recognizer (thus removing an area of confusion from section 5).
- Removed Request-ID from the header, replacing it with Stream-ID, also to remove some of the confusion in section 5.
- Clarified multiplexing.
- Generalized from "audio" to "media" and added some text about supported media formats.
- Simplified the header to just be an 8-bit message type and 24-bit stream-ID.

4.1 Getting and Setting Parameters
- Rewrote the capability query headers to make them more flexible (and in theory less unwieldy if more capabilities are added in the future).
- Added a header for subscribing to interim events.

4.3 Requestless Notifications
- Deleted this section.

4.3 Resource Selection
- Added this section to explain how resources are selected based on language and other characteristics.

5. Recognition
- Clarified that grammar/rule state can only change when the recognizer is idle.
- Corrected a number of errors in the state diagram.

5.1 Recognition Requests
- Removed START-MEDIA-STREAM.
- Added GET-GRAMMARS (and changed SET-GRAMMAR to SET-GRAMMARS).
- Added METADATA.

5.2 Recognition Events
- Changed START/END-OF-INPUT to START/END-OF-SPEECH.

5.3 Recognition Headers
- Changed grammar-activate/grammar-deactivate to active-grammars/inactive-grammars.

5.4 Recording and Re-Recognizing
- Added this section, which also includes re-recognition.

5.5 Predefined Grammars
- Was previously numbered 5.4.
- Clarified that the specific set of grammars is TBD later, and is optional.

5.6 Recognition Examples
- Was previously numbered 5.5.
- Corrected the existing one-shot example to match the changes.
- Added a continuous reco example.

6. Synthesis
- Clarified that SSML and plain text MUST be supported, and other input formats are permitted.

6.3 Synthesis Headers
- Tried to be more specific about how the clock works.
- Added a Stream-ID header to associate a SPEAK request with an output stream.

6.4 Synthesis Examples
- Cleaned up the examples.
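
As a rough illustration of the simplified media header mentioned under 3.3 (an 8-bit message type followed by a 24-bit stream-ID), the packing could be sketched as below. This is only a sketch under assumptions: the change list does not state the byte order or field order, so big-endian (network order) with the message type in the first byte is assumed, and the function names and the example message-type value are hypothetical rather than taken from the draft.

```python
from typing import Tuple

HEADER_LEN = 4  # 1 byte message type + 3 bytes stream-ID


def pack_media_header(message_type: int, stream_id: int) -> bytes:
    """Build a 4-byte header: 8-bit message type, then 24-bit stream-ID.

    Assumption: big-endian stream-ID, message type in the first byte.
    """
    if not 0 <= message_type <= 0xFF:
        raise ValueError("message type must fit in 8 bits")
    if not 0 <= stream_id <= 0xFFFFFF:
        raise ValueError("stream-ID must fit in 24 bits")
    return bytes([message_type]) + stream_id.to_bytes(3, "big")


def unpack_media_frame(frame: bytes) -> Tuple[int, int, bytes]:
    """Split a binary frame into (message type, stream-ID, media payload)."""
    if len(frame) < HEADER_LEN:
        raise ValueError("frame is shorter than the 4-byte header")
    message_type = frame[0]
    stream_id = int.from_bytes(frame[1:HEADER_LEN], "big")
    return message_type, stream_id, frame[HEADER_LEN:]


# Hypothetical message-type value, used only to exercise the functions.
EXAMPLE_TYPE = 0x01
frame = pack_media_header(EXAMPLE_TYPE, 0x000102) + b"media-bytes"
print(unpack_media_frame(frame))
```

Because the stream-ID travels in every frame, a receiver can demultiplex several media streams on one websockets session by keying on the second tuple element.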
Received on Wednesday, 3 August 2011 19:18:59 UTC