- From: Robert Brown <Robert.Brown@microsoft.com>
- Date: Tue, 2 Aug 2011 23:22:31 +0000
- To: HTML Speech XG <public-xg-htmlspeech@w3.org>
- Message-ID: <113BCF28740AF44989BE7D3F84AE18DD1B1B8CBB@TK5EX14MBXC112.redmond.corp.microsoft.>
Here's the 4th draft, incorporating most of the feedback and addressing many of the open issues in the 3rd draft and the requirements doc. There are still a number of open questions, but we're getting closer. I'll compile a list of open issues for us to work through and send it out in another mail.

Here's the change list for the 4th draft:

Changes Since Draft 3

In addition to minor overall editing to aid readability, the following changes were incorporated in response to feedback on the third draft:

2. Definitions
- Clarified synthesizer description.

3.1 Session Establishment
- Clarified that service parameters may be specified in the query string, but may be overridden using messages in the html-speech/1.0 websockets protocol once the websockets session has been established.
- Clarified that advanced scenarios involving multiple engines of the same resource type, or using the same input audio stream for consumption by different types of vendor-specific resources, are out of scope.

3.2 Signaling
- Changed the request-ID definition to match SRGS: 1-10 decimal digits.

3.3 Media Transmission
- Removed "skip" message.
- Added "start of stream" message, which makes the START-MEDIA-STREAM request on the Recognizer unnecessary (thus removing an area of confusion from section 5).
- Removed Request-ID from the header, replacing it with Stream-ID, also to remove some of the confusion in section 5.
- Clarified multiplexing.
- Generalized from "audio" to "media" and added some text about supported media formats.
- Simplified the header to just an 8-bit message type and a 24-bit stream-ID.

4.1 Getting and Setting Parameters
- Rewrote the capability query headers to make them more flexible (and in theory less unwieldy if more capabilities are added in the future).
- Added a header for subscribing to interim events.

4.3 Requestless Notifications
- Deleted this section.

4.3 Resource Selection
- Added this section to explain how resources are selected based on language and other characteristics.

5. Recognition
- Clarified that grammar/rule state can only change when the recognizer is idle.
- Corrected a number of errors in the state diagram.

5.1 Recognition Requests
- Removed START-MEDIA-STREAM.
- Added GET-GRAMMARS (and changed SET-GRAMMAR to SET-GRAMMARS).
- Added METADATA.

5.2 Recognition Events
- Changed START/END-OF-INPUT to START/END-OF-SPEECH.

5.3 Recognition Headers
- Changed grammar-activate/grammar-deactivate to active-grammars/inactive-grammars.

5.4 Recording and Re-Recognizing
- Added this section, which also includes re-recognition.

5.5 Predefined Grammars
- Was previously numbered 5.4.
- Clarified that the specific set of grammars is TBD, and is optional.

5.6 Recognition Examples
- Was previously numbered 5.5.
- Corrected the existing one-shot example to match the changes.
- Added a continuous reco example.

6. Synthesis
- Clarified that SSML and plain text MUST be supported, and other input formats are permitted.

6.3 Synthesis Headers
- Tried to be more specific about how the clock works.
- Added a Stream-ID header to associate a SPEAK request with an output stream.

6.4 Synthesis Examples
- Cleaned up the examples.
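
To make two of the changes above concrete (the 1-10 decimal digit request-ID from 3.2 and the 8-bit message type plus 24-bit stream-ID media header from 3.3), here is a rough sketch. It is only an illustration, not text from the draft: the function names are hypothetical, and it assumes the message type occupies the first byte and that the stream-ID is sent in network (big-endian) byte order, neither of which is stated in this mail.

```python
import re
import struct
from typing import Tuple

# Request-ID as described in 3.2: between 1 and 10 decimal digits.
REQUEST_ID_RE = re.compile(r"[0-9]{1,10}")


def is_valid_request_id(value: str) -> bool:
    """Return True if value is 1-10 ASCII decimal digits."""
    return REQUEST_ID_RE.fullmatch(value) is not None


def pack_media_header(message_type: int, stream_id: int) -> bytes:
    """Pack an 8-bit message type and a 24-bit stream-ID into 4 bytes.

    ASSUMPTION: message type comes first, big-endian byte order.
    """
    if not 0 <= message_type <= 0xFF:
        raise ValueError("message type must fit in 8 bits")
    if not 0 <= stream_id <= 0xFFFFFF:
        raise ValueError("stream-ID must fit in 24 bits")
    return struct.pack(">I", (message_type << 24) | stream_id)


def unpack_media_header(header: bytes) -> Tuple[int, int]:
    """Split a 4-byte header back into (message_type, stream_id)."""
    if len(header) != 4:
        raise ValueError("header must be exactly 4 bytes")
    (word,) = struct.unpack(">I", header)
    return word >> 24, word & 0xFFFFFF
```

With this layout, a hypothetical message type 1 on stream 5 would be sent as the four bytes 01 00 00 05, followed by the media payload. The actual message type values and byte order are whatever the attached draft specifies.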
Attachments
- text/html attachment: speech-protocol-draft-04.html
Received on Tuesday, 2 August 2011 23:23:27 UTC