Protocol Draft 4

Here's the 4th draft, incorporating most of the feedback and addressing many of the open issues in the 3rd draft and the requirements doc.

There are still a number of open questions, but we're getting closer. I'll compile a list of open issues for us to work through and send these out in another mail.

Here's the change list for the 4th draft:

Changes Since Draft 3
In addition to minor overall editing to aid readability, the following changes were incorporated in response to feedback on the third draft:
2. Definitions
Clarified synthesizer description.
3.1 Session Establishment
Clarified that service parameters may be specified in the query string, but may be overridden using messages in the html-speech/1.0 websockets protocol once the websockets session has been established.
Clarified that advanced scenarios involving multiple engines of the same resource type, or using the same input audio stream for consumption by different types of vendor-specific resources, are out of scope.
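The layering described for 3.1 works as follows. Query-string values act as initial defaults, and parameter messages sent after the WebSockets session is up override them. The Python sketch below shows this, with several assumptions. The helper `session_params`, the URL, and the parameter names `lang` and `maxnbest` are all hypothetical placeholders and do not come from the draft. Each dict passed in `overrides` stands in for one parameter-setting message, applied in arrival order.

```python
from urllib.parse import urlparse, parse_qsl


def session_params(url, overrides=()):
    """Compute effective service parameters (hypothetical helper).

    Query-string values are the starting defaults; each dict in `overrides`
    stands in for a parameter-setting message received over the established
    WebSockets session, applied in arrival order so later values win.
    """
    params = dict(parse_qsl(urlparse(url).query))
    for message_params in overrides:
        params.update(message_params)
    return params


# Placeholder URL and parameter names, for illustration only.
url = "wss://speech.example.com/reco?lang=en-US&maxnbest=1"
print(session_params(url, [{"maxnbest": "5"}]))
# {'lang': 'en-US', 'maxnbest': '5'}
```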
3.2 Signaling
Changed the request-ID definition to match SRGS: 1-10 decimal digits.
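The revised request-ID rule (1 to 10 decimal digits) can be checked syntactically as in the minimal Python sketch below. The function name is hypothetical. The sketch checks only the character pattern and assumes nothing about leading zeros or numeric range, since this change note does not address them.

```python
import re

# 1 to 10 ASCII decimal digits. [0-9] is used instead of \d, because in
# Python 3 \d also matches non-ASCII Unicode digits.
REQUEST_ID_RE = re.compile(r"[0-9]{1,10}")


def is_valid_request_id(value: str) -> bool:
    """Return True if `value` matches the 1-10 decimal digit request-ID syntax."""
    return REQUEST_ID_RE.fullmatch(value) is not None


print(is_valid_request_id("42"), is_valid_request_id("12345678901"))
# True False
```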
3.3 Media Transmission
Removed "skip" message.
Added a "start of stream" message, which makes the START-MEDIA-STREAM request on the Recognizer unnecessary (thus removing an area of confusion from section 5).
Removed Request-ID from the header, replacing it with Stream-ID, also to remove some of the confusion in section 5.
Clarified multiplexing.
Generalized from "audio" to "media" and added some text about supported media formats.
Simplified the header to consist of just an 8-bit message type and a 24-bit stream-ID.
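The simplified media header is an 8-bit message type plus a 24-bit stream-ID, which together fill one 32-bit word. Packing and unpacking it might look like the Python sketch below. The change list does not state the byte order or field placement. The sketch assumes network (big-endian) byte order with the message type in the first byte. The message type value used in the example is hypothetical. Because each frame carries its stream-ID, a receiver can demultiplex interleaved frames by reading the second field.

```python
import struct

MAX_STREAM_ID = (1 << 24) - 1  # largest value a 24-bit field can hold


def pack_media_header(message_type: int, stream_id: int) -> bytes:
    """Pack an 8-bit message type and a 24-bit stream-ID into 4 bytes.

    Assumes network byte order with the message type in the first byte.
    """
    if not 0 <= message_type <= 0xFF:
        raise ValueError("message type must fit in 8 bits")
    if not 0 <= stream_id <= MAX_STREAM_ID:
        raise ValueError("stream-ID must fit in 24 bits")
    return struct.pack("!I", (message_type << 24) | stream_id)


def unpack_media_header(frame: bytes) -> tuple[int, int]:
    """Split the first 4 bytes of a media frame into (message_type, stream_id)."""
    (word,) = struct.unpack("!I", frame[:4])
    return word >> 24, word & MAX_STREAM_ID


header = pack_media_header(0x01, 0x000102)  # 0x01 is a hypothetical type code
print(header.hex())
# 01000102
```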
4.1 Getting and Setting Parameters
Rewrote the capability query headers to make them more flexible (and in theory less unwieldy if more capabilities are added in the future).
Added a header for subscribing to interim events.
4.3 Requestless Notifications
Deleted this section.
4.3 Resource Selection
Added this section to explain how resources are selected based on language and other characteristics.
5. Recognition
Clarified that grammar/rule state can only change when the recognizer is idle.
Corrected a number of errors in the state diagram.
5.1 Recognition Requests
Removed START-MEDIA-STREAM.
Added GET-GRAMMARS (and changed SET-GRAMMAR to SET-GRAMMARS).
Added METADATA.
5.2 Recognition Events
Changed START/END-OF-INPUT to START/END-OF-SPEECH.
5.3 Recognition Headers
Changed grammar-activate/grammar-deactivate to active-grammars/inactive-grammars.
5.4 Recording and Re-Recognizing
Added this section, which also includes re-recognition.
5.5 Predefined Grammars
Was previously numbered 5.4.
Clarified that the specific set of predefined grammars is still to be determined, and that it is optional.
5.6 Recognition Examples
Was previously numbered 5.5.
Corrected the existing one-shot example to match the changes.
Added a continuous recognition example.
6. Synthesis
Clarified that SSML and plain text MUST be supported, and other input formats are permitted.
6.3 Synthesis Headers
Tried to be more specific about how the clock works.
Added a Stream-ID header to associate a SPEAK request with an output stream.
6.4 Synthesis Examples
Cleaned up the examples.

Received on Tuesday, 2 August 2011 23:23:27 UTC