RE: Protocol Draft 4

Thanks Milan, comments inserted below.
(yes we're having a call tomorrow)

Cheers,
/Rob


From: Young, Milan [mailto:Milan.Young@nuance.com]
Sent: Wednesday, August 03, 2011 12:18 PM
To: Robert Brown; HTML Speech XG
Subject: RE: Protocol Draft 4

Hello Robert,

Section 4.1 - MRCP v2 states that parameters set in requests take precedence over those set in SET-PARAMS (see section 6.1.1).  Perhaps that's worth restating here as a courtesy to readers.

RB> Fair enough. There's already the sentence "Individual requests may set different values that apply only to that request", but it's easy enough to include more specific wording.
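A minimal sketch of that precedence rule, assuming the MRCP v2 semantics carry over: header values on an individual request override session-wide defaults established by SET-PARAMS, but only for that request. The parameter names here are illustrative.

```python
# Hypothetical sketch: per-request headers take precedence over
# session defaults established via SET-PARAMS (MRCP v2, section 6.1.1).

def effective_params(session_defaults, request_headers):
    """Merge session defaults with per-request headers; the request wins."""
    merged = dict(session_defaults)   # values from SET-PARAMS
    merged.update(request_headers)    # per-request values take precedence
    return merged

# Session default set via SET-PARAMS, overridden for one request only:
defaults = {"Confidence-Threshold": "0.5", "Speech-Language": "en-US"}
request = {"Confidence-Threshold": "0.8"}

print(effective_params(defaults, request))
# The override does not persist: a later request without the header
# falls back to the SET-PARAMS value.
```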

Section 4.3 - typo 'paramaters' -> 'parameters'
RB> thanks. I guess I should use a text editor that has a spell checker :$

Section 5

-          I'm assuming that SET-PARAMS and GET-PARAMS are also available in the initial idle state.  If this was implicit in your diagram, might want to make that more obvious.
RB> Agreed

-          What do you think about adding a header to LISTEN (and INTERPRET) that serves as an implicit SET-GRAMMAR?  This would avoid the extra roundtrip that the SET-GRAMMAR adds over traditional MRCP while still retaining runtime flexibility.  I suspect this is a common enough path that the optimization would be worthwhile.
RB> The Active-Grammars header should do the trick. Section 5.1 has this sentence under LISTEN: "A LISTEN request MAY also activate or deactivate grammars and rules using the Active-Grammars and Inactive-Grammars headers".
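To illustrate the round-trip saving, here is a sketch that composes a LISTEN request activating grammars inline via the Active-Grammars header, so no separate SET-GRAMMARS exchange is needed. The request-line framing is assumed for illustration and is not the normative html-speech/1.0 syntax.

```python
# Illustrative only: a LISTEN request that activates grammars inline
# via Active-Grammars, avoiding a separate SET-GRAMMARS round trip.
# The request-line format is a guess at MRCP-style framing.

def build_listen(request_id, grammars):
    lines = [f"html-speech/1.0 LISTEN {request_id}"]
    if grammars:
        lines.append("Active-Grammars: " + ",".join(f"<{g}>" for g in grammars))
    lines.append("")  # blank line terminates the header section
    return "\r\n".join(lines)

print(build_listen(1234, ["http://example.com/date.grxml"]))
```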

-          I wonder if it would be better to model INTERPRET as a blocking call that results in an INTERPRETATION-COMPLETE response.  Certainly an easier programming model.
RB> It could be implemented that way, since it's presumably a very rapid turn-around. But it's not clear to me what difference it would make in the protocol.

Section 6.3 - Do we really want to be doing all of these date/time format translations?  I suspect what the developer really wants to know is how many milliseconds from the start of the stream these events occurred.  They can always translate that back to wall clock time if they want, but I suspect that will be less common.

RB> The problem with offset from start of stream is that when there are multiple input streams the offset will be potentially different for each stream, which is why an absolute time is needed. But I do agree that we need to be more crisp about this.
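The multi-stream problem can be sketched as follows: the same wall-clock instant corresponds to different offsets in streams that started at different times, so only an absolute timestamp is comparable across streams. Stream names and start times here are made up.

```python
# Sketch: two input streams that started 2.5 s apart. The same event
# sits at different per-stream offsets, but at one absolute instant.
from datetime import datetime, timedelta, timezone

stream_starts = {
    "mic":    datetime(2011, 8, 3, 19, 18, 0, tzinfo=timezone.utc),
    "upload": datetime(2011, 8, 3, 19, 18, 2, 500000, tzinfo=timezone.utc),
}

def to_absolute(stream_id, offset_ms):
    """Convert a per-stream millisecond offset to an absolute timestamp."""
    return stream_starts[stream_id] + timedelta(milliseconds=offset_ms)

# The same event is 3.0 s into "mic" but only 0.5 s into "upload":
print(to_absolute("mic", 3000) == to_absolute("upload", 500))  # True
```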

I'm assuming that we're having a call this week.  Speak with you then.


From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Robert Brown
Sent: Tuesday, August 02, 2011 4:23 PM
To: HTML Speech XG
Subject: Protocol Draft 4

Here's the 4th draft, incorporating most of the feedback and addressing many of the open issues in the 3rd draft and the requirements doc.

There are still a number of open questions, but we're getting closer. I'll compile a list of open issues for us to work through and send it out in another mail.

Here's the change list for the 4th draft:

Changes Since Draft 3
In addition to minor overall editing to aid readability, the following changes were incorporated in response to feedback on the third draft:
2. Definitions
Clarified synthesizer description.
3.1 Session Establishment
Clarified that service parameters may be specified in the query string, but may be overridden using messages in the html-speech/1.0 websockets protocol once the websockets session has been established.
Clarified that advanced scenarios involving multiple engines of the same resource type, or using the same input audio stream for consumption by different types of vendor-specific resources, are out of scope.
3.2 Signaling
Changed the request-ID definition to match SRGS: 1-10 decimal digits.
3.3 Media Transmission
Removed "skip" message.
Added "start of stream" message, which removes the need for the START-MEDIA-STREAM request on the Recognizer (thus removing an area of confusion from section 5).
Removed Request-ID from the header, replacing it with Stream-ID, also to remove some of the confusion in section 5.
Clarified multiplexing.
Generalized from "audio" to "media" and added some text about supported media formats.
Simplified the header to just be an 8-bit message type and 24-bit stream-ID.
4.1 Getting and Setting Parameters
Rewrote the capability query headers to make them more flexible (and in theory less unwieldy if more capabilities are added in the future).
Added a header for subscribing to interim events.
4.3 Requestless Notifications
Deleted this section.
4.3 Resource Selection
Added this section to explain how resources are selected based on language and other characteristics.
5. Recognition
Clarified that grammar/rule state can only change when the recognizer is idle.
Corrected a number of errors in the state diagram.
5.1 Recognition Requests
Removed START-MEDIA-STREAM.
Added GET-GRAMMARS (and changed SET-GRAMMAR to SET-GRAMMARS).
Added METADATA.
5.2 Recognition Events
Changed START/END-OF-INPUT to START/END-OF-SPEECH.
5.3 Recognition Headers
Changed grammar-activate/grammar-deactivate to active-grammars/inactive-grammars.
5.4 Recording and Re-Recognizing
Added this section, which also includes re-recognition.
5.5 Predefined Grammars
Was previously numbered 5.4.
Clarified that the specific set of grammars is TBD later, and is optional.
5.6 Recognition Examples
Was previously numbered 5.5.
Corrected the existing one-shot example to match the changes.
Added a continuous reco example.
6. Synthesis
Clarified that SSML and plain text MUST be supported, and other input formats are permitted.
6.3 Synthesis Headers
Tried to be more specific about how the clock works.
Added a Stream-ID header to associate a SPEAK request with an output stream.
6.4 Synthesis Examples
Cleaned up the examples.
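The simplified media-frame header from 3.3 (an 8-bit message type followed by a 24-bit stream-ID) could be packed and unpacked as in this sketch; big-endian byte order and the message-type value are assumptions, since the draft only specifies the bit widths.

```python
# Hedged sketch of the simplified media frame header: 8-bit message
# type plus 24-bit stream-ID, packed into 4 bytes. Big-endian order
# is assumed; only the bit widths come from the draft.
import struct

def pack_header(msg_type, stream_id):
    if not (0 <= msg_type < 0x100 and 0 <= stream_id < 0x1000000):
        raise ValueError("msg_type is 8 bits, stream_id is 24 bits")
    return struct.pack(">I", (msg_type << 24) | stream_id)

def unpack_header(data):
    (word,) = struct.unpack(">I", data)
    return word >> 24, word & 0xFFFFFF

hdr = pack_header(0x01, 42)
print(unpack_header(hdr))  # (1, 42)
```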

Received on Thursday, 4 August 2011 00:56:45 UTC