RE: Protocol Draft 4 from Young, Milan on 2011-08-03 (public-xg-htmlspeech@w3.org from August 2011)

From: Young, Milan <Milan.Young@nuance.com>
Date: Wed, 3 Aug 2011 12:18:20 -0700
To: Robert Brown <Robert.Brown@microsoft.com>, HTML Speech XG <public-xg-htmlspeech@w3.org>
Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD0C4F84EA@SUN-EXCH01.nuance.com>
Hello Robert,

Section 4.1 - MRCP v2 states that parameters set in requests take
precedence over those set in SET-PARAMS (see section 6.1.1).  Perhaps
that's worth restating here as a courtesy to readers.
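
A minimal sketch of that precedence rule, assuming parameters are held as simple name/value maps. The function name and the sample values are hypothetical and do not come from the draft:

```python
def effective_params(session_params, request_headers):
    """Return the parameters that apply to a single request.

    Values set earlier with SET-PARAMS act as session defaults;
    a header carried on the request itself wins for that request only.
    """
    merged = dict(session_params)   # copy, so session state is not modified
    merged.update(request_headers)  # per-request values take precedence
    return merged


# Illustrative values only.
session = {"Confidence-Threshold": "0.5", "Speech-Language": "en-US"}
request = {"Confidence-Threshold": "0.8"}

params = effective_params(session, request)
```

Here the request's Confidence-Threshold overrides the session value, while Speech-Language is inherited unchanged and the session map itself is left as it was.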

Section 4.3 - typo 'paramaters' -> 'parameters'

Section 5 

-        I'm assuming that SET-PARAMS and GET-PARAMS are also available
in the initial idle state.  If this was implicit in your diagram, you
might want to make it more explicit.

-        What do you think about adding a header to LISTEN (and
INTERPRET) that serves as an implicit SET-GRAMMAR?  This would avoid the
extra round trip that SET-GRAMMAR adds over traditional MRCP while
still retaining runtime flexibility.  I suspect this is a common enough
path that the optimization would be worthwhile.

-        I wonder if it would be better to model INTERPRET as a blocking
call that results in an INTERPRETATION-COMPLETE response.  That would
certainly be an easier programming model.
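
To make the round-trip saving in the second point concrete, here is a rough sketch that only counts request messages. The header name "Grammar-URI" and the example URI are hypothetical placeholders, not names taken from the draft:

```python
GRAMMAR_URI = "http://example.com/pizza.grxml"  # hypothetical grammar location


def requests_without_shortcut(grammar_uri):
    """Separate grammar setup: one request to set the grammar, one to listen."""
    return [
        ("SET-GRAMMAR", {"Grammar-URI": grammar_uri}),
        ("LISTEN", {}),
    ]


def requests_with_shortcut(grammar_uri):
    """Proposed shortcut: LISTEN carries the grammar reference itself."""
    return [("LISTEN", {"Grammar-URI": grammar_uri})]


# Each request waits for its own response, so the list length
# approximates the number of round trips before recognition starts.
baseline = len(requests_without_shortcut(GRAMMAR_URI))
proposed = len(requests_with_shortcut(GRAMMAR_URI))
```

Under these assumptions the shortcut halves the number of exchanges on the common path, and a client that needs to change grammars separately can still issue the explicit request.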

Section 6.3 - Do we really want to be doing all of these date/time
format translations?  I suspect what the developer really wants to know
is how many milliseconds after the start of the stream these events
occurred.  They can always translate that back to wall clock time if
they want, but I expect that will be less common.
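
As a sketch of how cheap that translation is on the client side, assuming the client recorded the wall clock time at which the stream started (the function name and sample values are illustrative only):

```python
from datetime import datetime, timedelta, timezone


def offset_to_wall_clock(stream_start, offset_ms):
    """Convert a millisecond offset from stream start to an absolute time."""
    return stream_start + timedelta(milliseconds=offset_ms)


# Hypothetical stream start time and event offset.
stream_start = datetime(2011, 8, 3, 19, 0, 0, tzinfo=timezone.utc)
event_time = offset_to_wall_clock(stream_start, 1500)
```

The offset is the only value that has to travel on the wire; the absolute time is derived locally, and only by the clients that want it.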

I'm assuming that we're having a call this week.  Speak with you then.

From: public-xg-htmlspeech-request@w3.org
[mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Robert Brown
Sent: Tuesday, August 02, 2011 4:23 PM
To: HTML Speech XG
Subject: Protocol Draft 4

Here's the 4th draft, incorporating most of the feedback and addressing
many of the open issues in the 3rd draft and the requirements doc.

There are still a number of open questions, but we're getting closer.
I'll compile a list of open issues for us to work through and send these
out in another mail.

Here's the change list for the 4th draft:

Changes Since Draft 3

In addition to minor overall editing to aid readability, the following
changes were incorporated in response to feedback on the third draft:

2. Definitions

Clarified synthesizer description.

3.1 Session Establishment

Clarified that service parameters may be specified in the query string,
but may be overridden using messages in the html-speech/1.0 websockets
protocol once the websockets session has been established.

Clarified that advanced scenarios involving multiple engines of the same
resource type, or using the same input audio stream for consumption by
different types of vendor-specific resources, are out of scope.

3.2 Signaling

Changed the request-ID definition to match MRCP: 1-10 decimal digits.
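
A minimal sketch of validating the "1-10 decimal digits" rule on the receiving side. The helper name is hypothetical, and only the length and character constraints stated above are checked:

```python
import re

# One to ten ASCII decimal digits, nothing else.
REQUEST_ID = re.compile(r"[0-9]{1,10}")


def is_valid_request_id(value):
    """Return True if value is a well-formed request-ID string."""
    return REQUEST_ID.fullmatch(value) is not None
```

The character class is spelled out as [0-9] rather than \d because \d also matches non-ASCII digits in Python string patterns.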

3.3 Media Transmission

Removed "skip" message.

Added "start of stream" message, which removes the need for the
START-MEDIA-STREAM request on the Recognizer (thus removing an area of
confusion from section 5).

Removed Request-ID from the header, replacing it with Stream-ID, also to
remove some of the confusion in section 5.

Clarified multiplexing.

Generalized from "audio" to "media" and added some text about supported
media formats.

Simplified the header to just be an 8-bit message type and 24-bit
stream-ID.
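
A sketch of packing and parsing such a 32-bit header. The field order (message type first) and network byte order for the stream-ID are assumptions made for illustration, since the change list does not state them, and the function names are hypothetical:

```python
def pack_header(message_type, stream_id):
    """Build a 4-byte header: 8-bit message type, then 24-bit stream-ID."""
    if not 0 <= message_type <= 0xFF:
        raise ValueError("message type must fit in 8 bits")
    if not 0 <= stream_id <= 0xFFFFFF:
        raise ValueError("stream-ID must fit in 24 bits")
    # Assumption: stream-ID is sent big-endian (network byte order).
    return bytes([message_type]) + stream_id.to_bytes(3, "big")


def unpack_header(header):
    """Split a 4-byte header into (message_type, stream_id)."""
    if len(header) != 4:
        raise ValueError("header must be exactly 4 bytes")
    return header[0], int.from_bytes(header[1:4], "big")


packed = pack_header(1, 0x000203)
message_type, stream_id = unpack_header(packed)
```

Because the header is a fixed four bytes, a receiver can demultiplex streams by reading the stream-ID before looking at any of the media payload.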

4.1 Getting and Setting Parameters

Rewrote the capability query headers to make them more flexible (and in
theory less unwieldy if more capabilities are added in the future).

Added a header for subscribing to interim events.

4.3 Requestless Notifications

Deleted this section.

4.3 Resource Selection

Added this section to explain how resources are selected based on
language and other characteristics.

5. Recognition

Clarified that grammar/rule state can only change when the recognizer is
idle.

Corrected a number of errors in the state diagram.

5.1 Recognition Requests

Removed START-MEDIA-STREAM.

Added GET-GRAMMARS (and changed SET-GRAMMAR to SET-GRAMMARS).

Added METADATA.

5.2 Recognition Events

Changed START/END-OF-INPUT to START/END-OF-SPEECH.

5.3 Recognition Headers

Changed grammar-activate/grammar-deactivate to
active-grammars/inactive-grammars.

5.4 Recording and Re-Recognizing

Added this section, which also includes re-recognition.

5.5 Predefined Grammars

Was previously numbered 5.4.

Clarified that the specific set of grammars is TBD, and is
optional.

5.6 Recognition Examples

Was previously numbered 5.5.

Corrected the existing one-shot example to match the changes.

Added a continuous reco example.

6. Synthesis

Clarified that SSML and plain text MUST be supported, and other input
formats are permitted.

6.3 Synthesis Headers

Tried to be more specific about how the clock works.

Added a Stream-ID header to associate a SPEAK request with an output
stream.

6.4 Synthesis Examples

Cleaned up the examples.
Received on Wednesday, 3 August 2011 19:18:59 UTC