- From: Robert Brown <Robert.Brown@microsoft.com>
- Date: Thu, 4 Aug 2011 01:09:43 +0000
- To: HTML Speech XG <public-xg-htmlspeech@w3.org>
- Message-ID: <113BCF28740AF44989BE7D3F84AE18DD1B1BA289@TK5EX14MBXC112.redmond.corp.microsoft.>
Here's a list of outstanding issues that we should start working through in tomorrow's call. In addition to this, if you're in the protocol group (or interested), please make sure you read through the latest draft sometime in the next few days. I took some license filling in the gaps and incorporating feedback, so we need to make sure we're okay with what's written and identify anything that needs to be changed (thanks, Milan, for your quick feedback). Thanks.

Issues from the 4th draft:

1. TODO: Add a sentence or two about the higher-level motivation.
2. TODO: Add a section on security. Include authentication, encryption, and transitive authorization to fetch resources.
3. RB: I removed the ability to pass standard parameters in the query string. We didn't seem to have solid agreement on this in the calls where we reviewed the 3rd draft. Are we okay with this? If we want or need to support this, we'll need to specify a subset and provide examples.
4. TODO: Write the rationale for why we mix media and signaling in the same session. [Michael Johnston]
5. TODO: There is an open issue to do with transitive access control. The client sends a URI to the service; the client can access the resource, but the service cannot, because it is not authorized to do so. How does the client grant the service access to the resource? There are two design contenders. The first is to use the cookie technique that MRCP uses. The second is to use a virtual tag, which we discussed briefly at the F2F; Michael Bodell owes a write-up. In the absence of that write-up, perhaps the default position should be to use cookies.
6. TODO: Specify which headers are sticky. URI request parameters aren't standardized.
7. TODO: Specify the Completion-Cause value for no input stream.
8. TODO: Should GET-GRAMMARS also return the list of inactive grammars/rules? It's not clear how that would be useful. Also, the list of inactive rules could be rather long and unwieldy.
9. TODO: Describe how final results can be replaced in continuous recognition.
10. TODO: When no match is returned, is the EMMA no-match document required?
11. TODO: Insert some EMMA document examples.
12. TODO: What notation should be used? The Media Fragments draft's "Temporal Dimensions" section has some potentially viable formats, such as the "wall clock" Zulu-time format (see the sketch after this list).
13. TODO: Does the Waveform-URI return a URI for each input stream, or are all input streams magically encoded into a single stream?
14. TODO: Does the Input-Waveform-URI cause any existing input streams to be ignored?
15. TODO: Write some examples of one-shot and continuous recognition, EMMA documents, partial results, vendor extensions, grammar/rule activation/deactivation, etc.
16. TODO: Insert more synthesis examples.
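For issue 12, here is a minimal sketch of what the "wall clock" Zulu-time notation could look like in practice, assuming the spec adopts the ISO-8601 UTC form referenced in the Media Fragments "Temporal Dimensions" section. The "Source-Time" header name is purely illustrative and not taken from the draft.

```typescript
// Hypothetical sketch: formatting and parsing a "wall clock" Zulu (UTC) timestamp,
// e.g. "2011-08-04T01:09:43.000Z". The header name below is illustrative only.

/** Format a Date as a UTC timestamp with a trailing "Z", at millisecond precision. */
function toZuluTimestamp(d: Date): string {
  return d.toISOString(); // toISOString() always emits UTC with a trailing "Z"
}

/** Parse a Zulu timestamp back into a Date; returns null if the string is not valid. */
function fromZuluTimestamp(s: string): Date | null {
  const ms = Date.parse(s);
  return Number.isNaN(ms) ? null : new Date(ms);
}

// Example: stamping the moment audio capture started.
console.log(`Source-Time: ${toZuluTimestamp(new Date())}`);
```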
Remaining issues from the requirements doc (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jul/att-0023/Protocol_requirements_draft_-_RB.htm) and not already covered in the above list:

17. DD64. The API must have the ability to set service-specific parameters using names that clearly identify that they are service-specific, e.g., using an "x-" prefix. Parameter values can be arbitrary Javascript objects. PE: We have a custom vendor resource under 3.2.1 and vendor-listen-mode under 5.3. Presumably other custom params can be set by SET_PARAMS? MJ: Any issues pushing 'arbitrary javascript objects' over the protocol? RB: I'm uneasy declaring victory on this one. What exactly is an 'arbitrary javascript object'? If it can be serialized to something that can be conveyed with a vendor-specific header <http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-24#section-6.2.16>, then we're okay, but I'd like to be sure (see the sketch after this list).
18. FPR58. Web application and speech services must have a means of binding session information to communications. <http://www.w3.org/2005/Incubator/htmlspeech/live/requirements.html#fpr58> MJ: Need to clarify. RB: This essentially means "supports cookies". The exact requirements for this are, IMHO, unclear and unconvincing. With headers like user-ID, vendor-specific headers, reco-context-block, etc., and the fact that there's a websockets session that wraps all the requests, it's unclear what a session cookie is needed for. But it could be added if necessary.
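On issue 17, one possible answer, sketched under the assumption that "arbitrary Javascript objects" means anything JSON-serializable, is to serialize the value and carry it in an "x-"-prefixed vendor-specific parameter, along the lines of MRCPv2's vendor-specific-parameters. The parameter name "x-acme-endpointing" is invented for illustration; it is not from the draft.

```typescript
// Hypothetical sketch: flattening a JavaScript object into a vendor-specific
// ("x-" prefixed) header value via JSON. Functions, DOM nodes, and cyclic
// structures would not survive this, which is exactly the open question about
// what an "arbitrary javascript object" is allowed to be.

function toVendorParam(name: string, value: unknown): string {
  if (!name.startsWith("x-")) {
    throw new Error("vendor-specific parameters must use the x- prefix");
  }
  const serialized: string | undefined = JSON.stringify(value);
  if (serialized === undefined) {
    throw new Error("value is not JSON-serializable");
  }
  return `${name}: ${serialized}`;
}

// Example: 'x-acme-endpointing: {"sensitivity":0.7,"trailingSilenceMs":800}'
console.log(toVendorParam("x-acme-endpointing", { sensitivity: 0.7, trailingSilenceMs: 800 }));
```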
From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Robert Brown
Sent: Tuesday, August 02, 2011 4:23 PM
To: HTML Speech XG
Subject: Protocol Draft 4

Here's the 4th draft, incorporating most of the feedback and addressing many of the open issues in the 3rd draft and the requirements doc. There are still a number of open questions, but we're getting closer. I'll compile a list of open issues for us to work through and send these out in another mail.

Here's the change list for the 4th draft:

Changes Since Draft 3

In addition to minor overall editing to aid readability, the following changes were incorporated in response to feedback on the third draft:

2. Definitions: Clarified the synthesizer description.

3.1 Session Establishment: Clarified that service parameters may be specified in the query string, but may be overridden using messages in the html-speech/1.0 websockets protocol once the websockets session has been established. Clarified that advanced scenarios involving multiple engines of the same resource type, or using the same input audio stream for consumption by different types of vendor-specific resources, are out of scope.

3.2 Signaling: Changed the request-ID definition to match SRGS: 1-10 decimal digits.

3.3 Media Transmission: Removed the "skip" message. Added a "start of stream" message, which removes the purpose of the START-MEDIA-STREAM request on the Recognizer (thus removing an area of confusion from section 5). Removed Request-ID from the header, replacing it with Stream-ID, also to remove some of the confusion in section 5. Clarified multiplexing. Generalized from "audio" to "media" and added some text about supported media formats. Simplified the header to just be an 8-bit message type and a 24-bit stream-ID (see the sketch after this change list).

4.1 Getting and Setting Parameters: Rewrote the capability query headers to make them more flexible (and, in theory, less unwieldy if more capabilities are added in the future). Added a header for subscribing to interim events.

4.3 Requestless Notifications: Deleted this section.

4.3 Resource Selection: Added this section to explain how resources are selected based on language and other characteristics.

5. Recognition: Clarified that grammar/rule state can only change when the recognizer is idle. Corrected a number of errors in the state diagram.

5.1 Recognition Requests: Removed START-MEDIA-STREAM. Added GET-GRAMMARS (and changed SET-GRAMMAR to SET-GRAMMARS). Added METADATA.

5.2 Recognition Events: Changed START/END-OF-INPUT to START/END-OF-SPEECH.

5.3 Recognition Headers: Changed grammar-activate/grammar-deactivate to active-grammars/inactive-grammars.

5.4 Recording and Re-Recognizing: Added this section, which also includes re-recognition.

5.5 Predefined Grammars: Was previously numbered 5.4. Clarified that the specific set of grammars is TBD later, and is optional.

5.6 Recognition Examples: Was previously numbered 5.5. Corrected the existing one-shot example to match the changes. Added a continuous reco example.

6. Synthesis: Clarified that SSML and plain text MUST be supported, and other input formats are permitted.

6.3 Synthesis Headers: Tried to be more specific about how the clock works. Added a Stream-ID header to associate a SPEAK request with an output stream.

6.4 Synthesis Examples: Cleaned up the examples.
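The simplified media transmission header described in 3.3 (an 8-bit message type followed by a 24-bit stream-ID) is compact enough to sketch. This is a rough illustration only: the draft defines the actual message-type codes, and network (big-endian) byte order is an assumption on my part.

```typescript
// Hypothetical sketch of the 32-bit media frame header from section 3.3:
// an 8-bit message type, then a 24-bit stream-ID. Byte order and type codes are assumed.

function packMediaFrameHeader(messageType: number, streamId: number): Uint8Array {
  if (streamId < 0 || streamId > 0xffffff) {
    throw new RangeError("stream-ID must fit in 24 bits");
  }
  const header = new Uint8Array(4);
  header[0] = messageType & 0xff;       // 8-bit message type
  header[1] = (streamId >>> 16) & 0xff; // 24-bit stream-ID, most significant byte first
  header[2] = (streamId >>> 8) & 0xff;
  header[3] = streamId & 0xff;
  return header;
}

function unpackMediaFrameHeader(header: Uint8Array): { messageType: number; streamId: number } {
  return {
    messageType: header[0],
    streamId: (header[1] << 16) | (header[2] << 8) | header[3],
  };
}

// Round-trip example: a media frame on stream 7 with an assumed type code of 0x02.
console.log(unpackMediaFrameHeader(packMediaFrameHeader(0x02, 7))); // { messageType: 2, streamId: 7 }
```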
Received on Thursday, 4 August 2011 01:10:14 UTC