RE: Requirement for UA / SS protocol from Robert Brown on 2010-11-19 (public-xg-htmlspeech@w3.org from November 2010)

From: Robert Brown <Robert.Brown@microsoft.com>
Date: Fri, 19 Nov 2010 01:49:05 +0000
To: "Young, Milan" <Milan.Young@nuance.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <113BCF28740AF44989BE7D3F84AE18DD197A9EC9@TK5EX14MBXC118.redmond.corp.microsoft.>

I mostly agree.  But do we need bidirectional events?  I suspect all the interesting ones originate at the server: start-of-speech; hypothesis; partial result; warnings of noise, crosstalk, etc.  I'm trying to think why the server would care about events from the client, other than when the client is done sending audio (which it could do in response to a click or end-point detection).
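
To make the asymmetry concrete, here is a rough sketch of where each event mentioned above would originate. The event names are only illustrative labels lifted from this message, not a proposed vocabulary, and `Origin`, `EVENT_ORIGIN` and `events_from` are hypothetical names invented for the example.

```python
from enum import Enum


class Origin(Enum):
    SERVER = "speech service"
    CLIENT = "user agent"


# Illustrative labels only (assumption): taken from the prose above,
# not from any agreed event vocabulary.
EVENT_ORIGIN = {
    "start-of-speech": Origin.SERVER,
    "hypothesis": Origin.SERVER,
    "partial-result": Origin.SERVER,
    "noise-warning": Origin.SERVER,
    "crosstalk-warning": Origin.SERVER,
    "end-of-audio": Origin.CLIENT,  # sent after a click or end-point detection
}


def events_from(origin):
    """Return the sorted names of events that originate at the given side."""
    return sorted(name for name, o in EVENT_ORIGIN.items() if o is origin)


if __name__ == "__main__":
    print("server:", events_from(Origin.SERVER))
    print("client:", events_from(Origin.CLIENT))
```

Under these assumptions the client side has a single entry, which is the reason for asking whether a fully bidirectional event channel is needed.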

From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Young, Milan
Sent: Thursday, November 18, 2010 5:34 PM
To: public-xg-htmlspeech@w3.org
Subject: Requirement for UA / SS protocol

Hello,

On the Nov 18th conference, I volunteered to send out proposed wording for a new requirement:

Summary - User agents and speech services are required to support at least one common protocol.

Description - A common protocol will be defined as part of the final recommendation.  It will be built upon some TBD existing application layer protocol and include support for the following:

  * Streaming audio data (e.g. HTTP 1.1 chunking).  This includes both audio streamed from UA to SS during recognition and audio streamed from SS to UA during synthesis.

  * Bidirectional events which can occur at any time during the interaction.  These events could originate either within the web app (e.g. click) or the SS (e.g. start-of-speech or mark) and must be transmitted through the UA in a timely fashion.  The set of events includes both standard events defined by the final recommendation and extension events.

  * Both standard and extension parameters passed from the web app to the speech service at the start of the interaction.  List of standard parameters TBD.

  * EMMA results passed from the SS to the web app.  The syntax of this result is TBD (e.g. XML and/or JSON).

  * At least one standard audio codec.  UAs are permitted to advertise alternate codecs at the start of the interaction and SSs are allowed to select any such alternate (e.g. HTTP Accept).

  * Transport layer security (e.g. HTTPS) if requested by the web app.

  * A session identifier that could be used to maintain continuity across multiple interactions (e.g. HTTP cookies).

  * Interpretation over text.

  * Re-recognition using previous audio streams.
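
As an illustration of the codec item above, the following is a minimal sketch of one way the negotiation could work. It assumes the SS honours the UA's preference order and falls back to the mandatory codec when nothing matches; the requirement itself lets the SS pick any advertised alternate. `STANDARD_CODEC`, `select_codec` and the codec labels are hypothetical placeholders, since the standard codec has not been chosen.

```python
from typing import Sequence

# Hypothetical placeholder (assumption): the requirement does not yet
# name the one codec every UA and SS must support.
STANDARD_CODEC = "audio/standard"


def select_codec(advertised: Sequence[str], supported: Sequence[str]) -> str:
    """Pick the first UA-advertised alternate that the SS supports.

    Falls back to the mandatory standard codec when no alternate matches.
    """
    supported_set = set(supported)
    for codec in advertised:
        if codec in supported_set:
            return codec
    return STANDARD_CODEC


if __name__ == "__main__":
    # Made-up codec labels, in the UA's order of preference.
    ua_offers = ["audio/x-alt-a", "audio/x-alt-b"]
    ss_supports = ["audio/x-alt-b"]
    print(select_codec(ua_offers, ss_supports))
    print(select_codec([], ss_supports))
```

Because the standard codec is always available as a fallback, the UA and SS can interoperate even when they share no alternates.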

Thank you

Received on Friday, 19 November 2010 01:49:43 UTC