
RE: Reminder: send questions

From: Young, Milan <Milan.Young@nuance.com>
Date: Tue, 4 Oct 2011 17:02:26 -0700
Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD0D0A58AA@SUN-EXCH01.nuance.com>
To: "JOHNSTON, MICHAEL J (MICHAEL J)" <johnston@research.att.com>, Dan Burnett <dburnett@voxeo.com>
CC: <public-xg-htmlspeech@w3.org>
I've been thinking of these INFO messages as session-level parameters.
Both models would result in immediate communication with the speech
service, which seems to be what you are after.  I prefer the parameter
model for the following reasons:

  * Session parameters could also be used to apply persistent parameters
(i.e. those that should be implicitly part of every request), for
example a default speech timeout or a user ID.  It's true that you
could accomplish the same with INFO messages, but calling them session
parameters seems conceptually cleaner.

  * Session parameters are gettable.  This allows blocking communication
with the remote service when required.  Otherwise you need to do a dance
where you release control and then wait for some event.

  * Session parameters are useful for modeling configurable values that
are better associated with the session than with individual requests,
for example audio codecs, service versions, and protocol versions.

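To make the contrast concrete, here is a minimal sketch of the parameter model in TypeScript. The names (SpeechSession, setParameter, getParameter) and the in-memory mock are purely illustrative, not from the API draft; the point is that a gettable parameter lets the client confirm remote state directly instead of releasing control and waiting for an event.

```typescript
// Illustrative sketch of a session-parameter model; names are hypothetical.
interface SpeechSession {
  setParameter(name: string, value: string): Promise<void>;
  getParameter(name: string): Promise<string | undefined>;
}

// Minimal in-memory stand-in for a remote speech service.
class MockSession implements SpeechSession {
  private params = new Map<string, string>();
  async setParameter(name: string, value: string): Promise<void> {
    this.params.set(name, value);
  }
  async getParameter(name: string): Promise<string | undefined> {
    return this.params.get(name);
  }
}

// Persistent parameters apply implicitly to every subsequent request;
// because they are gettable, the client can block on a round trip and
// confirm what the service accepted.
async function configure(session: SpeechSession): Promise<string | undefined> {
  await session.setParameter("speech-timeout", "5000");
  await session.setParameter("user-id", "user-1234");
  return session.getParameter("speech-timeout");
}
```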

-----Original Message-----
From: public-xg-htmlspeech-request@w3.org
[mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of JOHNSTON,
MICHAEL J (MICHAEL J)
Sent: Tuesday, October 04, 2011 1:07 PM
To: Dan Burnett
Cc: public-xg-htmlspeech@w3.org
Subject: Re: Reminder: send questions

Here is one I sent earlier:

One thing I see missing from the API draft is support for INFO
messages, which send metadata to the recognizer during recognition.

In the html+speech protocol we have a generic capability to send
metadata to the recognizer; the relevant reco-method is INFO (see
below).  These messages can be sent during the transmission of audio.
This covers multimodal use cases where metadata (e.g. GUI actions,
button clicks, etc.) arises while the user is speaking and is relevant
for processing the user's audio.

To support this at the API level we need some kind of method on
SpeechInputRequest that will cause the INFO message to be sent over the
protocol, e.g. something along these lines:


interface SpeechInputRequest {
    ...
    void sendInfo(in DOMString contentType, in DOMString content);
    ...
};
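As a usage sketch, here is how a page might forward a GUI action to the recognizer while the user is speaking. The SpeechInputRequest shape and the sendInfo method signature below are hypothetical (this is exactly the part still under discussion), as is the JSON payload format.

```typescript
// Hypothetical API surface; names and signatures are illustrative only.
interface SpeechInputRequest {
  start(): void;
  sendInfo(contentType: string, body: string): void;
  stop(): void;
}

// While audio is being transmitted, forward a GUI event (e.g. the user
// clicking a map pin) to the recognizer as metadata that may help it
// interpret the concurrent speech ("zoom in on *that* one").
function onMapPinClicked(req: SpeechInputRequest, pinId: string): string {
  const body = JSON.stringify({ event: "pin-click", pin: pinId, ts: Date.now() });
  req.sendInfo("application/json", body);
  return body;
}
```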



reco-method  = "LISTEN"             ; Transitions Idle -> Listening
             | "START-INPUT-TIMERS" ; Starts the timer for the various
                                    ;   input timeout conditions
             | "STOP"               ; Transitions Listening -> Idle
             | "DEFINE-GRAMMAR"     ; Pre-loads & compiles a grammar,
                                    ;   assigns a temporary URI for
                                    ;   reference in other methods
             | "CLEAR-GRAMMARS"     ; Unloads all grammars, whether
                                    ;   active or inactive
             | "INTERPRET"          ; Interprets input text as though
                                    ;   it had been spoken
             | "INFO"               ; Sends metadata to the recognizer


In multimodal applications, some recognizers will benefit from
additional context. Clients can use the INFO request to send this
context. The Content-Type header should specify the type of data, and
the data itself is contained in the message body.
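To illustrate the header-plus-body structure described above, here is a small helper that lays out an INFO request in an MRCP-like text framing. The exact wire format in the html+speech protocol draft may differ; this sketch only shows a Content-Type header describing the payload and the data carried in the message body.

```typescript
// Illustrative only: the real html+speech framing may differ.
// Composes an INFO request whose Content-Type header describes the
// payload and whose body carries the metadata itself.
function buildInfo(requestId: number, contentType: string, body: string): string {
  const headers = [
    `INFO ${requestId}`,
    `Content-Type: ${contentType}`,
    // Byte length, not character count, since the body may be UTF-8.
    `Content-Length: ${new TextEncoder().encode(body).length}`,
  ];
  return headers.join("\r\n") + "\r\n\r\n" + body;
}
```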

On Oct 4, 2011, at 3:03 PM, Dan Burnett wrote:


Please remember to send any questions you have about how the protocol
relates to the Web API in advance of our call this week so Robert can be
ready to address them.

The most recent version of the protocol on the mailing list is here [1].

-- dan

Received on Wednesday, 5 October 2011 00:03:56 UTC
