RE: Reminder: send questions

From: Young, Milan <Milan.Young@nuance.com>
Date: Tue, 4 Oct 2011 18:52:43 -0700
Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD0D0A58E6@SUN-EXCH01.nuance.com>
To: Patrick Ehlen <pehlen@attinteractive.com>, Michael Johnston <johnston@research.att.com>, Dan Burnett <dburnett@voxeo.com>
CC: <public-xg-htmlspeech@w3.org>
To handle Michael J's needs, we could model this as a write-only
parameter.  For example: "x-att-current-display-item = pepperoni".

That said, I agree that events are a more natural syntactic model for
such things.  But since they share exactly the same semantic model as
session-level parameters (i.e. "send this message to the server right
now"), having two constructs seems like duplication.  If we could only
have one, I would favor session parameters because they cover more use
cases.
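
A rough sketch of the write-only parameter idea, in TypeScript. The names
here (`SpeechSession`, `setParameter`, the `Wire` callback, and the
`name = value` message format) are assumptions made for illustration; none
of them come from the API draft or the protocol. The point is only that
setting a parameter and firing an event would both reduce to "send this
message to the server right now".

```typescript
// Hypothetical names throughout; not part of the API draft or the protocol.
type Wire = (message: string) => void;

class SpeechSession {
  private readonly send: Wire;
  // Last value set for each parameter, kept so that persistent parameters
  // (for example a default timeout) could be re-applied to later requests.
  private readonly params: Map<string, string> = new Map<string, string>();

  constructor(send: Wire) {
    this.send = send;
  }

  // Setting a parameter pushes it to the speech service immediately,
  // which is the same thing an event-style message would do.
  setParameter(name: string, value: string): void {
    this.params.set(name, value);
    this.send(`${name} = ${value}`);
  }

  // Number of distinct parameters currently held by the session.
  get parameterCount(): number {
    return this.params.size;
  }
}

// Stand-in transport that just records what would go over the wire.
const sent: string[] = [];
const session = new SpeechSession((message: string): void => {
  sent.push(message);
});

// Sample values are made up for illustration.
session.setParameter("x-att-current-display-item", "pepperoni");
session.setParameter("x-att-current-display-item", "mushroom");
```

Each call produces one message on the wire, while the session keeps only
the latest value per name. No getter is exposed for the value itself, which
is what makes the parameter write-only in this sketch.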



-----Original Message-----
From: Patrick Ehlen [mailto:pehlen@attinteractive.com] 
Sent: Tuesday, October 04, 2011 5:15 PM
To: Young, Milan; Michael Johnston; Dan Burnett
Cc: public-xg-htmlspeech@w3.org
Subject: Re: Reminder: send questions

But I think Michael is referring to using INFO for data that may change
within a session and be different on each request. For example, user
gestures that accompanied the audio, or contextual information like the
names of entities displayed in the GUI that the ASR might use to
weight them differently.

These kinds of things are request-related, not session-related, and we
need a method to transmit them, if not using INFO.
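
To make the distinction concrete, here is a minimal TypeScript sketch.
Everything in it (`RecoRequest`, `buildRequest`, the field names, and the
sample values) is hypothetical and not taken from the API draft.
Session-level values are copied into every request, while request-level
context is supplied fresh each time and does not carry over.

```typescript
// Hypothetical types; not part of the API draft or the protocol.
interface RecoRequest {
  sessionParams: Record<string, string>; // same on every request
  context: string[];                     // specific to this request
}

function buildRequest(
  sessionParams: Record<string, string>,
  context: string[],
): RecoRequest {
  // Copy both inputs so later changes do not leak into earlier requests.
  return { sessionParams: { ...sessionParams }, context: [...context] };
}

// Made-up session default, applied to every request.
const sessionDefaults: Record<string, string> = { "user-id": "u123" };

// Entities shown in the GUI differ from one utterance to the next.
const firstRequest = buildRequest(sessionDefaults, ["pepperoni", "mushroom"]);
const secondRequest = buildRequest(sessionDefaults, ["large", "medium", "small"]);
```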


On 10/4/11 5:02 PM, "Young, Milan" <Milan.Young@nuance.com> wrote:

> I've been thinking of these INFO messages as being session-level 
> parameters.  Both models would result in immediate communication with 
> the speech service which seems to be what you are after.  I prefer the
> parameter model for the following reasons:
> 
>   * Session parameters could also be used to apply persistent 
> parameters (i.e. those that should be implicitly part of every 
> request).  For example a default speech timeout or user-id.  It's true
> that you could accomplish the same with INFO messages, but calling 
> them session parameters seems conceptually cleaner.
> 
>   * Session parameters are gettable.  This allows blocking 
> communication with the remote service when required.  Otherwise you 
> need to do a dance where you release control and then wait for some
> event.
> 
>   * Session parameters are useful for modeling configurable values 
> that are better associated with the session rather than requests.  For
> example audio codecs, service versions, protocol versions, etc.
> 
> 
> Thanks
> 
> 
> -----Original Message-----
> From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of JOHNSTON, 
> MICHAEL J (MICHAEL J)
> Sent: Tuesday, October 04, 2011 1:07 PM
> To: Dan Burnett
> Cc: public-xg-htmlspeech@w3.org
> Subject: Re: Reminder: send questions
> 
> Here is one I sent earlier:
> 
> One thing I see missing from the API draft is support for the INFO 
> messages for sending metadata to the recognizer during recognition.
> 
> In the html+speech protocol we have a generic capability to send 
> metadata to the recognizer; the relevant reco-method is INFO (see
> below).  These messages can be sent during the transmission of audio.
> This covers multimodal use cases where there may be metadata (e.g. GUI
> actions, button clicks, etc.) arising while the user is speaking that
> is relevant for processing the user's audio.
> 
> To support this at the API level we need some kind of method on 
> SpeechInputRequest that will cause the INFO message to be sent over 
> the protocol.
> 
> e.g.
> 
> interface SpeechInputRequest {
> 
> .....
> 
> void sendinfo(in DOMString info);
> 
> .....
> 
> 
> Michael
> 
> 
> 
> 
> 
> reco-method  = "LISTEN"             ; Transitions Idle -> Listening
>              | "START-INPUT-TIMERS" ; Starts the timer for the various input timeout conditions
>              | "STOP"               ; Transitions Listening -> Idle
>              | "DEFINE-GRAMMAR"     ; Pre-loads & compiles a grammar, assigns a temporary URI for reference in other methods
>              | "CLEAR-GRAMMARS"     ; Unloads all grammars, whether active or inactive
>              | "INTERPRET"          ; Interprets input text as though it was spoken
>              | "INFO"               ; Sends metadata to the recognizer
> 
> INFO
> 
> In multimodal applications, some recognizers will benefit from 
> additional context. Clients can use the INFO request to send this 
> context. The Content-Type header should specify the type of data, and 
> the data itself is contained in the message body.
> 
> 
> 
> On Oct 4, 2011, at 3:03 PM, Dan Burnett wrote:
> 
> Group,
> 
> Please remember to send any questions you have about how the protocol 
> relates to the Web API in advance of our call this week so Robert can 
> be ready to address them.
> 
> The most recent version of the protocol on the mailing list is here [1].
> 
> -- dan
> 
> [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0012.html
> 
> 
> 
> 
> 
Received on Wednesday, 5 October 2011 01:54:07 GMT
