[minutes] 6 October 2011 from Dan Burnett on 2011-10-11 (public-xg-htmlspeech@w3.org from October 2011)

From: Dan Burnett <dburnett@voxeo.com>
Date: Tue, 11 Oct 2011 18:46:11 -0400
To: public-xg-htmlspeech@w3.org
Message-Id: <AFC68ED4-E10E-45AA-BF4A-9709CF1D5EE6@voxeo.com>

Group,

The minutes from last week's call are available at http://www.w3.org/2011/10/06-htmlspeech-minutes.html

For convenience, a text version is embedded below.

Thanks to Bjorn Bringert for taking the minutes.

-- dan

**********************************************************************************
HTML Speech Incubator Group Teleconference
06 Oct 2011

See also: [2]IRC log

[2] http://www.w3.org/2011/10/06-htmlspeech-irc

Attendees

Present
Charles_Hemphill Dan_Druta Debbie_Dahl Michael_Bodell Michael_Johnston Robert_Brown Milan_Young Olli_Pettay Bjorn_Bringert Patrick_Ehlen Glen_Shires
Regrets
Dan_Burnett
Chair
Michael_Bodell

Scribe
bringert

Contents

* [3]Topics
1. [4]Web API
2. [5]Protocol
* [6]Summary of Action Items
_________________________________________________________

<trackbot> Date: 06 October 2011

<smaug> oops

<glen> Reminder: sign up for TPAC for Thursday/Friday Nov 3-4,
Registration fee increases Oct 14

<glen> [7]http://www.w3.org/2011/11/TPAC/

[7] http://www.w3.org/2011/11/TPAC/

Web API

<robert> protocol questions received so far...

<ddahl> scribe:bringert

mbodell: Got through it last week
... will update doc

<robert> 1. how is INFO message exposed in the API?

<ddahl> meeting: HTML_Speech

<robert> 2. Some "nits" (his word) from Olli

bringert: satish emailed about the speechinputevent API, saying that
he thought that his proposal should actually work

mbodell: will respond on the list

Protocol

robert: we have some nits from Olli

<robert> [8]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html

[8] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html

robert: bigger question is the one that Milan and others have been
talking about
... [9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html
... you can give contextual information to recognizer with hints
about other stuff that's going on
... useful for multimodal apps
... how does that bubble up through the API
... Example: doing reco, have a camera that's tracking motion or a
touch screen
... want to send that to recognizer
... E.g. send GPS data when doing Voice Search on a mobile device

[9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html

Milan: two suggestions: 1) model it as an event
... 2) call it a parameter
... would be different from regular parameters since it's sent on
its own
... because you want to send the input during recognition
asynchronously

robert: especially for continuous recognition

bringert: what are examples of info to send during recognition?

Milan: scrolling list of pizza toppings
... weight the currently showing one higher
... should perhaps be a session parameter, since you may want other
session parameters

bringert: but these events are specific to a recognition, right?

Milan: not sure

robert: one session, contains 0-1 recognitions, 0 or more syntheses
... recognition can be oneshot or continuous
... recognizer can be in listening or idle state
... want to send info messages to recognizer while it is in the
listening state

Milan: could events apply to both TTS and recognition?

robert: for reco, app could have additional info to send to
recognizer
... for TTS, synthesizer may want to send events
... it's not an event
... it's additional info

Milan: could app events matter for TTS too?

robert: no, I don't think so
... app has some context data to communicate to recognizer
... what's the API for sending that message
... is it a property or a method call

Milan: it could apply to TTS too though
... e.g. sliding finger to change pitch

robert: the TTS is not a telephone, it's not interactive
... it might be faster than realtime

bringert: let's focus on recognition, since they are separate APIs
... this seems like an event, not a parameter
... since it can happen asynchronously

robert: in the protocol it's an INFO message with a MIME type and a
content

mbodell: can you do it only during listening, or before too

robert: lemme check

bringert: should be able to do it at any time

robert: only in listening state
... decision made without much thought

<robert> state diagram of recognizer:
[10]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#recognition-protocol

[10] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#recognition-protocol

bringert: looks like you can do it before listen too
... suggested API, method in SpeechInputRequest: sendInfo(string
type, string value)

<robert> this is the description of the info message: "In multimodal
applications, some recognizers will benefit from additional context.
Clients can use the INFO request to send this context. The
Content-Type header should specify the type of data, and the data
itself is contained in the message body."
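
A minimal sketch of how the suggested sendInfo(type, value) method might map onto the protocol's INFO message. Only the method name and its two string arguments come from the call; the class name, the state field, the `allowInfoWhenIdle` switch and the in-memory `sent` list (standing in for the network connection) are assumptions made for illustration. Whether INFO is allowed outside the listening state was not settled, so the sketch leaves it configurable.

```typescript
// Hypothetical sketch: only sendInfo(type, value) was suggested on the call.
// Everything else here is an illustrative assumption, not part of any draft.

interface InfoMessage {
  contentType: string; // would become the INFO request's Content-Type header
  body: string;        // would become the INFO request's message body
}

type RecognizerState = "idle" | "listening";

class SpeechInputRequestSketch {
  state: RecognizerState = "idle";
  // Stand-in for the connection: messages that would be sent as INFO requests.
  sent: InfoMessage[] = [];
  // Assumption: the group left open whether INFO may be sent before listening.
  allowInfoWhenIdle: boolean;

  constructor(allowInfoWhenIdle: boolean = true) {
    this.allowInfoWhenIdle = allowInfoWhenIdle;
  }

  start(): void {
    this.state = "listening";
  }

  stop(): void {
    this.state = "idle";
  }

  sendInfo(type: string, value: string): void {
    if (type.trim() === "") {
      throw new Error("sendInfo: a MIME type is required");
    }
    if (this.state !== "listening" && !this.allowInfoWhenIdle) {
      throw new Error("sendInfo: recognizer is not listening");
    }
    this.sent.push({ contentType: type, body: value });
  }
}

// Example from the call: hint at the pizza topping currently shown on screen.
const infoRequest = new SpeechInputRequestSketch();
infoRequest.start();
infoRequest.sendInfo(
  "application/json",
  JSON.stringify({ visibleTopping: "mushrooms" })
);
```

Modelling this as a method rather than a property matches robert's later point that something expected to act right away is not an attribute.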

Milan: is getParameter() ever useful?
... if we already have a way to send parameters in the request, why
do we need a way to send them outside the request

bringert: what's the difference between set-param and info

Milan: set-param is for standard headers

mbodell: could I use set-params for changing timeouts

robert: if you set something in set-params, that's the default value
for the next request

Milan: MRCP doesn't specify the semantics of set-param

robert: set-params, get-params introduce confusion, let's skip them

Milan: how does that handle mbodell's timeout example

robert: to set a parameter halfway through, do what SIP does, send
the invite again
... just resend the request

Milan: wouldn't that stop the previous request

robert: wouldn't have to
... we could either define what set-params does, or get rid of it

Milan: is there anything that you can't retract if you restart the
request without restarting
... implementations should be allowed to refuse

robert: you can keep streaming audio while stopping and starting a
new one on the same audio stream
... there will be some latency

Milan: it's pretty hard to change the grammar while recognizing

robert: yeah, we don't do that either

Milan: should we propose to remove set-params?
... is there a way to change parameters in the API?

mbodell: only by a new init()

robert: set-params while running in MRCP is weird

Milan: you can set session default parameters

robert: yeah, but that's not useful

Milan: how do you get info back from the recognizer?
... you can initiate that feedback from the client without
get-params

robert: in section 4.3 if you want to discover whether the
recognizer supports some languages and grammars
... you use get-params

Milan: so we need get-params
... that's why I wanted to model it as attributes

robert: it's not an attribute if you expect it to do something right
away

bringert: how do you use get-params to check whether some languages
are supported?

robert: use a header with supported-languages, response will include
the supported subset

Milan: no equivalent in the API

robert: two stage thing, first connects to server, does get-param
... if that doesn't return the right results, API could just fail

mbodell: in the API init() would do this under the hood
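
A rough sketch of the two-stage flow described here, in which init() performs the capability check without exposing get-params to the page. The `CapabilityQuery` function type stands in for a get-params round trip and is hypothetical, as is the rule that init fails only when none of the requested languages are supported; the call did not define what counts as "the right results".

```typescript
// Hypothetical sketch: `CapabilityQuery` stands in for a protocol get-params
// round trip. It is not an API from either draft.
type CapabilityQuery = (requestedLanguages: string[]) => string[];

function initRecognizerSketch(
  requestedLanguages: string[],
  queryServer: CapabilityQuery
): string[] {
  // The response carries the subset of requested languages the service supports.
  const supported = queryServer(requestedLanguages);
  // Assumption: an empty subset is treated as failure.
  if (supported.length === 0) {
    throw new Error("init failed: none of the requested languages are supported");
  }
  return supported;
}

// Fake service supporting two languages, for illustration only.
const fakeSpeechService: CapabilityQuery = (requested) =>
  requested.filter((lang) => ["en-US", "de-DE"].indexOf(lang) !== -1);

const usableLanguages = initRecognizerSketch(["en-US", "fr-FR"], fakeSpeechService);
```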

<robert> here's how you use the protocol to discover supported
languages, etc:
[11]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#capabilities-discovery

[11] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#capabilities-discovery

Milan: that's specialized to session start-up
... what if we want to get info during the request

mbodell: that should come in the results

bringert: in the EMMA

Milan: so the assumption is that you tell recognizer about
everything up front

mbodell: yes

Milan: maybe that's ok
... what if the user is talking and then presses button to get their
account balance

mbodell: that would have been returned in EMMA

Milan: I'm tentatively ok with keeping get-params in protocol, and
not adding anything to web API to drive get-params explicitly

AGREEMENT: keep get-params
... don't add anything for get-params to web API

Michael: ok to remove set-params as long as INFO is plumbed into the
web API

AGREEMENT: remove set-params from protocol
... add method to web API to send INFO to recognizer

mbodell: can you have more than 1 synthesis engine

robert: no, at most one tts engine
... but you can have more than one tts request
... can request TTS of any number of SSML docs in parallel
... up to engine to serialize them if needed
... up to web API to decide when to playback

mbodell: I just wanted to make sure that the discussion matches the
doc

robert: the topic came up, some synthesizers want to generate events
to do lip sync

<robert> viseme

robert: called viseme events
... how does an app say that they want this event

mbodell: vendor specific header?

robert: yes, but how do we handle vendor-specific events

Milan: I thought we had that already

bringert: I don't think so

Milan: but we have it in recognition?

mbodell: not sure

robert: some recognizer could want to give you additional
information

<robert> viseme example in the protocol spec:
[12]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events

[12] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events

bringert: that could be in the EMMA

Milan: well, it might not be a final result

robert: example in the protocol spec:
[13]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events
... of visemes
... there are a lot of events that various TTS engines send
... phoneme events, viseme events etc
... would follow the same API as mark events

[13] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events

Milan: except they're not standard

mbodell: what section of protocol spec is this?

Milan: 4.4
... I figured there would be something in the API firing events when
this happens

bringert: so this is interim-event in the protocol

Milan: yes

mbodell: could either throw a fixed event, or use the name in the
interim-event

Milan: how do you correlate an event to the engine?

bringert: the event is fired at the request object

Milan: how are content types handled

bringert: the event object would have an attribute

mbodell: e.g. attributes data and mimeType
... would any events be binary?

Milan: they are probably all text

bringert: looks like the protocol supports any MIME type
... is there some API for blob + type

smaug: not sure

bringert: how about a general object + type?

Milan: makes sense

<smaug> [14]http://www.w3.org/TR/2009/WD-FileAPI-20091117/#dfn-Blob

[14] http://www.w3.org/TR/2009/WD-FileAPI-20091117/#dfn-Blob

<smaug> no type

mbodell: I think we should use the interim-event-name

bringert: could clash with existing DOM events
... so the event name in JS would be something like
"speech-x-viseme-event"

smaug: should have vendor prefix too
... there could be a standard event, with an additional field for
the event name

bringert: pretty equivalent

smaug: if the event name is non-standard, you can only use
addEventListener()

mbodell: being able to register separate listeners is powerful

bringert: objections to firing "speech-X" events for an
interim-event with name X?

smaug: slight preference for a standard event

AGREEMENT: web API will fire "speech-X" events for any interim-event
with name X
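
A sketch of how the agreed naming rule could look from script, assuming the event is fired at the request object and carries `data` and `mimeType` attributes as floated above. The class, the interface names and the interim-event name "x-viseme-event" are hypothetical; a real implementation would presumably build on DOM events rather than this hand-rolled listener table.

```typescript
// Hypothetical sketch of mapping a protocol interim-event named X to a
// "speech-X" event on the request object. All names are illustrative.

interface InterimEventSketch {
  name: string;        // interim-event name from the protocol
  contentType: string; // MIME type of the event body
  body: string;
}

interface SpeechEventSketch {
  type: string;     // "speech-" + interim-event name
  data: string;
  mimeType: string;
}

type SpeechListener = (event: SpeechEventSketch) => void;

class RequestEventsSketch {
  private listeners: Record<string, SpeechListener[]> = {};

  addEventListener(type: string, listener: SpeechListener): void {
    const existing = this.listeners[type] || [];
    existing.push(listener);
    this.listeners[type] = existing;
  }

  // Called when the protocol layer receives an interim-event.
  handleInterimEvent(interim: InterimEventSketch): void {
    const event: SpeechEventSketch = {
      type: "speech-" + interim.name,
      data: interim.body,
      mimeType: interim.contentType,
    };
    for (const listener of this.listeners[event.type] || []) {
      listener(event);
    }
  }
}

const requestEvents = new RequestEventsSketch();
const seenEvents: SpeechEventSketch[] = [];
requestEvents.addEventListener("speech-x-viseme-event", (e) => {
  seenEvents.push(e);
});
requestEvents.handleInterimEvent({
  name: "x-viseme-event",
  contentType: "text/plain",
  body: "viseme payload",
});
```

Because the event name is derived from the interim-event name, separate listeners can be registered per event kind, which is the property mbodell argued for.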

Milan: how will we identify parameters as vendor-specific or
standard?

mbodell: API already has that, DOM attributes for standard
parameters, and a catch-all method for setting additional parameters

Milan: are you preventing newlines in those values?

mbodell: no, just DOMString

bringert: we can have restrictions on that

mbodell: or throw an error

bringert: lots of that in HTML already
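
One way the "throw an error" option could be sketched for the catch-all parameter method. The function name and the choice to reject carriage returns and line feeds in both names and values are assumptions; the call only raised the question of newlines in values.

```typescript
// Hypothetical sketch: reject line breaks in vendor-specific parameters
// instead of passing them through to the protocol layer.
function setCustomParameterSketch(
  params: Record<string, string>,
  name: string,
  value: string
): void {
  if (/[\r\n]/.test(name) || /[\r\n]/.test(value)) {
    throw new Error("parameter names and values must not contain line breaks");
  }
  params[name] = value;
}

const customParams: Record<string, string> = {};
setCustomParameterSketch(customParams, "x-vendor-mode", "fast");
```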

Milan: we need to sync the attribute list from the web API with the
standard headers in the protocol

robert: I'm planning to go through both specs looking for
discrepancies

mbodell: there can be things in the protocol that aren't in the API

DanD: how do we deal with security in the protocol?
... could be a DoS vulnerability

<Milan> Milan: If there are parameters in the protocol that don't
exist in the API, we should discuss those exceptions

DanD: how do we prevent setting or getting parameters in a non-proper
way

DanD: read a web page, open lots of sessions from a different web
site accessing the same service
... doing lots of activity

robert: that's a general problem with web services
... the service provider needs DoS protection

DanD: will we have an API key registration

robert: the problem with API keys is that they don't work very well
... could have a list of tokens that can't be reused
... can't standardize security at this point

bringert: sending API key in vendor-specific header

robert: other interesting problems
... faffing about
... only solutions are proprietary
... there is a section somewhere

<robert> here are some notes on security in the protocol document:
[15]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#protocol-security

[15] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#protocol-security

Michael_Johnston: this is a general problem for WebSockets
... this is not the only way to cause a server to do heavy lifting

DanD: we need to capture this in the security section and just say
that vendor-specific key-value pairs can be used

mbodell: question about grammar weights, section 5.3 in protocol doc
... only allow grammar weights that start with 0.
... in SRGS you can have weights that add up to more than 1

robert: that would work

Michael: in VoiceXML too
... not specifying a weight means that weight is 1

mbodell: in general, we just say what the default is
... and that smaller means less likely and larger means more likely

AGREEMENT: allow any positive number as grammar weight
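
A small sketch of the agreed rule: any positive number is accepted as a grammar weight. The function name is hypothetical, and the default of 1 for an unspecified weight is borrowed from the VoiceXML remark above rather than from anything the group decided for the protocol.

```typescript
// Hypothetical sketch: accept any positive, finite grammar weight.
// Assumption: an omitted weight defaults to 1, as mentioned for VoiceXML.
function normalizeGrammarWeightSketch(weight?: number): number {
  if (weight === undefined) {
    return 1;
  }
  if (!isFinite(weight) || weight <= 0) {
    throw new RangeError("grammar weight must be a positive number");
  }
  return weight;
}

const defaultWeight = normalizeGrammarWeightSketch();
const largeWeight = normalizeGrammarWeightSketch(2.5);
```

Weights above 1 pass through unchanged, so larger still means more likely and smaller less likely, without requiring the weights to sum to 1.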

Michael: if you specify a weight on a ruleref, you weight the rule,
you don't scale the weights in the grammar
... if you have three grammars, with internal weights
... let's say two grammars
... I want to put weights 0.25 and 0.75 on them
... if they were combined to a single model, you would scale all
weights within them
... other interpretation is that it is a static weight on the path

bringert: aren't those equivalent

Michael: don't think so

bringert: multiple grammars should be like having a single top-level
grammar with weighted rulerefs for each grammar
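
To make this interpretation concrete, the sketch below builds a single top-level SRGS grammar whose alternatives are weighted references to the individual grammars. The function names, the rule id "top", the `xml:lang` value and the example URIs are assumptions for illustration. It shows only the static-weight reading and says nothing about how a recognizer would combine these weights with weights inside each grammar.

```typescript
// Hypothetical sketch: express several weighted grammars as one top-level
// SRGS grammar with a weighted <item> wrapping a <ruleref> for each.

interface WeightedGrammarSketch {
  uri: string;
  weight: number;
}

function escapeXmlAttributeSketch(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

function buildTopLevelGrammarSketch(grammars: WeightedGrammarSketch[]): string {
  const items = grammars.map(
    (g) =>
      `      <item weight="${g.weight}"><ruleref uri="${escapeXmlAttributeSketch(g.uri)}"/></item>`
  );
  const head = [
    '<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="top">',
    '  <rule id="top">',
    "    <one-of>",
  ];
  const tail = ["    </one-of>", "  </rule>", "</grammar>"];
  return head.concat(items, tail).join("\n");
}

// The 0.25 / 0.75 split mentioned in the discussion; the URIs are placeholders.
const combinedGrammar = buildTopLevelGrammarSketch([
  { uri: "http://example.com/toppings.grxml", weight: 0.25 },
  { uri: "http://example.com/sizes.grxml", weight: 0.75 },
]);
```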

Michael: should we also have a way to set scaling

mbodell: isn't that the same?
... details are recognizer-specific

bringert: hard to standardize this kind of maths

Michael: there is a difference between adding a static cost and
interpolating two SLMs

robert: one other note, Dan Burnett sent an email this morning that
he can't make it

Received on Tuesday, 11 October 2011 22:47:22 UTC