- From: Dan Burnett <dburnett@voxeo.com>
- Date: Tue, 11 Oct 2011 18:46:11 -0400
- To: public-xg-htmlspeech@w3.org
Group,
The minutes from last week's call are available at http://www.w3.org/2011/10/06-htmlspeech-minutes.html
For convenience, a text version is embedded below.
Thanks to Bjorn Bringert for taking the minutes.
-- dan
**********************************************************************************
HTML Speech Incubator Group Teleconference
06 Oct 2011
See also: [2]IRC log
[2] http://www.w3.org/2011/10/06-htmlspeech-irc
Attendees
Present
Charles_Hemphill Dan_Druta Debbie_Dahl Michael_Bodell Michael_Johnston Robert_Brown Milan_Young Olli_Pettay Bjorn_Bringert Patrick_Ehlen Glen_Shires
Regrets
Dan_Burnett
Chair
Michael_Bodell
Scribe
bringert
Contents
* [3]Topics
1. [4]Web API
2. [5]Protocol
* [6]Summary of Action Items
_________________________________________________________
<trackbot> Date: 06 October 2011
<smaug> oops
<glen> Reminder: sign up for TPAC for Thursday/Friday Nov 3-4,
Registration fee increases Oct 14
<glen> [7]http://www.w3.org/2011/11/TPAC/
[7] http://www.w3.org/2011/11/TPAC/
Web API
<robert> protocol questions received so far...
<ddahl> scribe:bringert
mbodell: Got through it last week
... will update doc
<robert> 1. how is INFO message exposed in the API?
<ddahl> meeting: HTML_Speech
<robert> 2. Some "nits" (his word) from Olli
bringert: satish emailed about the speechinputevent API, saying that
he thought that his proposal should actually work
mbodell: will respond on the list
Protocol
robert: we have some nits from Olli
<robert> [8]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html
[8] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html
robert: bigger question is the one that Milan and others have been
talking about
... [9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html
... you can give contextual information to recognizer with hints
about other stuff that's going on
... useful for multimodal apps
... how does that bubble up through the API
... Example: doing reco, have a camera that's tracking motion or a
touch screen
... want to send that to recognizer
... E.g. send GPS data when doing Voice Search on a mobile device
[9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html
Milan: two suggestions: 1) model it as an event
... 2) call it a parameter
... would be different from regular parameters since it's sent on
its own
... because you want to send the input during recognition
asynchronously
robert: especially for continuous recognition
bringert: what are examples of info to send during recognition?
Milan: scrolling list of pizza toppings
... weight the currently showing one higher
... should perhaps be a session parameter, since you may want other
session parameters
bringert: but these events are specific to a recognition, right?
Milan: not sure
robert: one session, contains 0-1 recognitions, 0 or more syntheses
... recognition can be oneshot or continuous
... recognizer can be in listening or idle state
... want to send info messages to recognizer while it is in the
listening state
Milan: could events apply to both TTS and recognition?
robert: for reco, app could have additional info to send to
recognizer
... for TTS, synthesizer may want to send events
... it's not an event
... it's additional info
Milan: could app events matter for TTS too?
robert: no, I don't think so
... app has some context data to communicate to recognizer
... what's the API for sending that message
... is it a property or a method call
Milan: it could apply to TTS too though
... e.g. sliding finger to change pitch
robert: the TTS is not a telephone, it's not interactive
... it might be faster than realtime
bringert: let's focus on recognition, since they are separate APIs
... this seems like an event, not a parameter
... since it can happen asynchronously
robert: in the protocol it's an INFO message with a MIME type and a
content
mbodell: can you do it only during listening, or before too
robert: lemme check
bringert: should be able to do it at any time
robert: only in listening state
... decision made without much thought
<robert> state diagram of recognizer:
[10]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#recognition-protocol
[10] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#recognition-protocol
bringert: looks like you can do it before listen too
... suggested API, method in SpeechInputRequest: sendInfo(string
type, string value)
<robert> this is the description of the info message: "In multimodal
applications, some recognizers will benefit from additional context.
Clients can use the INFO request to send this context. The
Content-Type header should specify the type of data, and the data
itself is contained in the message body."
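A minimal sketch of how the suggested sendInfo() method might map onto the protocol's INFO message. Only the method name, its two string arguments, and the Content-Type/body split come from the discussion above; the SpeechInputRequest class shape, the transport callback, and the in-memory message layout are assumptions for illustration.

```javascript
// Hypothetical stand-in for the request object; not the real API.
class SpeechInputRequest {
  // `send` is an assumed transport callback that receives one protocol message.
  constructor(send) {
    this.send = send;
  }

  // Suggested in the minutes: sendInfo(string type, string value).
  sendInfo(type, value) {
    if (typeof type !== "string" || typeof value !== "string") {
      throw new TypeError("sendInfo expects two strings");
    }
    // Assumed in-memory shape; the actual wire format is defined by the protocol draft.
    this.send({
      method: "INFO",
      headers: { "Content-Type": type },
      body: value,
    });
  }
}

// Usage: pass GPS context to the recognizer during a voice search.
const sent = [];
const request = new SpeechInputRequest((msg) => sent.push(msg));
request.sendInfo("application/json", JSON.stringify({ lat: 47.6, lon: -122.3 }));
```

Modelling this as a method rather than a property matches the point that the message is sent on its own, asynchronously, while recognition is in progress.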
Milan: is getParameter() ever useful?
... if we already have a way to send parameters in the request, why
do we need a way to send them outside the request
bringert: what's the difference between set-param and info
Milan: set-param is for standard headers
mbodell: could I use set-params for changing timeouts
robert: if you set something in set-params, that's the default value
for the next request
Milan: MRCP doesn't specify the semantics of set-param
robert: set-params, get-params introduce confusion, let's skip them
Milan: how does that handle mbodell's timeout example
robert: to set a parameter halfway through, do what SIP does, send
the invite again
... just resend the request
Milan: wouldn't that stop the previous request
robert: wouldn't have to
... we could either define what set-params does, or get rid of it
Milan: is there anything that you can't retract if you restart the
request without restarting
... implementations should be allowed to refuse
robert: you can keep streaming audio while stopping and starting a
new one on the same audio stream
... there will be some latency
Milan: it's pretty hard to change the grammar while recognizing
robert: yeah, we don't do that either
Milan: should we propose to remove set-params?
... is there a way to change parameters in the API?
mbodell: only by a new init()
robert: set-params while running in MRCP is weird
Milan: you can set session default parameters
robert: yeah, but that's not useful
Milan: how do you get info back from the recognizer?
... you can initiate that feedback from the client without
get-params
robert: in section 4.3 if you want to discover whether the
recognizer supports some languages and grammars
... you use get-params
Milan: so we need get-params
... that's why I wanted to model it as attributes
robert: it's not an attribute if you expect it to do something right
away
bringert: how do you use get-params to check whether some languages
are supported?
robert: use a header with supported-languages, response will include
the supported subset
Milan: no equivalent in the API
robert: two stage thing, first connects to server, does get-param
... if that doesn't return the right results, API could just fail
mbodell: in the API init() would do this under the hood
<robert> here's how you use the protocol to discover supported
languages, etc:
[11]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#capabilities-discovery
[11] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#capabilities-discovery
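A rough sketch of the two-stage check described above: the client asks about a set of languages, the response carries the supported subset, and init() fails if something needed is missing. Both function names and the array-based representation are assumptions for illustration; the real exchange uses get-params headers as defined in the protocol draft.

```javascript
// Hypothetical model of the recognizer's answer to a get-params request:
// the request lists candidate languages, the response keeps only the supported ones.
function answerGetParams(requestedLanguages, engineLanguages) {
  const supported = new Set(engineLanguages);
  return requestedLanguages.filter((lang) => supported.has(lang));
}

// Hypothetical init() step: fail if any language the page needs is absent from the response.
function checkLanguages(needed, engineLanguages) {
  const subset = answerGetParams(needed, engineLanguages);
  if (subset.length !== needed.length) {
    throw new Error("recognizer does not support all requested languages");
  }
  return subset;
}

// Usage: both requested languages are supported, so the check passes.
const okLanguages = checkLanguages(["en-US", "de-DE"], ["en-US", "de-DE", "fr-FR"]);
```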
Milan: that's specialized to session start-up
... what if we want to get info during the request
mbodell: that should come in the results
bringert: in the EMMA
Milan: so the assumption is that you tell recognizer about
everything up front
mbodell: yes
Milan: maybe that's ok
... what if the user is talking and then presses button to get their
account balance
mbodell: that would have been returned in EMMA
Milan: I'm tentatively ok with keeping get-params in protocol, and
not adding anything to web API to drive get-params explicitly
AGREEMENT: keep get-params
... don't add anything for get-params to web API
Michael: ok to remove set-params as long as INFO is plumbed into the
web API
AGREEMENT: remove set-params from protocol
... add method to web API to send INFO to recognizer
mbodell: can you have more than 1 synthesis engine
robert: no, at most one tts engine
... but you can have more than one tts request
... can request TTS of any number of SSML docs in parallel
... up to engine to serialize them if needed
... up to web API to decide when to playback
mbodell: I just wanted to make sure that the discussion matches the
doc
robert: the topic came up, some synthesizers want to generate events
to do lip sync
<robert> viseme
robert: called viseme events
... how does an app say that they want this event
mbodell: vendor specific header?
robert: yes, but how do we handle vendor-specific events
Milan: I thought we had that already
bringert: I don't think so
Milan: but we have it in recognition?
mbodell: not sure
robert: some recognizer could want to give you additional
information
<robert> viseme example in the protocol spec:
[12]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events
[12] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events
bringert: that could be in the EMMA
Milan: well, it might not be a final result
robert: example in the protocol spec:
[13]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events
... of visemes
... there are a lot of events that various TTS engines send
... phoneme events, viseme events etc
... would follow the same API as mark events
[13] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events
Milan: except they're not standard
mbodell: what section of protocol spec is this?
Milan: 4.4
... I figured there would be something in the API firing events when
this happens
bringert: so this is interim-event in the protocol
Milan: yes
mbodell: could either throw a fixed event, or use the name in the
interim-event
Milan: how do you correlate an event to the engine?
bringert: the event is fired at the request object
Milan: how are content types handled
bringert: the event object would have an attribute
mbodell: e.g. attributes data and mimeType
... would any events be binary?
Milan: they are probably all text
bringert: looks like the protocol supports any MIME type
... is there some API for blob + type
smaug: not sure
bringert: how about a general object + type?
Milan: makes sense
<smaug> [14]http://www.w3.org/TR/2009/WD-FileAPI-20091117/#dfn-Blob
[14] http://www.w3.org/TR/2009/WD-FileAPI-20091117/#dfn-Blob
<smaug> no type
mbodell: I think we should use the interim-event-name
bringert: could clash with existing DOM events
... so the event name in JS would be something like
"speech-x-viseme-event"
smaug: should have vendor prefix too
... there could be a standard event, with an additional field for
the event name
bringert: pretty equivalent
smaug: if the event name is non-standard, you can only use
addEventListener()
mbodell: being able to register separate listeners is powerful
bringert: objections to firing "speech-X" events for an
interim-event with name X?
smaug: slight preference for a standard event
AGREEMENT: web API will fire "speech-X" events for any interim-event
with name X
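A minimal sketch of the agreement above, using the standard EventTarget and Event interfaces. The SpeechRequestTarget class, the handleInterimEvent hook, and the placeholder payload are assumptions for illustration; the "speech-" prefix and the data and mimeType attributes follow the discussion.

```javascript
// Hypothetical stand-in for the request object that events are fired at.
class SpeechRequestTarget extends EventTarget {
  // Assumed hook, called when the protocol layer receives an interim-event.
  handleInterimEvent(name, mimeType, data) {
    const event = new Event("speech-" + name);
    // Attribute names follow the "data and mimeType" suggestion in the minutes.
    event.data = data;
    event.mimeType = mimeType;
    this.dispatchEvent(event);
  }
}

const ttsRequest = new SpeechRequestTarget();
const received = [];
// Non-standard event names can only be registered through addEventListener().
ttsRequest.addEventListener("speech-x-viseme", (e) => {
  received.push([e.type, e.mimeType, e.data]);
});
ttsRequest.handleInterimEvent("x-viseme", "text/plain", "viseme-payload");
```

Separate event names let an app register one listener per interim-event type, which is the benefit mbodell raised; a single standard event with a name field would instead need one listener that switches on the name.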
Milan: how will we identify parameters as vendor-specific or
standard?
mbodell: API already has that, DOM attributes for standard
parameters, and a catch-all method for setting additional parameters
Milan: are you preventing newlines in those values?
mbodell: no, just DOMString
bringert: we can have restrictions on that
mbodell: or throw an error
bringert: lots of that in HTML already
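A small sketch of the "throw an error" option for the catch-all parameter method. The function name, the Map used as storage, the example parameter name, and the choice of SyntaxError are all assumptions for illustration; the minutes only settle that values are DOMStrings and that restrictions or errors are possible.

```javascript
// Hypothetical catch-all setter for non-standard (vendor-specific) parameters.
function setCustomParameter(params, name, value) {
  const text = String(value);
  // Reject CR/LF so a value cannot spill past a single protocol header line.
  if (/[\r\n]/.test(text)) {
    throw new SyntaxError("parameter value must not contain newlines");
  }
  params.set(name, text);
  return params;
}

// Usage: a single-line value is accepted and stored as a string.
const customParams = setCustomParameter(new Map(), "x-vendor-mode", "fast");
```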
Milan: we need to sync the attribute list from the web API with the
standard headers in the protocol
robert: I'm planning to go through both specs looking for
discrepancies
mbodell: there can be things in the protocol that aren't in the API
DanD: how do we deal with security in the protocol?
... could be a DoS vulnerability
<Milan> Milan: If there are parameters in the protocol that don't
exist in the API, we should discuss those exceptions
DanD: how do we prevent setting or getting parameters in a non-proper
way
DanD: read a web page, open lots of sessions from a different web
site accessing the same service
... doing lots of activity
robert: that's a general problem with web services
... the service provider needs DoS protection
DanD: will we have an API key registration
robert: the problem with API keys is that they don't work very well
... could have a list of tokens that can't be reused
... can't standardize security at this point
bringert: sending API key in vendor-specific header
robert: other interesting problems
... faffing about
... only solutions are proprietary
... there is a section somewhere
<robert> here are some notes on security in the protocol document:
[15]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#protocol-security
[15] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#protocol-security
Michael_Johnston: this is a general problem for WebSockets
... this is not the only way to cause a server to do heavy lifting
DanD: we need to capture this in the security section and just say
that vendor-specific key-value pairs can be used
mbodell: question about grammar weights, section 5.3 in protocol doc
... only allow grammar weights that start with 0.
... in SRGS you can have weights that add up to more than 1
robert: that would work
Michael: in VoiceXML too
... not specifying a weight means that weight is 1
mbodell: in general, we just say what the default is
... and that smaller means less likely and larger means more likely
AGREEMENT: allow any positive number as grammar weight
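A minimal sketch of validating a grammar weight under this agreement. The function name and the RangeError are assumptions for illustration, and the default of 1 for an unspecified weight is taken from Michael's remark above rather than from the protocol text.

```javascript
// Hypothetical validator: accepts any positive number, not only values starting with "0.".
function normalizeGrammarWeight(weight) {
  // Assumed default: an unspecified weight counts as 1.
  if (weight === undefined) return 1;
  const value = Number(weight);
  if (!Number.isFinite(value) || value <= 0) {
    throw new RangeError("grammar weight must be a positive number");
  }
  return value;
}

// Usage: weights above 1 are accepted, and a missing weight falls back to the default.
const heavyWeight = normalizeGrammarWeight("2.5");
const defaultWeight = normalizeGrammarWeight();
```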
Michael: if you specify a weight on a ruleref, you weight the rule,
you don't scale the weights in the grammar
... if you have three grammars, with internal weights
... let's say two grammars
... I want to put weights 0.25 and 0.75 on them
... if they were combined to a single model, you would scale all
weights within them
... other interpretation is that it is a static weight on the path
bringert: aren't those equivalent
Michael: don't think so
bringert: multiple grammars should be like having a single top-level
grammar with weighted rulerefs for each grammar
Michael: should we also have a way to set scaling
mbodell: isn't that the same?
... details are recognizer-specific
bringert: hard to standardize this kind of maths
Michael: there is a difference between adding a static cost and
interpolating two SLMs
robert: one other note, Dan Burnett sent an email this morning that
he can't make it
Received on Tuesday, 11 October 2011 22:47:22 UTC