- From: Dan Burnett <dburnett@voxeo.com>
- Date: Tue, 11 Oct 2011 18:46:11 -0400
- To: public-xg-htmlspeech@w3.org
Group,

The minutes from last week's call are available at http://www.w3.org/2011/10/06-htmlspeech-minutes.html

For convenience, a text version is embedded below.

Thanks to Bjorn Bringert for taking the minutes.

-- dan

**********************************************************************************

HTML Speech Incubator Group Teleconference

06 Oct 2011

See also: [2]IRC log

   [2] http://www.w3.org/2011/10/06-htmlspeech-irc

Attendees

Present
   Charles_Hemphill, Dan_Druta, Debbie_Dahl, Michael_Bodell, Michael_Johnston, Robert_Brown, Milan_Young, Olli_Pettay, Bjorn_Bringert, Patrick_Ehlen, Glen_Shires

Regrets
   Dan_Burnett

Chair
   Michael_Bodell

Scribe
   bringert

Contents

* Topics
   1. Web API
   2. Protocol
* Summary of Action Items
_________________________________________________________

<trackbot> Date: 06 October 2011
<smaug> oops
<glen> Reminder: sign up for TPAC for Thursday/Friday Nov 3-4, Registration fee increases Oct 14
<glen> [7]http://www.w3.org/2011/11/TPAC/

   [7] http://www.w3.org/2011/11/TPAC/

Web API

<robert> protocol questions received so far...
<ddahl> scribe:bringert
mbodell: Got through it last week
... will update doc
<robert> 1. how is INFO message exposed in the API?
<ddahl> meeting: HTML_Speech
<robert> 2. Some "nits" (his word) from Olli
bringert: satish emailed about the speechinputevent API, saying that he thought that his proposal should actually work
mbodell: will respond on the list

Protocol

robert: we have some nits from Olli
<robert> [8]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html

   [8] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html

robert: bigger question is the one that Milan and others have been talking about
... [9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html
... you can give contextual information to recognizer with hints about other stuff that's going on
... useful for multimodal apps
... how does that bubble up through the API
... Example: doing reco, have a camera that's tracking motion or a touch screen
... want to send that to recognizer
... E.g. send GPS data when doing Voice Search on a mobile device

   [9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html

Milan: two suggestions: 1) model it as an event
... 2) call it a parameter
... would be different from regular parameters since it's sent on its own
... because you want to send the input during recognition asynchronously
robert: especially for continuous recognition
bringert: what are examples of info to send during recognition?
Milan: scrolling list of pizza toppings
... weight the currently showing one higher
... should perhaps be a session parameter, since you may want other session parameters
bringert: but these events are specific to a recognition, right?
Milan: not sure
robert: one session, contains 0-1 recognitions, 0 or more syntheses
... recognition can be oneshot or continuous
... recognizer can be in listening or idle state
... want to send info messages to recognizer while it is in the listening state
Milan: could events apply to both TTS and recognition?
robert: for reco, app could have additional info to send to recognizer
... for TTS, synthesizer may want to send events
... it's not an event
... it's additional info
Milan: could app events matter for TTS too?
robert: no, I don't think so
... app has some context data to communicate to recognizer
... what's the API for sending that message
... is it a property or a method call
Milan: it could apply to TTS too though
... e.g. sliding finger to change pitch
robert: the TTS is not a telephone, it's not interactive
... it might be faster than realtime
bringert: let's focus on recognition, since they are separate APIs
... this seems like an event, not a parameter
... since it can happen asynchronously
robert: in the protocol it's an INFO message with a MIME type and a content
mbodell: can you do it only during listening, or before too
robert: lemme check
bringert: should be able to do it at any time
robert: only in listening state
... decision made without much thought
<robert> state diagram of recognizer: [10]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#recognition-protocol

   [10] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#recognition-protocol

bringert: looks like you can do it before listen too
... suggested API, method in SpeechInputRequest: sendInfo(string type, string value)
<robert> this is the description of the info message: "In multimodal applications, some recognizers will benefit from additional context. Clients can use the INFO request to send this context. The Content-Type header should specify the type of data, and the data itself is contained in the message body."
Milan: is getParameter() ever useful?
... if we already have a way to send parameters in the request, why do we need a way to send them outside the request
bringert: what's the difference between set-param and info
Milan: set-param is for standard headers
mbodell: could I use set-params for changing timeouts
robert: if you set something in set-params, that's the default value for the next request
Milan: MRCP doesn't specify the semantics of set-param
robert: set-params, get-params introduce confusion, let's skip them
Milan: how does that handle mbodell's timeout example
robert: to set a parameter halfway through, do what SIP does, send the invite again
... just resend the request
Milan: wouldn't that stop the previous request
robert: wouldn't have to
... we could either define what set-params does, or get rid of it
Milan: is there anything that you can't retract if you restart the request without restarting
... implementations should be allowed to refuse
robert: you can keep streaming audio while stopping and starting a new one on the same audio stream
... there will be some latency
Milan: it's pretty hard to change the grammar while recognizing
robert: yeah, we don't do that either
Milan: should we propose to remove set-params?
... is there a way to change parameters in the API?
mbodell: only by a new init()
robert: set-params while running in MRCP is weird
Milan: you can set session default parameters
robert: yeah, but that's not useful
Milan: how do you get info back from the recognizer?
... you can initiate that feedback from the client without get-params
robert: in section 4.3 if you want to discover whether the recognizer supports some languages and grammars
... you use get-params
Milan: so we need get-params
... that's why I wanted to model it as attributes
robert: it's not an attribute if you expect it to do something right away
bringert: how do you use get-params to check whether some languages are supported?
robert: use a header with supported-languages, response will include the supported subset
Milan: no equivalent in the API
robert: two stage thing, first connects to server, does get-param
... if that doesn't return the right results, API could just fail
mbodell: in the API init() would do this under the hood
<robert> here's how you use the protocol to discover supported languages, etc: [11]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#capabilities-discovery

   [11] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#capabilities-discovery

Milan: that's specialized to session start-up
... what if we want to get info during the request
mbodell: that should come in the results
bringert: in the EMMA
Milan: so the assumption is that you tell recognizer about everything up front
mbodell: yes
Milan: maybe that's ok
... what if the user is talking and then presses button to get their account balance
mbodell: that would have been returned in EMMA
Milan: I'm tentatively ok with keeping get-params in protocol, and not adding anything to web API to drive get-params explicitly
AGREEMENT: keep get-params
... don't add anything for get-params to web API
Michael: ok to remove set-params as long as INFO is plumbed into the web API
AGREEMENT: remove set-params from protocol
... add a method to the web API to send INFO to recognizer
mbodell: can you have more than 1 synthesis engine
robert: no, at most one tts engine
... but you can have more than one tts request
... can request TTS of any number of SSML docs in parallel
... up to engine to serialize them if needed
... up to web API to decide when to playback
mbodell: I just wanted to make sure that the discussion matches the doc
robert: the topic came up, some synthesizers want to generate events to do lip sync
<robert> viseme
robert: called viseme events
... how does an app say that they want this event
mbodell: vendor specific header?
robert: yes, but how do we handle vendor-specific events
Milan: I thought we had that already
bringert: I don't think so
Milan: but we have it in recognition?
mbodell: not sure
robert: some recognizer could want to give you additional information
<robert> viseme example in the protocol spec: [12]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events

   [12] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events

bringert: that could be in the EMMA
Milan: well, it might not be a final result
robert: example in the protocol spec: [13]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events
... of visemes
... there are a lot of events that various TTS engines send
... phoneme events, viseme events etc
... would follow the same API as mark events

   [13] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events

Milan: except they're not standard
mbodell: what section of protocol spec is this?
Milan: 4.4
... I figured there would be something in the API firing events when this happens
bringert: so this is interim-event in the protocol
Milan: yes
mbodell: could either throw a fixed event, or use the name in the interim-event
Milan: how do you correlate an event to the engine?
bringert: the event is fired at the request object
Milan: how are content types handled
bringert: the event object would have an attribute
mbodell: e.g. attributes data and mimeType
... would any events be binary?
Milan: they are probably all text
bringert: looks like the protocol supports any MIME type
... is there some API for blob + type
smaug: not sure
bringert: how about a general object + type?
Milan: makes sense
<smaug> [14]http://www.w3.org/TR/2009/WD-FileAPI-20091117/#dfn-Blob

   [14] http://www.w3.org/TR/2009/WD-FileAPI-20091117/#dfn-Blob

<smaug> no type
mbodell: I think we should use the interim-event-name
bringert: could clash with existing DOM events
... so the event name in JS would be something like "speech-x-viseme-event"
smaug: should have vendor prefix too
... there could be a standard event, with an additional field for the event name
bringert: pretty equivalent
smaug: if the event name is non-standard, you can only use addEventListener()
mbodell: being able to register separate listeners is powerful
bringert: objections to firing "speech-X" events for an interim-event with name X?
smaug: slight preference for a standard event
AGREEMENT: web API will fire "speech-X" events for any interim-event with name X
Milan: how will we identify parameters as vendor-specific or standard?
mbodell: API already has that, DOM attributes for standard parameters, and a catch-all method for setting additional parameters
Milan: are you preventing newlines in those values?
mbodell: no, just DOMString
bringert: we can have restrictions on that
mbodell: or throw an error
bringert: lots of that in HTML already
Milan: we need to sync the attribute list from the web API with the standard headers in the protocol
robert: I'm planning to go through both specs looking for discrepancies
mbodell: there can be things in the protocol that aren't in the API
DanD: how do we deal with security in the protocol?
... could be a DoS vulnerability
Milan: If there are parameters in the protocol that don't exist in the API, we should discuss those exceptions
DanD: how do we prevent setting or getting parameters in a non-proper way
DanD: read a web page, open lots of sessions from a different web site accessing the same service
... doing lots of activity
robert: that's a general problem with web services
... the service provider needs DoS protection
DanD: will we have an API key registration
robert: the problem with API keys is that they don't work very well
... could have a list of tokens that can't be reused
... can't standardize security at this point
bringert: sending API key in vendor-specific header
robert: other interesting problems
... faffing about
... only solutions are proprietary
... there is a section somewhere
<robert> here are some notes on security in the protocol document: [15]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#protocol-security

   [15] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#protocol-security

Michael_Johnston: this is a general problem for WebSockets
... this is not the only way to cause a server to do heavy lifting
DanD: we need to capture this in the security section and just say that vendor-specific key-value pairs can be used
mbodell: question about grammar weights, section 5.3 in protocol doc
... only allow grammar weights that start with 0.
... in SRGS you can have weights that add up to more than 1
robert: that would work
Michael: in VoiceXML too
... not specifying a weight means that weight is 1
mbodell: in general, we just say what the default is
... and that smaller means less likely and larger means more likely
AGREEMENT: allow any positive number as grammar weight
Michael: if you specify a weight on a ruleref, you weight the rule, you don't scale the weights in the grammar
... if you have three grammars, with internal weights
... let's say two grammars
... I want to put weights 0.25 and 0.75 on them
... if they were combined to a single model, you would scale all weights within them
... other interpretation is that it is a static weight on the path
bringert: aren't those equivalent
Michael: don't think so
bringert: multiple grammars should be like having a single top-level grammar with weighted rulerefs for each grammar
Michael: should also have a way to set scaling
mbodell: isn't that the same?
... details are recognizer-specific
bringert: hard to standardize this kind of maths
Michael: there is a difference between adding a static cost and interpolating two SLMs
robert: one other note, Dan Burnett sent an email this morning that he can't make it
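
The grammar-weight agreement above (any positive number is allowed, and an unspecified weight means 1) can be sketched roughly as follows. This is only a minimal TypeScript sketch under assumptions: the names WeightedGrammar and relativeWeights are hypothetical and appear in neither the web API nor the protocol draft, rejecting a weight of zero is an assumed reading of "positive", and normalizing weights into relative shares is just one of the interpretations raised in the discussion, since the details were said to be recognizer-specific.

```typescript
// Hypothetical shape for a grammar reference; not taken from either draft.
interface WeightedGrammar {
  uri: string;
  weight?: number; // an omitted weight is treated as 1, per the discussion
}

// Validate weights (any positive finite number) and return each grammar's
// share of the total. Normalizing is an assumption made for illustration;
// a recognizer could equally apply the weight as a static cost on the path.
function relativeWeights(
  grammars: WeightedGrammar[]
): { uri: string; share: number }[] {
  const resolved = grammars.map((g) => ({
    uri: g.uri,
    weight: g.weight === undefined ? 1 : g.weight,
  }));
  for (const g of resolved) {
    if (!isFinite(g.weight) || g.weight <= 0) {
      throw new RangeError("grammar weight must be a positive number: " + g.uri);
    }
  }
  const total = resolved.reduce((sum, g) => sum + g.weight, 0);
  return resolved.map((g) => ({ uri: g.uri, share: g.weight / total }));
}

// Usage: weights need not start with "0." or add up to 1.
// The grammar URIs here are made up for the example.
const shares = relativeWeights([
  { uri: "toppings.grxml", weight: 1 },
  { uri: "sizes.grxml", weight: 3 },
]);
console.log(shares);
```

With weights 1 and 3 the relative shares come out in the same 0.25 / 0.75 proportion as Michael's two-grammar example, which shows why weights larger than 1 carry the same information as weights restricted to values starting with "0.".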
Received on Tuesday, 11 October 2011 22:47:22 UTC