[minutes] 6 October 2011

Group,

The minutes from last week's call are available at http://www.w3.org/2011/10/06-htmlspeech-minutes.html

For convenience, a text version is embedded below.

Thanks to Bjorn Bringert for taking the minutes.

-- dan

**********************************************************************************
              HTML Speech Incubator Group Teleconference
                              06 Oct 2011

   See also: [2]IRC log

      [2] http://www.w3.org/2011/10/06-htmlspeech-irc

Attendees

   Present
          Charles_Hemphill Dan_Druta Debbie_Dahl Michael_Bodell Michael_Johnston Robert_Brown Milan_Young Olli_Pettay Bjorn_Bringert Patrick_Ehlen Glen_Shires
   Regrets
          Dan_Burnett
   Chair
          Michael_Bodell

   Scribe
          bringert

Contents

     * [3]Topics
         1. [4]Web API
         2. [5]Protocol
     * [6]Summary of Action Items
     _________________________________________________________

   <trackbot> Date: 06 October 2011

   <smaug> oops

   <glen> Reminder: sign up for TPAC for Thursday/Friday Nov 3-4,
   Registration fee increases Oct 14

   <glen> [7]http://www.w3.org/2011/11/TPAC/

      [7] http://www.w3.org/2011/11/TPAC/

Web API

   <robert> protocol questions received so far...

   <ddahl> scribe:bringert

   mbodell: Got through it last week
   ... will update doc

   <robert> 1. how is INFO message exposed in the API?

   <ddahl> meeting: HTML_Speech

   <robert> 2. Some "nits" (his word) from Olli

   bringert: satish emailed about the speechinputevent API, saying that
   he thought that his proposal should actually work

   mbodell: will respond on the list

Protocol

   robert: we have some nits from Olli

   <robert>
   [8]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/
   0007.html

      [8] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html

   robert: bigger question is the one that Milan and others have been
   talking about
   ...
   [9]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/
   0007.html
   ... you can give contextual information to recognizer with hints
   about other stuff that's going on
   ... useful for multimodal apps
   ... how does that bubble up through the API
   ... Example: doing reco, have a camera that's tracking motion or a
   touch screen
   ... want to send that to recognizer
   ... E.g. send GPS data when doing Voice Search on a mobile device

      [9] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0007.html

   Milan: two suggestions: 1) model it as an event
   ... 2) call it a parameter
   ... would be different from regular parameters since it's sent on
   its own
   ... because you want to send the input during recognition
   asynchronously

   robert: especially for continuous recognition

   bringert: what are examples of info to send during recognition?

   Milan: scrolling list of pizza toppings
   ... weight the currently showing one higher
   ... should perhaps be a session parameter, since you may want other
   session parameters

   bringert: but these events are specific to a recognition, right?

   Milan: not sure

   robert: one session, contains 0-1 recognitions, 0 or more syntheses
   ... recognition can be oneshot or continuous
   ... recognizer can be in listening or idle state
   ... want to send info messages to recognizer while it is in the
   listening state

   Milan: could events apply to both TTS and recognition?

   robert: for reco, app could have additional info to send to
   recognizer
   ... for TTS, synthesizer may want to send events
   ... it's not an event
   ... it's additional info

   Milan: could app events matter for TTS too?

   robert: no, I don't think so
   ... app has some context data to communicate to recognizer
   ... what's the API for sending that message
   ... is it a property or a method call

   Milan: it could apply to TTS too though
   ... e.g. sliding finger to change pitch

   robert: the TTS is not a telephone, it's not interactive
   ... it might be faster than realtime

   bringert: let's focus on recognition, since they are separate APIs
   ... this seems like an event, not a parameter
   ... since it can happen asynchronously

   robert: in the protocol it's an INFO message with a MIME type and a
   content

   mbodell: can you do it only during listening, or before too

   robert: lemme check

   bringert: should be able to do it at any time

   robert: only in listening state
   ... decision made without much thought

   <robert> state diagram of recognizer:
   [10]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep
   /att-0012/speech-protocol-draft-05.htm#recognition-protocol

     [10] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#recognition-protocol

   bringert: looks like you can do it before listen too
   ... suggested API, method in SpeechInputRequest: sendInfo(string
   type, string value)

   <robert> this is the description of the info message: "In multimodal
   applications, some recognizers will benefit from additional context.
   Clients can use the INFO request to send this context. The
   Content-Type header should specify the type of data, and the data
   itself is contained in the message body."
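
   A minimal sketch of how the suggested sendInfo(type, value) method
   could map onto the protocol's INFO message. The class name
   SpeechInputRequestSketch, the InfoMessage shape, and the transport
   callback are assumptions made for illustration; neither draft
   defines them.

```typescript
// Hypothetical shape of an INFO message: a Content-Type header plus a body,
// following the description of INFO quoted from the protocol draft.
interface InfoMessage {
  method: "INFO";
  contentType: string;
  body: string;
}

// Hypothetical stand-in for SpeechInputRequest; only sendInfo() is modelled.
class SpeechInputRequestSketch {
  private send: (msg: InfoMessage) => void;

  constructor(send: (msg: InfoMessage) => void) {
    this.send = send;
  }

  // Suggested API from the call: sendInfo(string type, string value).
  sendInfo(type: string, value: string): void {
    if (type.trim() === "") {
      throw new Error("a MIME type is required");
    }
    this.send({ method: "INFO", contentType: type, body: value });
  }
}

// Usage: collect outgoing messages instead of opening a real connection.
const sentInfo: InfoMessage[] = [];
const infoRequest = new SpeechInputRequestSketch((msg) => {
  sentInfo.push(msg);
});
infoRequest.sendInfo(
  "application/json",
  JSON.stringify({ visibleTopping: "mushroom" })
);
```

   The payload mirrors Milan's scrolling pizza-toppings example; the
   JSON field name is made up.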

   Milan: is getParameter() ever useful?
   ... if we already have a way to send parameters in the request, why
   do we need a way to send them outside the request

   bringert: what's the difference between set-param and info

   Milan: set-param is for standard headers

   mbodell: could I use set-params for changing timeouts

   robert: if you set something in set-params, that's the default value
   for the next request

   Milan: MRCP doesn't specify the semantics of set-param

   robert: set-params, get-params introduce confusion, let's skip them

   Milan: how does that handle mbodell's timeout example

   robert: to set a parameter halfway through, do what SIP does, send
   the invite again
   ... just resend the request

   Milan: wouldn't that stop the previous request

   robert: wouldn't have to
   ... we could either define what set-params does, or get rid of it

   Milan: is there anything that you can't retract if you restart the
   request without restarting
   ... implementations should be allowed to refuse

   robert: you can keep streaming audio while stopping and starting a
   new one on the same audio stream
   ... there will be some latency

   Milan: it's pretty hard to change the grammar while recognizing

   robert: yeah, we don't do that either

   Milan: should we propose to remove set-params?
   ... is there a way to change parameters in the API?

   mbodell: only by a new init()

   robert: set-params while running in MRCP is weird

   Milan: you can set session default parameters

   robert: yeah, but that's not useful

   Milan: how do you get info back from the recognizer?
   ... you can initiate that feedback from the client without
   get-params

   robert: in section 4.3 if you want to discover whether the
   recognizer supports some languages and grammars
   ... you use get-params

   Milan: so we need get-params
   ... that's why I wanted to model it as attributes

   robert: it's not an attribute if you expect it to do something right
   away

   bringert: how do you use get-params to check whether some languages
   are supported?

   robert: use a header with supported-languages, response will include
   the supported subset

   Milan: no equivalent in the API

   robert: two stage thing, first connects to server, does get-param
   ... if that doesn't return the right results, API could just fail

   mbodell: in the API init() would do this under the hood

   <robert> here's how you use the protocol to discover supported
   languages, etc:
   [11]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep
   /att-0012/speech-protocol-draft-05.htm#capabilities-discovery

     [11] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#capabilities-discovery

   Milan: that's specialized to session start-up
   ... what if we want to get info during the request

   mbodell: that should come in the results

   bringert: in the EMMA

   Milan: so the assumption is that you tell recognizer about
   everything up front

   mbodell: yes

   Milan: maybe that's ok
   ... what if the user is talking and then presses button to get their
   account balance

   mbodell: that would have been returned in EMMA

   Milan: I'm tentatively ok with keeping get-params in protocol, and
   not adding anything to web API to drive get-params explicitly

   AGREEMENT: keep get-params
   ... don't add anything for get-params to web API

   Michael: ok to remove set-params as long as INFO is plumbed into the
   web API

   AGREEMENT: remove set-params from protocol
   ... add method to web API to send INFO to recognizer

   mbodell: can you have more than 1 synthesis engine

   robert: no, at most one tts engine
   ... but you can have more than one tts request
   ... can request TTS of any number of SSML docs in parallel
   ... up to engine to serialize them if needed
   ... up to web API to decide when to playback

   mbodell: I just wanted to make sure that the discussion matches the
   doc

   robert: the topic came up, some synthesizers want to generate events
   to do lip sync

   <robert> viseme

   robert: called viseme events
   ... how does an app say that they want this event

   mbodell: vendor specific header?

   robert: yes, but how do we handle vendor-specific events

   Milan: I thought we had that already

   bringert: I don't think so

   Milan: but we have it in recognition?

   mbodell: not sure

   robert: some recognizer could want to give you additional
   information

   <robert> viseme example in the protocol spec:
   [12]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep
   /att-0012/speech-protocol-draft-05.htm#interim-events

     [12] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events

   bringert: that could be in the EMMA

   Milan: well, it might not be a final result

   robert: example in the protocol spec:
   [13]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep
   /att-0012/speech-protocol-draft-05.htm#interim-events
   ... of visemes
   ... there are a lot of events that various TTS engines send
   ... phoneme events, viseme events etc
   ... would follow the same API as mark events

     [13] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#interim-events

   Milan: except they're not standard

   mbodell: what section of protocol spec is this?

   Milan: 4.4
   ... I figured there would be something in the API firing events when
   this happens

   bringert: so this is interim-event in the protocol

   Milan: yes

   mbodell: could either throw a fixed event, or use the name in the
   interim-event

   Milan: how do you correlate an event to the engine?

   bringert: the event is fired at the request object

   Milan: how are content types handled

   bringert: the event object would have an attribute

   mbodell: e.g. attributes data and mimeType
   ... would any events be binary?

   Milan: they are probably all text

   bringert: looks like the protocol supports any MIME type
   ... is there some API for blob + type

   smaug: not sure

   bringert: how about a general object + type?

   Milan: makes sense

   <smaug> [14]http://www.w3.org/TR/2009/WD-FileAPI-20091117/#dfn-Blob

     [14] http://www.w3.org/TR/2009/WD-FileAPI-20091117/#dfn-Blob

   <smaug> no type

   mbodell: I think we should use the interim-event-name

   bringert: could clash with existing DOM events
   ... so the event name in JS would be something like
   "speech-x-viseme-event"

   smaug: should have vendor prefix too
   ... there could be a standard event, with an additional field for
   the event name

   bringert: pretty equivalent

   smaug: if the event name is non-standard, you can only use
   addEventListener()

   mbodell: being able to register separate listeners is powerful

   bringert: objections to firing "speech-X" events for an
   interim-event with name X?

   smaug: slight preference for a standard event

   AGREEMENT: web API will fire "speech-X" events for any interim-event
   with name X
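
   A rough sketch of the agreed naming rule, with an interim-event
   named X surfacing as a "speech-X" event fired at the request
   object. The SpeechInterimEvent shape (with the data and mimeType
   attributes floated above) and the RequestEventsSketch class are
   assumptions for illustration, not agreed API.

```typescript
// Hypothetical event payload using the "data" and "mimeType" attributes
// suggested during the call.
interface SpeechInterimEvent {
  type: string; // e.g. "speech-x-viseme-event"
  data: string;
  mimeType: string;
}

type InterimListener = (event: SpeechInterimEvent) => void;

// Maps a protocol interim-event name X to the event name "speech-X".
function toEventType(interimEventName: string): string {
  return `speech-${interimEventName}`;
}

// Minimal stand-in for the request object that events are fired at.
class RequestEventsSketch {
  private listeners = new Map<string, InterimListener[]>();

  addEventListener(type: string, listener: InterimListener): void {
    const existing = this.listeners.get(type) ?? [];
    this.listeners.set(type, [...existing, listener]);
  }

  // Called when an interim-event arrives from the speech service.
  fireInterimEvent(name: string, data: string, mimeType: string): void {
    const event: SpeechInterimEvent = {
      type: toEventType(name),
      data,
      mimeType,
    };
    for (const listener of this.listeners.get(event.type) ?? []) {
      listener(event);
    }
  }
}

// Usage: register a listener for one vendor-specific event name.
const eventRequest = new RequestEventsSketch();
const seenEvents: SpeechInterimEvent[] = [];
eventRequest.addEventListener("speech-x-viseme-event", (e) => {
  seenEvents.push(e);
});
eventRequest.fireInterimEvent("x-viseme-event", "viseme data", "text/plain");
```

   Registering by name is what lets an app attach separate listeners
   per event, the property mbodell called out.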

   Milan: how will we identify parameters as vendor-specific or
   standard?

   mbodell: API already has that, DOM attributes for standard
   parameters, and a catch-all method for setting additional parameters

   Milan: are you preventing newlines in those values?

   mbodell: no, just DOMString

   bringert: we can have restrictions on that

   mbodell: or throw an error

   bringert: lots of that in HTML already

   Milan: we need to sync the attribute list from the web API with the
   standard headers in the protocol

   robert: I'm planning to go through both specs looking for
   discrepancies

   mbodell: there can be things in the protocol that aren't in the API

   DanD: how do we deal with security in the protocol?
   ... could be a DoS vulnerability

   <Milan> Milan: If there are parameters in the protocol that don't
   exist in the API, we should discuss those exceptions

   DanD: how do we prevent setting or getting parameters in an
   improper way

   DanD: read a web page, open lots of sessions from a different web
   site accessing the same service
   ... doing lots of activity

   robert: that's a general problem with web services
   ... the service provider needs DoS protection

   DanD: will we have an API key registration

   robert: the problem with API keys is that they don't work very well
   ... could have a list of tokens that can't be reused
   ... can't standardize security at this point

   bringert: sending API key in vendor-specific header

   robert: other interesting problems
   ... faffing about
   ... only solutions are proprietary
   ... there is a section somewhere

   <robert> here are some notes on security in the protocol document:
   [15]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep
   /att-0012/speech-protocol-draft-05.htm#protocol-security

     [15] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#protocol-security

   Michael_Johnston: this is a general problem for WebSockets
   ... this is not the only way to cause a server to do heavy lifting

   DanD: we need to capture this in the security section and just say
   that vendor-specific key-value pairs can be used

   mbodell: question about grammar weights, section 5.3 in protocol doc
   ... only allow grammar weights that start with 0.
   ... in SRGS you can have weights that add up to more than 1

   robert: that would work

   Michael: in VoiceXML too
   ... not specifying a weight means that weight is 1

   mbodell: in general, we just say what the default is
   ... and that smaller means less likely and larger means more likely

   AGREEMENT: allow any positive number as grammar weight
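
   A small sketch of what accepting "any positive number" could look
   like when parsing a weight value. The function name
   parseGrammarWeight is hypothetical, and defaulting to 1 when no
   weight is given is an assumption taken from Michael's remark about
   VoiceXML, not something the group agreed for the protocol.

```typescript
// Parses a grammar weight given as text; any positive finite number is allowed.
// A missing weight defaults to 1 (assumption, see the note above).
function parseGrammarWeight(raw?: string): number {
  if (raw === undefined) {
    return 1;
  }
  const weight = Number(raw);
  if (!Number.isFinite(weight) || weight <= 0) {
    throw new RangeError(`grammar weight must be a positive number: "${raw}"`);
  }
  return weight;
}

// Usage: weights above 1 are accepted, matching the SRGS behaviour mentioned above.
const defaultWeight = parseGrammarWeight();
const largeWeight = parseGrammarWeight("2.5");
```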

   Michael: if you specify a weight on a ruleref, you weight the rule,
   you don't scale the weights in the grammar
   ... if you have three grammars, with internal weights
   ... let's say two grammars
   ... I want to put weights 0.25 and 0.75 on them
   ... if they were combined to a single model, you would scale all
   weights within them
   ... other interpretation is that it is a static weight on the path

   bringert: aren't those equivalent

   Michael: don't think so

   bringert: multiple grammars should be like having a single top-level
   grammar with weighted rulerefs for each grammar

   Michael: should we also have a way to set scaling

   mbodell: isn't that the same?
   ... details are recognizer-specific

   bringert: hard to standardize this kind of maths

   Michael: there is a difference between adding a static cost and
   interpolating two SLMs

   robert: one other note, Dan Burnett sent an email this morning that
   he can't make it

Received on Tuesday, 11 October 2011 22:47:22 UTC