- From: Brad Porter <brad@tellme.com>
- Date: Tue, 15 Feb 2005 09:03:09 -0800
- To: W3C Voice Browser Working Group <www-voice@w3.org>
Meeting with the TAG over lunch on Thursday of the F2F. Topic will be HST. Who would like to attend?
- Raman
- Scott
- Jim Barnett
- Brad
- Paolo
- Bodell
- Dave Raggett
- Claus
- Tim
- Brian
- Alex Lee
- RJ Auburn

Jim: Propose we arrange lunch in the room.
TV: How much do people know about this meeting?
Jim: Right now, nothing. Hoping Jim Barnett will provide an agenda.
TV: Is that appropriate for this meeting, or do we want to start at a higher level?
Jim: Would like Jim to put together a 10-minute presentation.
JimB: I'll just give the high-level view at the whiteboard.
TV: One way to show the value is to demonstrate that we're slicing out a commonality and putting a formalism to it.
Scott: At the face-to-face we want to cover all the topics on the deferred list. We may wish to change the timing in the agenda.
Jim: The purpose of the agenda is just to let people know what the topics will be.
Scott: Do we want a planning session in the VoiceXML section, or do it all on the planning day?
Jim: I don't have a preference.
Scott: I propose we take the last hour of that day to do the V3 planning for what concrete proposals we need.

Audio control CRs

Scott: Today we have a number of CRs to discuss: audio controls, grammar issues, and recording CRs. We still have a few telephony ones to cover, but RJ can't make it today; we'll finalize the discussion on those at the F2F. Starting with the first: audio controls. In Jim's email there is a link to the email I sent listing the audio control CRs. There are about six CRs.

CR 64 - Ability to indicate the beginning and end of the audio resource. Specify a start time and a stop time (sketched below). There are similar requirements in SMIL and the DAISY framework.

CR 154 - Asking for sticky volume and speed for TTS. Specify default TTS values for speed and volume that remain in force for the document or the application. May be reasonable as a session-level feature. Need to figure out how it interacts with SSML.
Dan: Every time we discuss this we ask whether it should apply to all audio or only to TTS.

CR ??? - Jump forward/back.
JimB: Does this require fully asynchronous events?
Mike: If you have the <mark/> tag you can jump back and forth.
Dave: The challenges are what the features are, and then how they bind to the UI.

CR 605 - General request for more client-side audio control. Where is the offset of the mark? The scope of what is being controlled may need to extend beyond the audio file. Points out that some browsers have extensions already.

CR 608 - Control the speed of audio through client-side controls.

CR 613 - Specific proposal for pause/resume, forward/backward, volume... has a model in the proposal.

Scott: Proposal -- accept those as the basis for requirements. Since we have a specific proposal, take that as a straw man and begin to work out how we would want it to work.
Brad: Do any of the CRs cover audio layering, i.e. mixing audio in parallel?
Scott: No, that hasn't made it into a CR.
Dan: SMIL can do that, right?
Scott: Yes.
Dan: I just mention it because I don't know whether we've really evaluated how SMIL might work for this.
Scott: Should we solicit proposals for the CRs?
Brad: I think so.
Jim: Are we soliciting from the WG or the public?
Scott: The WG.
Paolo: (ACTION) I will work on submitting these CRs.
Scott: Thanks, Paolo.
Scott: Some issues arise in this area. What is the scope of your seek: the audio, the prompt, the entire queue? Most proprietary implementations seem to work at the prompt level. Most app developers seem to be looking at the prompt queue.
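As a point of reference, a sketch of how CR 64's start/stop offsets and CR 154's sticky TTS defaults might look in markup. None of this is VoiceXML 2.x: clipBegin/clipEnd are placeholder names borrowed from SMIL, and a document- or session-scoped prosody default is an assumption, not an agreed design.

    <!-- Hypothetical VoiceXML 3.0 markup; attribute names are
         placeholders. clipBegin/clipEnd follow SMIL (CR 64); the
         sticky rate/volume default reuses SSML <prosody> (CR 154). -->
    <prompt>
      <!-- Play only seconds 5 through 12.5 of the recording -->
      <audio src="briefing.wav" clipBegin="5s" clipEnd="12.5s"/>
      <!-- A document- or session-level default would make these
           rate/volume settings persist across prompts -->
      <prosody rate="slow" volume="soft">
        Press star at any time to return to the main menu.
      </prosody>
    </prompt>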
Emily: Thinking about the prompt queue aligns well with what we've done with <mark/> in 2.1.
Scott: The other issue is how you do the media control itself. There are generally three models: first, a synchronous event model (stop the prompt queue, apply an action); second, a more asynchronous event model (events could be passed up without stopping, manipulating the audio directly); third, using VoiceXML to define mappings between key bindings and audio actions and passing them to the runtime processor, where a lower-level subsystem manages all interactions until some unhandled interaction happens.
JimB: The third, runtime controls, might be far more efficient.
Scott: It should also make authoring much simpler. You don't need to manage the queue.
Mike: I think asynchronous events are a bad idea. 1 or 3 seems much better. 3 would be hard because SSML interpreters would have to become asynchronous.
Emily: The decision depends on the actual behavior. Pause/resume may have better latency. Which one you choose depends on the function.
Mike: Even with pause/resume, pushing it up shouldn't add latency.
Brad: Do we have enough use cases to help evaluate these?
Scott: Yes, that was my thought on the next steps.
JimA(?): Voicemail has many of the use cases.
Scott: My proposal is that we activate audio control as a CR area for VoiceXML 3.0 and begin to flesh out requirements and use cases for this area in greater detail.
(General consent)

Grammar CRs

Paolo: You can review my email, where we go CR by CR.

CR 6 - More fine-grained control. Mixed initiative is wanted to fill a form; a separate function is navigational <goto/>. Suggesting a separation between the two.
Paolo: I can see the problem, because there is no way to turn the grammars on or off. The question is, do we have use cases?
Dan: One notion that has come up in the past is finer selectivity of grammars in general. We may want to make this an instance of that.
Mike: We may want a cond= attribute on <grammar/> (sketched below).
Dan: Would you envision this on every grammar?
Mike: I would envision it on every grammar.
Brad: There is a bigger issue about the separation between navigation and information collection being overloaded in a single form.
Scott: Based on this discussion I propose we activate this. We can look at cond= and some of the higher-level navigation issues.
Paolo: Conclusion is to activate and discuss.

CR 13 - Utterance vs. semantic slot confidence
Paolo: I couldn't find any language that covered this in VoiceXML 2.0.
Scott: I was sure we put language in for this.
Paolo: I don't see the confidence on the interpretation. EMMA can annotate everything, but in the current VoiceXML 2.0 description I can't see how you do it. I think this would be interesting to consider for VoiceXML 3.0, to allow better searching of results and mixed initiative.
Mike: My memory is similar to Scott's.
Scott: Still looking for it. Dan, didn't you work on this?
Dan: Yes, this looks like my CR; trying to understand it. I believe the CR was processed to meet the basic requirements; I would suggest looking at it in the broader context of refactoring the processing of recognition results.
Scott: You would propose waiting for more specific requirements?
Dan: Yes; we would look at this if we were reopening confidence and other capabilities.
Paolo: Aligning with EMMA might be the right thing to take on in V3. Addressing EMMA bindings will address this CR.
(general discussion around EMMA bindings)
Scott: CR 119 covers that, right?
Dan: I propose closing this; we may reopen it if some request for more results information is made.
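One concrete reading of Mike's cond= suggestion under CR 6, separating a navigation grammar from a collection grammar. A cond= attribute on <grammar> is not part of VoiceXML 2.x, and the grammar and variable names here are invented:

    <form id="order">
      <!-- Hypothetical cond= attribute: the navigation grammar is
           live only once the caller has heard the menu, while the
           collection grammar stays active for mixed initiative -->
      <grammar src="items.grxml"/>
      <grammar src="navigation.grxml" cond="menuPlayed == true"/>
      <field name="item">
        <prompt>What would you like to order?</prompt>
      </field>
    </form>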
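For CR 13, the distinction Paolo is drawing: VoiceXML 2.0 exposes a single confidence per recognition result via shadow variables, while an EMMA-style result can annotate each semantic slot. A sketch in the spirit of the EMMA drafts, with invented element names and values:

    <emma:emma version="1.0"
        xmlns:emma="http://www.w3.org/2003/04/emma">
      <!-- Utterance-level confidence on the interpretation, plus
           per-slot confidence on each piece of instance data -->
      <emma:interpretation id="int1" emma:confidence="0.82">
        <origin emma:confidence="0.91">Boston</origin>
        <destination emma:confidence="0.64">Austin</destination>
      </emma:interpretation>
    </emma:emma>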
CR 15 - Multiple semantic interpretation results
Paolo: We have n-best, but not multiple semantic interpretations per result.
Scott: We handled this by allowing multiple n-best entries with the same utterance string but different semantic interpretations (illustrated below).
Dan: From my perspective this addresses it.
Paolo: If no one needs this intermediate level, we do not need to reopen this.
Dan: A process issue is that there is no Nuance rep; should we email?
Scott: As a process, we send change-of-status notifications to the representative for that CR.
Dan: I can give you a name for a Nuance rep.

CR 110 - expr attribute on <grammar/>
Paolo: Addressed by 2.1.

CR 111 - Support a <value/> inside an inline grammar.
Paolo: I don't believe this is addressed.
Mike: <value/> in <grammar/> would be supportable as a preprocess, similar to SSML (sketched below).
Mike: It raises issues with caching of grammars.
Dan: I think this one is worth discussing; it is not appropriate to reject outright. We could defer it to an SRGS discussion, but it should be done in the VoiceXML arena.
Scott: We need to decide whether this is useful for app development.
Scott: I propose we continue these discussions at the F2F.
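The workaround Scott describes for CR 15, shown against VoiceXML 2.0's application.lastresult$ array: two n-best entries share one utterance string but carry different interpretations. The city/state values are invented for illustration:

    // VoiceXML 2.0 exposes the n-best list as application.lastresult$;
    // here two entries share the utterance but differ in interpretation
    application.lastresult$[0].utterance       // "boston"
    application.lastresult$[0].interpretation  // { city: "Boston", state: "MA" }
    application.lastresult$[1].utterance       // "boston"
    application.lastresult$[1].interpretation  // { city: "Boston", state: "GA" }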
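And a sketch of what CR 111 asks for: a <value/> substituted into an inline SRGS grammar as a preprocessing step before compilation, in the manner Mike compares to SSML. This is not VoiceXML 2.x behavior, and the field and variable names are invented:

    <field name="confirmCity">
      <grammar mode="voice" root="cities"
               xmlns="http://www.w3.org/2001/06/grammar">
        <rule id="cities">
          <one-of>
            <!-- Hypothetical: expanded to the current value of
                 favoriteCity before the grammar is compiled; note
                 the caching issue Mike raises, since the grammar
                 text now varies per request -->
            <item><value expr="favoriteCity"/></item>
            <item>somewhere else</item>
          </one-of>
        </rule>
      </grammar>
      <prompt>Which city?</prompt>
    </field>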