[v3] Feb 15th, Notes for 2nd hour from Brad Porter on 2005-02-15 (www-voice@w3.org from January to March 2005)

From: Brad Porter <brad@tellme.com>
Date: Tue, 15 Feb 2005 09:03:09 -0800
To: W3C Voice Browser Working Group <www-voice@w3.org>
Message-ID: <42122B4D.8090405@tellme.com>
Meeting with TAG for lunch on Thursday of F2F.  Topic will be HST. Who 
would like to attend?

- Raman
- Scott
- Jim Barnett
- Brad
- Paolo
- Bodell
- Dave Raggett
- Claus
- Tim
- Brian
- Alex Lee
- RJ Auburn

Jim: Propose we arrange lunch in the room.
TV: How much do people know about this meeting?
Jim: Right now, nothing.  Hoping Jim Barnett will provide an agenda.
TV: Is that appropriate for this meeting or do we want to start at a 
higher level?
Jim: Would like Jim Barnett to put together a 10-minute presentation.
JimB: I'll just give the high-level view at the white-board.
TV: One of the ways to show the value is to demonstrate we're slicing 
out a commonality and show we're putting a formalism to it.


Scott: At the face to face we want to cover all the topics on the 
deferred list.  May wish to change the timing in the agenda. 
Jim: The purpose of the agenda is just to let people know what the 
topics will be.
Scott: Do we want a planning session in the VoiceXML section or do it 
all in the planning day?
Jim: I don't have a preference.
Scott: I propose we take the last hour of that day to do the V3 planning 
for what concrete proposals we need. 

Scott: Today we have a number of CRs to discuss:  audio controls, 
grammar issues, recording CRs.  Still have a few telephony ones to 
cover, but RJ can't make it today.  We'll finalize the discussion on 
those at the F2F.  Starting with the first... audio controls.  In Jim's 
email there is a link to the email I sent listing the audio control 
CRs.  There are about 6 CRs.

CR 64 - Ability to indicate the beginning and end of the audio 
resource.  Specify a start time and a stop time.  Similar requirements 
in SMIL and the DAISY framework.

CR 154 - Asking for sticky volume and speed for TTS.  Specify default TTS 
values for speed and volume that remain in force for the document or the 
application.  May be reasonable for a session-level feature.  Need to 
figure out how it interacts with SSML.

Dan: Every time we discuss this we ask if it should be applied to all 
audio or only to the TTS.

CR ??? - Jump forward/back.

JimB: Does this require fully asynchronous events?
Mike: If you have the <mark/> tag you can jump back and forth.
Dave: The challenges are: what are the features, and then how do they 
bind to the UI?

CR 605 - General request for providing more client-side audio control.  
Where is the offset of the mark?  The scope of control may need to 
extend beyond the audio file.  Points out some browsers have 
extensions already.

CR 608 - Control the speed for audio through client-side controls.

CR 613 - Specific proposal for pause/resume, forward/backward, volume... 
The proposal includes a model.

Scott: Proposal -- accept those as the basis for requirements.  Since we 
have a specific proposal, take that as a straw-man and begin to work out 
how we would want that to work.
Brad: Do any of the CRs cover audio layering?  Mixing audio in parallel.
Scott: No, hasn't made it into a CR.
Dan: SMIL can do that, right?
Scott: Yes
Dan: Just mentioning that, as I don't know if we've really evaluated how 
SMIL might work on this.
Scott: Should we solicit proposals for the CRs?
Brad: I think so.
Jim: Are we soliciting from the WG or the public?
Scott: WG
Paolo: (ACTION) I will work on submitting these CRs.
Scott: Thanks Paolo

Scott: There are some issues that arise in this area.  What is the scope 
of your seek: audio, prompts, entire queue?  Most proprietary 
implementations seem to be at the prompt level.  Most app developers 
seem to be looking at the prompt queue.
Emily: Thinking about the prompt queue aligns well with what we've done 
with <mark/> in 2.1.

Scott: The other issue is how you do the media control itself.  There 
are generally three models.  The first is a synchronous event model 
(stop the prompt queue, apply an action).  The second is to move toward 
a more asynchronous event model (events could be passed up without 
stopping and manipulate audio directly).  The third is to use VoiceXML 
to define mappings between key bindings and audio and pass them to a 
runtime processor; a lower-level subsystem manages all interactions 
until some unhandled interaction happens.
JimB: The third, runtime controls, might be far more efficient.
Scott: Should make authoring much simpler.  Don't need to manage queue.
Mike:  I think async events is a bad idea.  1 or 3 seems much better.  3 
would be hard because SSML interpreters must become asynchronous.
Emily: Decision depends on the actual behavior.  Pause/resume may have 
better latency.  Which you choose depends on the function?
Mike: Even with pause/resume, pushing it up shouldn't add latency.
Brad: Do we have enough use cases to help evaluate these?
Scott: Yes, that was my thought on the next steps.
JimA(?): Voicemail has many of the use cases.
Scott: My proposal is we activate audio control as a CR area for 
VoiceXML 3.0 and begin to flesh out requirements and use cases for this 
area in greater detail.
(General consent)
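
The third model Scott describes (key bindings handed to a runtime 
processor that handles interactions itself until one is unhandled) can 
be illustrated with a minimal sketch.  Everything here is an assumption 
made for illustration: the class and function names, the DTMF key 
choices, and the 5-second skip are hypothetical and do not come from 
any CR or from the VoiceXML specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PromptQueuePlayer:
    """Hypothetical low-level audio subsystem tracking prompt queue playback."""
    duration_ms: int
    position_ms: int = 0
    paused: bool = False

    def seek(self, delta_ms: int) -> None:
        # Clamp so a jump never leaves the bounds of the queued audio.
        self.position_ms = max(0, min(self.duration_ms, self.position_ms + delta_ms))

    def toggle_pause(self) -> None:
        self.paused = not self.paused


def make_bindings(player: PromptQueuePlayer, skip_ms: int = 5000) -> Dict[str, Callable[[], None]]:
    """Key-to-action map the application would hand to the runtime (keys are assumed)."""
    return {
        "4": lambda: player.seek(-skip_ms),  # jump back
        "6": lambda: player.seek(skip_ms),   # jump forward
        "5": player.toggle_pause,            # pause/resume
    }


def run_until_unhandled(bindings: Dict[str, Callable[[], None]], keys: List[str]) -> Optional[str]:
    """Apply bound keys locally; return the first unbound key, or None if all were handled."""
    for key in keys:
        action = bindings.get(key)
        if action is None:
            return key  # control goes back to the application here
        action()
    return None


player = PromptQueuePlayer(duration_ms=60000)
unhandled = run_until_unhandled(make_bindings(player), ["6", "6", "4", "5", "#", "6"])
```

In this sketch the application only sees the "#" key; the seeks and the 
pause before it are handled without the application managing the queue, 
which is the authoring simplification Scott mentions.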

Grammar CRs

Paolo: You can review my email where we go CR by CR.

CR 6 - More fine-grained control.  Want mixed-initiative to fill a 
form.  Separate function is to do navigational <goto/>.  Suggesting a 
separation between the two. 

Paolo: I can see the problem because there is no way to turn on or off 
the grammars.  Question is do we have use cases?
Dan: One notion that has come up in the past was around finer 
selectivity of grammars in general.  May want to make this an instance 
of that.
Mike: May want a cond= attribute on grammar.
Dan: Would you envision this on every grammar?
Mike: I would envision it on every grammar.
Brad: There is a bigger issue about separation between navigation and 
information collection being overloaded in a certain form.
Scott: Based on this discussion I would propose we activate this.  We 
can look at cond= and some of the higher-level navigation issues.
Paolo: Conclusion is to activate and discuss.
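
Mike's suggestion of a cond= attribute on grammar can be sketched as 
follows.  This is only an illustration of the idea under discussion: a 
Python callable stands in for the condition expression, and the names 
(Grammar, active_grammars, nav_enabled) are hypothetical, not part of 
any VoiceXML version.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Grammar:
    """Hypothetical grammar with an optional activation condition."""
    name: str
    # None means no condition was given, so the grammar is always active.
    cond: Optional[Callable[[Dict[str, object]], bool]] = None


def active_grammars(grammars: List[Grammar], variables: Dict[str, object]) -> List[str]:
    """Return the names of grammars whose condition holds for the current variables."""
    return [g.name for g in grammars if g.cond is None or g.cond(variables)]


grammars = [
    Grammar("city"),                                                      # information collection
    Grammar("navigation", cond=lambda v: v.get("nav_enabled") is True),  # navigational commands
]

collecting_only = active_grammars(grammars, {"nav_enabled": False})
with_navigation = active_grammars(grammars, {"nav_enabled": True})
```

Here the navigation grammar is switched off while the form collects 
information and switched on otherwise, which is one way to read the 
separation CR 6 asks for.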

CR 13 - utterance vs. semantic slot confidence
Paolo: Couldn't find any language that covered this in VoiceXML 2.0.
Scott: I was sure we put language in for this.
Paolo: I don't see the confidence on the interpretation.  EMMA can 
annotate everything, but in the current VoiceXML 2.0 description I can't 
see how you do it.  I think this would be interesting to consider in 
VoiceXML 3.0 to allow better searching of results and mixed-initiative.
Mike: My memory is similar to Scott's.
Scott:  Still looking for it.  Dan, didn't you work on this?
Dan: Yes, looks like my CR; trying to understand it.  I believe the 
CR was processed to meet the basic requirements; would suggest 
looking at this CR in the broader context of refactoring processing of 
reco results.
Scott: You would propose waiting for more specific requirements?
Dan: Yes, we would look at this if we were reopening confidence and 
other capabilities.
Paolo: Aligning with EMMA might be the right thing to take on in V3.  
Addressing EMMA bindings will address this CR.
(general discussion around EMMA bindings)
Scott: CR 119 covers that, right?
Dan: Propose closing this; may reopen if some request for more results 
information is made.

CR 15 - Multiple semantic interpretation results
Paolo: We have nbest, but not multiple semantic interpretations per result.
Scott: We handled this by having multiple nbest results with the same 
utterance string but different semantic interpretations.
Dan: From my perspective this addresses it.
Paolo: If no one needs this intermediate level, we do not need to reopen 
this.
Dan: Process issue is that there is no Nuance rep; should we email?
Scott: As a process we send change-of-status notifications to the 
representative for that CR.
Dan: I can give you a name for a Nuance rep.
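
The approach Scott describes for CR 15 (the same utterance string 
appearing more than once in the nbest list, each time with a different 
semantic interpretation) can be sketched as below.  The data shapes, 
field names, and sample values are assumptions for illustration only; 
they are not taken from the VoiceXML specification or from the CR.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NBestEntry:
    """Hypothetical n-best entry: one utterance with one interpretation."""
    utterance: str
    confidence: float
    interpretation: Dict[str, str]


def interpretations_by_utterance(nbest: List[NBestEntry]) -> Dict[str, List[Dict[str, str]]]:
    """Group interpretations under their utterance string, preserving n-best order."""
    grouped: Dict[str, List[Dict[str, str]]] = {}
    for entry in nbest:
        grouped.setdefault(entry.utterance, []).append(entry.interpretation)
    return grouped


# The utterance "boston" appears twice, carrying two different interpretations.
nbest = [
    NBestEntry("boston", 0.82, {"city": "Boston, MA"}),
    NBestEntry("boston", 0.82, {"city": "Boston, UK"}),
    NBestEntry("austin", 0.41, {"city": "Austin, TX"}),
]

grouped = interpretations_by_utterance(nbest)
```

An application that wants the intermediate level Paolo mentions could 
recover it this way from a flat nbest list, which is consistent with 
the conclusion that the CR need not be reopened.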

CR 110 -- expr attribute on <grammar/>
Paolo: Addressed by 2.1

CR 111 -- Support a <value/> inside an inline grammar.
Paolo:  Don't believe this is addressed.
Mike: <value/> in <grammar/> would be supportable as a preprocessing 
step, similar to SSML.
Mike: Has issues in caching of grammars.
Dan: I think this one is worth discussing; not appropriate to reject 
outright.  Could defer for SRGS discussion; should be done in VoiceXML 
arena.
Scott: We need to decide if this is useful for app development.

Scott: Propose we continue these discussions at the F2F.
Received on Tuesday, 15 February 2005 17:03:38 UTC