[dialog] McGee - VBWG official response to VoiceXML 2.0 Last Call Review Issues

The Voice Browser Working Group (VBWG) has almost
finished resolving the issues raised during the last call
review of the 24 April 2002 VoiceXML 2.0 Working Draft [1]. Our
apologies that it has taken so long to respond.

This is the VBWG's formal response to the issues you raised,
which have been logged in the Working Group's issues list [4].
The VBWG's resolutions have been incorporated into the 18 October
2002 draft of VoiceXML 2.0 [5].

Please indicate before 1st November 2002 whether you are satisfied with
the VBWG's resolutions, whether you think there has been a
misunderstanding, or whether you wish to register an objection.
If you do not think you can respond before 1st November, please let me
know. The Director will appreciate a response whether you agree
with the resolutions or not.

Below you will find:

 1) More information about the process we are following.
 2) A summary of the VBWG's responses to each of your issues.

Thank you,

Scott
Co-Chair, VBWG

-----------------------------------------------
1) Process requirement to address last call issues
-----------------------------------------------

Per section 5.2.3 [2] of the 19th July 2001 Process Document, in
order for the VoiceXML 2.0 to advance to the next state (Candidate
Recommendation), the Working Group must "formally address all
issues raised during the Last Call review period (possibly
modifying the technical report)." Section 4.1.2 of the Process
Document [3] sets expectations about what constitutes a formal
response:

  "In the context of this document, a Working Group has formally
  addressed an issue when the Chair can show (archived) evidence
  of having sent a response to the party who raised the
  issue. This response should include the Working Group's
  resolution and should ask the party who raised the issue to
  reply with an indication of whether the resolution reverses the
  initial objection."

If you feel that the response is based on a misunderstanding of
the original issue, you are encouraged to restate and clarify the
issue until there is agreement about the issue, so that the
Working Group may prepare its substantive response.

If the response shows understanding of the original issue but
does not satisfy the reviewer, you may register a formal
objection with the Working Group that will be carried forward
with the relevant deliverables. 

[1] http://www.w3.org/TR/2002/WD-voicexml20-20020424/
[2] http://www.w3.org/Consortium/Process-20010719/tr.html#RecsCR
[3] http://www.w3.org/Consortium/Process-20010719/groups.html#WGVotes
[4] http://www.w3.org/Voice/Group/2002/voiceXML-change-requests.htm (members only)
[5] http://www.w3.org/Voice/Group/2002/WD-voicexml20-20021018.htm (members only)
    (http://www.w3.org/Voice/Group/2002/WD-voicexml20-20021018.zip) (members only)

-----------------------------------------------
2) Issues you raised and responses
-----------------------------------------------
In http://lists.w3.org/Archives/Public/www-voice/2002AprJun/0058.html
you raised the following issues, which were registered as dialog change
requests R467 and R468. Our response is given inline after each issue.

Problem #1. <initial> in mixed-initiative forms. 
------------------------------------------------ 
Section 2.1.5 specifies (the 2nd sentence):
"To make a form mixed initiative, where both the computer and the human
direct the conversation, it must have one or more <initial> form items
and one or more form-level grammars."

That implies that <initial> is a required element of the
mixed-initiative form. The examples use <initial>, too. 
However, FIA does not seem to have any provisions to enforce it. 

Our questions: 
- Is <initial> required for a form to be mixed-initiative ? 
- Or, does a form-level grammar alone imply mixed-initiative? 

If <initial> is not required, then there seems to be no benefit in
defining directed and mixed-initiative forms (a VoiceXML language
structure).
Instead, the directed and mixed-initiative behaviors should be discussed
in terms of item modality and grammar types and scoping (VoiceXML use
cases). 
For example, the following language could be used: 
- A 'directed dialog' can be implemented by using form item-level
grammars rather than form-level grammars. If it is desired to restrict
user options to just the item's grammar, the form item should be made
modal. Otherwise, grammars in wider scopes may still accept user
utterances (e.g. links with 'restart', 'new order', etc.) and restart
interpretation at a different form.
- A 'mixed-initiative dialog' can be implemented by using form-level
grammars that may return multiple slots and thus allow multiple form
items to be filled from a single caller utterance. The <initial> form
item can be used in this scenario to prompt for and collect an utterance
before executing any input items of the form (which may have their own
specialized grammars and may potentially capture the recognition results
as their own input).


Otherwise, if <initial> is required for a form to be mixed-initiative, a
form without <initial> would be a directed form regardless of the
presence of a form-level grammar. In such a case, any utterances would
be processed in the context of individual input items rather than in the
form context. The form items will be filled one at a time.

VBWG Response: Accepted. 

In the latest draft [5], we have clarified that (a) mixed initiative is
a style of dialog (not a form sub-type), and (b) <initial> is not
required for a mixed-initiative dialog but is one way of implementing
it. In particular, the first paragraph of 2.1.5 now reads:

  "The last section talked about forms implementing rigid,
  computer-directed conversations. To make a form mixed initiative,
  where both the computer and the human direct the conversation, it
  must have one or more form-level grammars. The dialog may be written
  in several ways. One common authoring style combines an <initial>
  element that prompts for a general response with <field> elements
  that prompt for specific information. This is illustrated in the
  example below. More complex techniques, such as using the 'cond'
  attribute on <field> elements, may achieve a similar effect."


Problem #2. Mapping results from a form-level grammar. 
------------------------------------------------------- 
Let's consider interpretation of a VoiceXML document where: 
- there is a form with multiple fields, 
- the form has a form-level grammar that can return multiple slots, 
- the fields do not have their own grammars, 
- it is a mixed-initiative form (see also the problem #1 above), 
- the first recognition result fills some fields, but not all of them, 
- another caller utterance is needed to fill the remaining fields. 

Our questions: 
[a] - Is it expected that the form will switch to the 'directed dialog'
mode after the first utterance and then consider only unfilled items for
the subsequent utterances (see also problem #1 above) ?
[b] - Or, will the form remain in the 'mixed-initiative dialog' mode and
will user utterances continue to be mapped to multiple input fields (as
the 2nd table in section 3.1.6.3 seems to imply) ? 
[c] - And, if the form is to remain in the 'mixed-initiative dialog'
mode, can the next user utterance overwrite fields that have been
already filled or will those fields retain their previous values ? 

To illustrate the problem, let's assume that: 
- the fields are 'size', 'color', and 'shape', 
- the first utterance is 'big square', 
- the second prompt says 'Please provide the color', 
- the second utterance is 'blue triangle'. 
Will the completed form be 'big blue square' or 'big blue triangle' ? 
The 2nd table in section 3.1.6.3 should be updated to cover all
(canonical) combinations of user input and dynamic states of form
components. 

VBWG Response: [a] and [b] don't require spec changes; Accepted [c]. 

See response to Problem #1 for clarification of terms 'directed dialog'
and 'mixed initiative dialog'. 

[a]/[b]: Given that you only have a form-level grammar in your example,
it is the only grammar that can be matched by user input. When the FIA
visits the form, it will go to the <initial> if present, read out its
prompts, and activate the form-level grammar. If there is no <initial>,
it will go to the first field and do the same thing there. After the
first recognition fills in some but not all fields, the <initial> can no
longer be visited, and the FIA will go to the next unfilled field, queue
its prompts, and again activate the form-level grammar (there are no
other grammars in your example!). This will continue until all fields
are filled.

[c]: We have clarified in 3.1.6.1 of [5] that matching form-level
grammars can override existing values in input items and that <filled>
processing of these items takes place as described in Section 2.4 and
Appendix C.
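
As a rough illustration of the answers to [a], [b] and [c], the size /
color / shape example can be traced with a small Python toy model. This
is only a sketch under stated assumptions and is not the FIA defined in
Appendix C: the function name run_form, the FIELDS list, and the use of
plain dicts to stand in for form-level grammar results are all
hypothetical, and prompts, <filled> processing, and event handling are
left out entirely.

```python
# Toy model of the behaviour described in [a]/[b]/[c]; NOT the FIA of
# Appendix C. run_form, FIELDS and the dict-based "grammar results" are
# hypothetical names invented for this illustration.

FIELDS = ["size", "color", "shape"]


def run_form(recognition_results, has_initial=True):
    """Fill FIELDS from a sequence of form-level grammar results.

    Each result is a dict of slot name -> value, standing in for what a
    form-level grammar might return for one caller utterance.
    Returns (filled values, list of form items visited in order).
    """
    values = {name: None for name in FIELDS}
    visited = []
    results = iter(recognition_results)

    while any(v is None for v in values.values()):
        nothing_filled = all(v is None for v in values.values())
        if has_initial and nothing_filled:
            # <initial> is only eligible while no field has been filled.
            item = "initial"
        else:
            # Otherwise visit the first field that is still unfilled.
            item = next(name for name in FIELDS if values[name] is None)
        visited.append(item)

        result = next(results, None)
        if result is None:
            break  # no more caller input in this simulation

        # Only the form-level grammar is active, so a match may carry any
        # slots. Per [c], a slot overrides a value that was already filled.
        for slot, value in result.items():
            if slot in values:
                values[slot] = value

    return values, visited


values, visited = run_form([
    {"size": "big", "shape": "square"},      # caller says "big square"
    {"color": "blue", "shape": "triangle"},  # caller says "blue triangle"
])
print(visited)                                    # ['initial', 'color']
print(" ".join(values[name] for name in FIELDS))  # big blue triangle
```

In this model the second utterance is collected while the 'color' field
is being visited, yet its 'shape' slot still replaces the earlier value,
so the completed form is 'big blue triangle'. Calling run_form with
has_initial=False visits 'size' first instead of <initial> and ends with
the same values.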

Received on Tuesday, 22 October 2002 07:19:14 UTC