Comments on the Voice Browser Requirements specs

We reviewed the Voice Browser Requirements specs with great interest. Our
overall impression of the contents is favorable, so most of our comments are
relatively minor. The following is a summary of our preliminary comments and
open issues.

Dialog Requirements for Voice Markup Languages

  a.. Fields that are neither optional nor mandatory (see 3.5). The notion
of a form with some parameters that it knows how to gather autonomously
before returning to the server is a good one. Typically, there will be some
mandatory fields and some optional fields, with the optional fields being
initially filled in with default values. However, there does not seem to be
a way of specifying fields that are neither mandatory nor optional because
they are interdependent. For example, people might request a room by its
name, size, or location. We want them to specify at least one of those
before we make a query, but they do not need to specify them all. Which one
they specify is up to the user. Thus, "a room in Building X" and "a small
room" are both sufficient to trigger a query. These fields are not
completely optional because at least one of them must be provided, but they
are also not mandatory, since all of them need not be specified.
  b.. State. The definition of a state seems to imply that states are
explicitly defined (that is, with no variables in their specification). This
is theoretically fine, and is appropriate for simple applications, but can
cause practical problems for larger applications. A more convenient approach
for larger applications is to use the concept of state variables and rules.
In this case, a state is not explicitly defined, but consists of the
contents of the variables in the dialogue. Rules define the transitions from
one state to another: a rule triggers on a subset of the values of the
variables, does some processing when triggered, and changes the contents of
some of the state variables, resulting in a new state. It would be nice if
the capability to model applications in this manner could be built into the
language (assuming that simple applications can still be specified simply).
  c.. Style sheets. It would be interesting to explore the concept of style
sheets to modify the default behavior of the voice browser.
  d.. Confirmation Subdialog (2.1.2). Should the markup language be able to
specify that the confirmation be implicit? For instance:
  U1: I want to fly to Paris.

  S1: You want to fly to Paris from where?

  e.. Suspended tasks (2.2.5). Should the markup languages allow links to be
specified between certain fields of different forms so that when there is a
task switch, certain fields in the new task could be set using values from
the previous task? Similarly, if these fields are modified in the new task,
the markup language could allow the fields in the original task to be
modified accordingly.
  f.. Modularity and Re-use (3.3). Is there any plan to develop a high-level
(vendor-independent) API to the speech and telephony resources to be used by
these dialog components so that they could be platform-independent (e.g.,
like a Java applet running within a browser)?
  g.. Call Transfer (2.8). In general we found the requirements somewhat
light on telephony features. This may be a deficiency considering that the
Working Group effort concentrates (as it should) on using the telephone as
the first voice browsing device.
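
To make items a and b above more concrete, here is a minimal sketch in Python
of the "state variables and rules" approach, using the room request example.
It is only an illustration of the idea, not a proposal for markup syntax. All
names in it (Rule, ROOM_FIELDS, step, and so on) are hypothetical, and the
"query" rule simply records the filled-in criteria instead of calling a real
back end.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# The dialogue state is nothing more than the contents of its variables.
State = Dict[str, Any]

# Hypothetical interdependent fields: at least one must be filled,
# but no single one of them is mandatory.
ROOM_FIELDS = ("name", "size", "location")


@dataclass
class Rule:
    name: str
    condition: Callable[[State], bool]  # triggers on a subset of the variables
    action: Callable[[State], None]     # changes variables, giving a new state


def at_least_one_filled(state: State) -> bool:
    return any(state.get(f) is not None for f in ROOM_FIELDS)


def ask_for_criterion(state: State) -> None:
    state["prompt"] = "Which room? You can give a name, a size, or a location."


def run_query(state: State) -> None:
    # Placeholder for the real query: keep only the criteria that were given.
    state["query"] = {f: state[f] for f in ROOM_FIELDS if state.get(f) is not None}


RULES: List[Rule] = [
    Rule("prompt",
         lambda s: not at_least_one_filled(s) and "prompt" not in s,
         ask_for_criterion),
    Rule("query",
         lambda s: at_least_one_filled(s) and "query" not in s,
         run_query),
]


def step(state: State, rules: List[Rule]) -> Optional[str]:
    """Fire the first rule whose condition holds; return its name, or None."""
    for rule in rules:
        if rule.condition(state):
            rule.action(state)
            return rule.name
    return None
```

With this sketch, a state in which only "size" is filled (as in "a small
room") fires the "query" rule, while a state with all three fields empty
fires the "prompt" rule instead.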

Grammar Representation Requirements for Voice Markup Languages

  a.. Semantics Support (4.1). What's the difference between this point and
the following (4.2)?
  b.. N-Best Hypotheses (5.2). What kind of information in the grammar
representation could be used to support the post-processing of N-best
recognition hypotheses?
  c.. Native Natural Languages (8.5). The grammar representation should
support the specification that a given word can be pronounced in more than
one language. For instance, a Spanish name (person, company, street) in an
English sentence could be pronounced either with a perfect Spanish
pronunciation or simply in English by "Americanizing" the name. In fact, the
grammar representation should allow native languages to be specified at
various levels within the grammar (e.g., grammar level, rule level, or word
level).
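
As a rough illustration of item c, the Python sketch below shows one way a
language specification could be attached at the grammar, rule, and word
levels, with the most specific level taking precedence. The class names, the
effective_langs function, the "en" and "es" language tags, and the sample
words are all assumptions made for the example; none of them come from the
requirements documents.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Langs = Tuple[str, ...]


@dataclass
class Word:
    text: str
    langs: Optional[Langs] = None  # word-level override, if any


@dataclass
class GrammarRule:
    words: List[Word]
    langs: Optional[Langs] = None  # rule-level override, if any


@dataclass
class Grammar:
    rules: List[GrammarRule]
    langs: Langs = ("en",)         # grammar-level default


def effective_langs(grammar: Grammar, rule: GrammarRule, word: Word) -> Langs:
    """Return the languages in which a word may be pronounced.

    The most specific setting wins: word, then rule, then grammar.
    """
    return word.langs or rule.langs or grammar.langs


# A Spanish name inside an English sentence: accept both pronunciations.
grammar = Grammar(rules=[
    GrammarRule(words=[Word("call"), Word("Jorge", langs=("es", "en"))]),
])
```

Here "call" inherits the grammar-level language, while "Jorge" may be
recognized with either a Spanish or an "Americanized" pronunciation.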

Model Architecture for Voice Browser Systems

In general, we did not find this architecture model very useful, and we
suspect the same will be true for many readers.

Regards,

Yves Normandin
Founder and Chief Technology Officer
Locus Dialogue
460 Ste-Catherine St. West, Suite 800
Montreal, Quebec H3B 1A7 Canada

Phone: (514) 954-3804
Fax:   (514) 954-3805
www.locusdialogue.com

Received on Monday, 17 January 2000 07:23:10 UTC