- From: Yves Normandin <norman@locus.ca>
- Date: Sat, 15 Jan 2000 00:57:26 -0500 (EST)
- To: <www-voice@w3.org>
- Cc: "Yves Normandin" <yves.normandin@locus.ca>
- Message-ID: <000901bf5f1d$ef6de430$0e9c60cf@locus>
We reviewed the Voice Browser Requirements specs with great interest. Overall, our assessment is favorable, and most of our comments are therefore relatively minor. The following is a summary of our preliminary comments and issues.

Dialog Requirements for Voice Markup Languages

a. Fields that are neither optional nor mandatory (see 3.5). The notion of a form with some parameters that it knows how to gather autonomously before returning to the server is a good one. Typically, there will be some mandatory fields and some optional fields, with the optional fields initially filled in with default values. However, there does not seem to be a way of specifying fields that are neither mandatory nor optional because they are interdependent. For example, people might request a room by its name, size, or location. We want them to specify at least one of those before we make a query, but they do not need to specify them all, and which one they specify is up to them. Thus, "a room in Building X" and "a small room" are both sufficient to trigger a query. These fields are not completely optional, because at least one of them must be provided, but they are not mandatory either, since not all of them need to be specified.

b. State. The definition of a state seems to imply that states are explicitly defined (that is, with no variables in their specification). This is theoretically fine and is appropriate for simple applications, but it can cause practical problems for larger applications. A more convenient approach for larger applications is to use the concept of state variables and rules. In this case, a state is not explicitly defined but consists of the contents of the variables in the dialogue. Rules define the transitions from one state to another: each rule triggers on a subset of the values of the variables, does some processing when it is triggered, and changes the contents of some of the state variables, resulting in a new state.
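As a rough illustration of the state-variable-and-rule approach described above, the following Python sketch models the dialogue state as a dictionary of variables and each rule as a trigger plus an action. It reuses the room example from item a, so the query rule fires as soon as at least one of the interdependent fields is filled. All names here (Rule, step, ROOM_FIELDS, the prompt and query variables) are hypothetical and chosen only for illustration; they are not part of any proposed markup language.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]

# Hypothetical interdependent fields from the room example:
# at least one must be filled, but not all of them.
ROOM_FIELDS = ("name", "size", "location")


@dataclass
class Rule:
    name: str
    trigger: Callable[[State], bool]  # inspects a subset of the state variables
    action: Callable[[State], None]   # processes and updates state variables


def any_room_field_filled(state: State) -> bool:
    return any(state.get(f) is not None for f in ROOM_FIELDS)


def ask_for_room(state: State) -> None:
    state["prompt"] = "Which room would you like?"


def run_query(state: State) -> None:
    # Query with whichever fields the user chose to provide.
    state["query"] = {f: state[f] for f in ROOM_FIELDS if state.get(f) is not None}
    state["prompt"] = None


RULES: List[Rule] = [
    Rule("ask",
         lambda s: not any_room_field_filled(s) and s.get("prompt") is None,
         ask_for_room),
    Rule("query",
         lambda s: any_room_field_filled(s) and s.get("query") is None,
         run_query),
]


def new_state() -> State:
    return {"name": None, "size": None, "location": None,
            "prompt": None, "query": None}


def step(state: State, rules: List[Rule] = RULES) -> Optional[str]:
    """Fire the first rule whose trigger matches; return its name, or None."""
    for rule in rules:
        if rule.trigger(state):
            rule.action(state)
            return rule.name
    return None
```

In this sketch the state is never enumerated explicitly: calling step(new_state()) fires the "ask" rule, and once the user's answer sets any one of the three fields, the next call fires the "query" rule.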
It would be nice if the capability to model applications in this manner could be built into the language (assuming that simple applications can still be specified simply).

c. Style sheets. It would be interesting to explore the concept of style sheets to modify the default behavior of the voice browser.

d. Confirmation Subdialog (2.1.2). Should the markup language be able to specify that the confirmation be implicit? For instance:

U1: I want to fly from Paris.
S1: You want to go from Paris to where?

e. Suspended tasks (2.2.5). Should the markup language allow links to be specified between certain fields of different forms, so that when there is a task switch, certain fields in the new task could be set using values from the previous task? Similarly, if these fields are modified in the new task, the markup language could allow the corresponding fields in the original task to be modified accordingly.

f. Modularity and Re-use (3.3). Is there any plan to develop a high-level (vendor-independent) API to the speech and telephony resources used by these dialog components, so that they could be platform-independent (e.g., like a Java applet running within a browser)?

g. Call Transfer (2.8). In general, we found the requirements somewhat light on telephony features. This may be a deficiency, considering that the Working Group effort concentrates (as it should) on using the telephone as the first voice browsing device.

Grammar Representation Requirements for Voice Markup Languages

a. Semantics Support (4.1). What is the difference between this point and the following one (4.2)?

b. N-Best Hypotheses (5.2). What kind of information in the grammar representation could be used to support the post-processing of N-best recognition hypotheses?

c. Native Natural Languages (8.5). The grammar representation should support the specification that a given word can be pronounced in more than one language.
For instance, a Spanish name (person, company, street) in an English sentence could be pronounced either with a perfect Spanish pronunciation or simply in English by "Americanizing" the name. In fact, the grammar representation should allow native languages to be specified at various levels within the grammar (e.g., grammar level, rule level, word level, etc.).

Model Architecture for Voice Browser Systems

In general, we haven't found this architecture model very useful. I would suspect this to be true for many readers.

Regards,

Yves Normandin
Founder and Chief Technology Officer
Locus Dialogue
460 Ste-Catherine St. West, Suite 800
Montreal, Quebec H3B 1A7
Canada
Phone: (514) 954-3804
Fax: (514) 954-3805
www.locusdialogue.com
Received on Monday, 17 January 2000 07:23:10 UTC