- From: Daniel C. Burnett <burnett@nuance.com>
- Date: Fri, 18 Aug 2000 12:26:16 -0700
- To: alang@mitre.org
- Cc: www-voice@w3.org, "Larson, Jim A" <jim.a.larson@intel.com>
I have a clarification question: under General Comments, the term "Dialog" refers to the Reusable Dialog Components Requirements Document, right?

-- dan

Alan Goldschen wrote:

> Hello Jim and W3C Voice Browsers team:
>
> Here are the comments gathered from members of the DARPA MITRE
> Communicator team. I have placed each comment for each document
> together - and have not edited any comments. (I have attached the Word
> Document to this e-mail).
>
> Respectfully submitted,
>
> Alan Goldschen for the MITRE DARPA Communicator team.
>
> General Comments
>
> · I read the Grammar, Dialog and Multimodal Requirements docs.
> Generally, I found the Grammar and Multimodal documents to be written
> much better than the Dialogue document, with many helpful examples. I
> just couldn't get anything informative out of the Dialog document. It
> was sketchy and had virtually no examples to illustrate the various
> requirements.
>
> Speech Recognition Grammar Specification
>
> · It wasn't clear to me what the relationship was between the SR markup
> language being proposed and some of the APIs out there for SR, like
> JSAPI. For instance, the voice seems to be controlled in the data string
> at the moment, while the APIs seem to support a separate call to change
> the voice. This contrast should be recognized and reconciled.
>
> · I didn't see anything in the Grammar document about word order. Maybe
> this is implicit in the definitions, but as I see it either the rule
> elements are ordered, in which case there need to be multiple rules for
> multiple word orders (e.g. I want a flight Tuesday to Boston vs. I want
> a flight to Boston on Tuesday) or they are unordered and some other
> component will check to be sure you don't get "want I a flight to
> Boston". In languages other than English this could be even more of an
> issue.
>
> · It seems obvious to me that XML should be the basic file format,
> because the computing tools to manipulate it exist in almost all
> programming languages. The arguments for ABNF are presentation/editing
> arguments, and they can easily be solved by XSL (presentation) and a
> single command-line ABNF-to-XML translator (digestion of edited
> grammars); that way, you only need an ABNF parser in one programming
> language. This is a no-brainer to me.
>
> · There's no morphology support. The quick and dirty solution is to
> support regular expression matching in tokens. This may not be enough,
> because you may want to extend the "semantics" down into the word level
> (singular vs. plural is a standard example), and the tag mechanism won't
> be enough to help you here.
>
> · The examples are very nice and really added to the understanding
> (including the appendix).
>
> · How does this grammar support information about the grammar itself?
> For example, to retrieve the phonetic structure of words as defined, to
> obtain the name of the grammar, and other "state information" as
> supported by other recognizer grammars.
>
> · Does this document address how a recognizer (or application using the
> recognizer) obtains the name of the rules that triggered an output? Is
> this supposed to be gathered from the 'TAGS' attribute?
>
> · 1.0 Does not give a strong argument for ABNF and XML? The fact that
> there are two formats leads me to wonder if these specifications do not
> fully address problems identified by the working group. That is, for
> one choice of problems, I would use one specification and
> other specifications for other types of problems. Why doesn't W3C come
> up with an overall specification that combines ABNF and XML? Of course, if
> both specifications satisfy both problems, then say so.
>
> · 1.0 In the paragraph that begins with "Section 5 outlines areas of
> future study … " Is this needed since it only refers to n-grams.
> I would recommend that you make a general paragraph that lists all items up
> for future study or remove it from the introduction. Now on the
> subject of n-grams, why are n-grams not part of the grammar? Wouldn't it
> be wise to wait for the n-gram definitions to be defined before putting
> this document out since n-grams may have a bearing on the grammar
> language?
>
> · 1.0 Paragraph begins with "The W3C Standard …" Is this correct? This
> should be "The W3C Grammar Standard" or something like that.
>
> · 2.1 Take out the words "For now" in the second sentence. Why wouldn't
> the statement always be true?
>
> · 2.2 In the paragraph that begins with "Section 4.3 defines import
> declarations that act to bind a local alias …" What does the word
> "local alias" mean? Is this standard nomenclature? Why is the word
> variable not used? (Note, the word alias is used throughout this
> document - so a change here affects all places in the document.)
>
> · 2.2 Under "special rules" section, first sentence, "The rules names
> are defined appropriately by the recognizer …" Exactly what does this
> sentence mean?
>
> · 2.3 Example has misspelling of "parenthesese".
>
> · 2.3 XML for paragraph. Doesn't this whole paragraph belong in the
> choice section?
>
> · 2.4 General Comment: Can the weights be variables? Can these weight
> variables be imported from other imported files? What is the range of the
> weights? Without the range, it is hard to know what an individual
> weight means. BTW: what exactly does "occurrence likelihood" mean?
>
> · 2.6 It is not obvious from the first paragraph what tags are,
> especially if they are used for "post-processing of speech recognition
> results". What is this post-processing step called? I would suspect
> that the tag nomenclature refers to something in another W3C document -
> just a hunch.
>
> · 3.2 There is no definition of private or public - the definition is to
> be assumed by the reader. Define the terms.
>
> · 3.3 The example illustrates comments with ABNF, but not with XML.
>
> · 4 The first paragraph mentions "all must have unique names". What
> does "all" refer to?
>
> · 4.2 Title is "Grammar Declaration and Locale" - is the word
> "Declaration" needed?
>
> · 4.3 Imports - The first sentence, "… for referencing externally
> defined grammars." Shouldn't this be "for referencing externally
> defined PUBLIC grammars"?
>
> · 4.3 I do not understand the second paragraph "Note: the import
> declarations does not copy … " Just what does the word "copy" mean? Are
> you talking of grammar expansion as is done for macro expansion? Isn't
> it just included in the current namespace?
>
> · 4.3 Add a sentence to what the $places.city means in both examples,
> since it is the punch line.
>
> · 4.4 Take out the word "we" of the second sentence.
>
> · 5.2 Take out the word "Technically" of the second sentence.
>
> · 5.3 Change "regular grammars and context-free grammars" to "regular
> and context-free grammars".
>
> · 5.3 Delete "technically, n-grams" from the second sentence.
>
> · 5.5 I do not know what a "fully-defined grammar" is, let alone a
> partially defined grammar. Define fully defined first.
>
> Synthesis Markup Language
>
> · One of the "sayas" examples in 2.4 is missing the "type" attribute,
> which is described as required.
>
> · It seems odd to me that things like "paragraph" and "sentence" would
> be defined in the speech synthesis markup document. Aren't they relevant
> to other aspects of the overall system?
>
> · Document is very clear and appears to map very well to some other
> standards such as JSAPI. Low-level element appears to be a very
> powerful element and should be useful. I wish other documents from W3C
> had a similar feature.
>
> Reusable Dialogue Markup Language
>
> · I would disambiguate the use of the word "dialog" in this document.
> It seems sometimes to be used in the UI sense (that is, a widget which
> gathers information from the user) and other times in the NL sense (an
> interaction in which a user engages with a system).
>
> · While I understand that this specification is intended to address
> near-term support for existing technology, I'm very worried that
> emerging work in mixed-initiative dialogue will find it hard to migrate
> into the standards being developed here. I encourage you to think hard
> about that issue.
>
> · Should this section be Reusable Components, not sure how dialogue fits
> here.
>
> · I had trouble reading this particular document; it contained words
> without definitions.
>
> · 1.0 Please redo the first two paragraphs - they should address how a
> standards organization is pursuing reusable dialogs. Focus on the word
> standards. Please change the word "subgroup" to "document". Some
> fixes, remove "out-of-the-box", remove "etc, e.g." - they both occur in
> the same sentence, define "behavior". Also change "which any proposed
> markup language" to "which a W3C Voice Browser compatible markup
> language".
>
> · 1.1 First paragraph. I cannot understand the first sentence "Although
> desirable to standardize the interface to all dialogue components, this
> standardization is impractical for many dialogues." Please provide
> examples. Also be sure to mention how this statement does not
> contradict the mission of W3C Voice Browsers which is attempting some
> form of "standardization".
>
> · 1.1 First paragraph. Please define "call flow" and "the interface".
> For example, I would recommend "interface" be replaced with words that
> have to do with "input" and "output" types.
>
> · 1.1 Second paragraph. I have no idea what the first sentence means.
>
> · 2.1.1 Change "address" to "addresses".
>
> · 2.2 The first sentence contains the "… multiple components to be
> active simultaneously …" Isn't this obvious, remove the word
> "simultaneously".
> I am not sure what you mean, however.
>
> · 2.3.1 Title is "NL Format". I am not sure why reusable components
> focuses just on NL - shouldn't it have to deal with other components of
> the system? Change the "Natural Language Subgroup" to "W3C Voice
> Browsers Natural Language document".
>
> · 2.3.3 Delete "also" from "must also".
>
> · 2.4 I am not sure how the error/exception handling would work,
> especially in an environment where multiple servers are executed. Please
> explain how these errors and exceptions can be used for different
> servers to communicate.
>
> · 2.6 I have no idea what this sentence means "Where reasonable,
> components will be built using other components to increase consistency
> in behavior across components." Ideally, what should be consistent, is
> that servers should have consistent behavior as they use parts of the
> reusable dialogs. Why do the components themselves have to be
> consistent?
>
> · 3.1 Remove the word "appropriate" from the first sentence.
>
> · 3.1 Remove the phrase "in one way over another" from the end of the
> first paragraph.
>
> · 3.2 I have no idea what the paragraph under "Task vs. Template"
> means. They should define the terms, rather than saying "as-is".
>
> · 3.3.2. Why is there no allowance for negative numbers?
>
> · 3.3.3 (and other sections) There is a mention of obtaining an n-best
> list of results - however the speech recognition grammar does not allow
> an n-best list.
>
> · 3.3.3 I would recommend the sub-bullet to be modified to allow digit
> string up to a certain length. The word "expected" does not really add
> any meaning - or what does it mean?
>
> · 3.3.4 - 3.3.5 Please define a "fully-specified date" (and
> partially-specified date). If the components are to resort to "prompting"
> to disambiguate the date, why is there no specification on how this
> prompting is to be done? Wouldn't this section fit better in the
> semantics portion of the document?
> How would dates such as "This Friday" or "tomorrow" be handled?
>
> · 3.3.6 Shouldn't there be a specification for the type of errors to
> process or watch for?
>
> · 3.3.7 - same comments as 3.3.3
>
> · 3.3.8 - same comments as 3.3.3
>
> · 3.4.2 I am not sure what the phrase "plays a prompt" means? I would
> recommend that the component "output" a prompt to the user, and receive
> "input" from the user. Please remove the word "she".
>
> · 3.4.5 - same comments as 3.3.3
>
> · 3.4.6. If this document is reusable components, why not create
> separate components for credit cards, and SSN. I do not see why
> everything is "lumped" into this section.
>
> · 3.4.7 -- same comment as 3.4.6, but for automobile plates and product
> codes.
>
> · 3.4.8 Please explain what the "other pages" are.
>
> · 3.4.9 Please define what "hear" is? Don't you want to say "output".
>
> · 3.4.10 - same comments as 3.4.9
>
> · 3.4.11 Please change "valid postal code" to "value international
> postal code".
>
> · 3.4.14 How does this section differ from 3.4.6?
>
> · 3.4.19 Please remove the word "physical".
>
> · 4. Remove "per se" from the second sentence.
>
> · 4.3 - I think the help component would be very beneficial and am
> confused as to why some variation of help does not exist. (Help is left
> for future work.)
>
> Multimedia Requirements
>
> · My major comment here is that if you're talking about the XML
> representation of coordinated multimodal data, you really ought to be
> looking at some of the work being done in annotation of such data, such
> as the European MATE effort and the ATLAS effort being pushed by MITRE,
> NIST and the Linguistic Data Consortium.
>
> · This document defines the requirements of a multimodal language. Is
> there a way that this document could include examples in detailed format
> like the "Speech Recognition Grammar" document? Obviously this
> requires the markup language to be defined. Why not show examples in
> some markup language formats?
>
> · 1.3 "Complimentary" should be "Complementary".
>
> · 2.2. Take out the word "things" in the second paragraph. Moreover,
> change the second sentence of the second paragraph from "… only one mode
> of input will be available at that time … " to "at least one mode of
> input must be available at that time …".
>
> · 2.3 Can you use another word besides "interpreted"? This section and
> following sections say that the "markup language" interprets the
> input. A recognizer or key-device interprets the input and passes it
> to the markup language for processing.
>
> · 2.8 Spell out what a "UI" is?
>
> · 2.13 Section is titled, "Support for conflicting input from different
> modalities". I am not sure just how a markup language is going to solve
> this issue - isn't this part of an application? Please explain intent
> with a better example.
>
> ------------------------------------------------------------------------
> Name: MITRECOMMENTS8-2000.DOC
> MITRECOMMENTS8-2000.DOC Type: Microsoft Word Document (APPLICATION/MSWORD)
> Encoding: BASE64
> Download Status: Not downloaded with message

Received on Friday, 18 August 2000 15:25:48 UTC