- From: Alan Goldschen <alang@mitre.org>
- Date: Tue, 15 Aug 2000 20:01:29 -0400
- To: www-voice@w3.org
- CC: "Larson, Jim A" <jim.a.larson@intel.com>
- Message-ID: <3999D9D9.B67C59A2@mitre.org>
Hello Jim and W3C Voice Browsers team:

Here are the comments gathered from members of the DARPA MITRE Communicator team. I have placed the comments for each document together and have not edited any of them. (I have attached the Word document to this e-mail.)

Respectfully submitted,
Alan Goldschen for the MITRE DARPA Communicator team.

General Comments

· I read the Grammar, Dialog and Multimodal Requirements docs. Generally, I found the Grammar and Multimodal documents to be written much better than the Dialogue document, with many helpful examples. I just couldn't get anything informative out of the Dialog document. It was sketchy and had virtually no examples to illustrate the various requirements.

Speech Recognition Grammar Specification

· It wasn't clear to me what the relationship was between the SR markup language being proposed and some of the APIs out there for SR, like JSAPI. For instance, the voice seems to be controlled in the data string at the moment, while the APIs seem to support a separate call to change the voice. This contrast should be recognized and reconciled.
· I didn't see anything in the Grammar document about word order. Maybe this is implicit in the definitions, but as I see it either the rule elements are ordered, in which case there need to be multiple rules for multiple word orders (e.g. "I want a flight Tuesday to Boston" vs. "I want a flight to Boston on Tuesday"), or they are unordered and some other component will check to be sure you don't get "want I a flight to Boston". In languages other than English this could be even more of an issue.
· It seems obvious to me that XML should be the basic file format, because the computing tools to manipulate it exist in almost all programming languages.
The arguments for ABNF are presentation/editing arguments, and they can easily be solved by XSL (presentation) and a single command-line ABNF-to-XML translator (digestion of edited grammars); that way, you only need an ABNF parser in one programming language. This is a no-brainer to me.
· There's no morphology support. The quick and dirty solution is to support regular expression matching in tokens. This may not be enough, because you may want to extend the "semantics" down into the word level (singular vs. plural is a standard example), and the tag mechanism won't be enough to help you here.
· The examples are very nice and really added to the understanding (including the appendix).
· How does this grammar support information about the grammar itself? For example, how does one retrieve the phonetic structure of words as defined, obtain the name of the grammar, and access other "state information" as supported by other recognizer grammars?
· Does this document address how a recognizer (or application using the recognizer) obtains the name of the rules that triggered an output? Is this supposed to be gathered from the 'TAGS' attribute?
· 1.0 The section does not give a strong argument for having both ABNF and XML. The fact that there are two formats leads me to wonder whether either specification fully addresses the problems identified by the working group. That is, for one class of problems I would use one specification, and for other types of problems the other. Why doesn't W3C come up with an overall specification that combines ABNF and XML? Of course, if both specifications satisfy both problems, then say so.
· 1.0 In the paragraph that begins with "Section 5 outlines areas of future study …", is this needed, since it only refers to n-grams? I would recommend that you write a general paragraph that lists all items up for future study, or remove it from the introduction. Now, on the subject of n-grams: why are n-grams not part of the grammar?
Wouldn't it be wise to wait for the n-gram definitions to be defined before putting this document out, since n-grams may have a bearing on the grammar language?
· 1.0 Paragraph begins with "The W3C Standard …" Is this correct? This should be "The W3C Grammar Standard" or something like that.
· 2.1 Take out the words "For now" in the second sentence. Why wouldn't the statement always be true?
· 2.2 In the paragraph that begins with "Section 4.3 defines import declarations that act to bind a local alias …", what does the term "local alias" mean? Is this standard nomenclature? Why is the word "variable" not used? (Note: the word "alias" is used throughout this document, so a change here affects all places in the document.)
· 2.2 Under the "special rules" section, first sentence, "The rules names are defined appropriately by the recognizer …": exactly what does this sentence mean?
· 2.3 Example has misspelling of "parenthesese".
· 2.3 XML for paragraph. Doesn't this whole paragraph belong in the choice section?
· 2.4 General Comment: Can the weights be variables? Can these weight variables be imported from other imported files? What is the range of the weights? Without the range, it is hard to know what an individual weight means. BTW: what exactly does "occurrence likelihood" mean?
· 2.6 It is not obvious from the first paragraph what tags are, especially if they are used for "post-processing of speech recognition results". What is this post-processing step called? I would suspect that the tag nomenclature refers to something in another W3C document - just a hunch.
· 3.2 There is no definition of private or public - the definition is to be assumed by the reader. Define the terms.
· 3.3 The example illustrates comments with ABNF, but not with XML.
· 4 The first paragraph mentions "all must have unique names". What does "all" refer to?
· 4.2 Title is "Grammar Declaration and Locale" - is the word "Declaration" needed?
· 4.3 Imports - The first sentence, "… for referencing externally defined grammars." Shouldn't this be "for referencing externally defined PUBLIC grammars"?
· 4.3 I do not understand the second paragraph "Note: the import declarations does not copy … " Just what does the word "copy" mean? Are you talking of grammar expansion as is done for macro expansion? Isn't it just included in the current namespace?
· 4.3 Add a sentence explaining what $places.city means in both examples, since it is the punch line.
· 4.4 Take out the word "we" from the second sentence.
· 5.2 Take out the word "Technically" from the second sentence.
· 5.3 Change "regular grammars and context-free grammars" to "regular and context-free grammars".
· 5.3 Delete "technically, n-grams" from the second sentence.
· 5.5 I do not know what a "fully-defined grammar" is, let alone a partially defined grammar. Define fully defined first.

Synthesis Markup Language

· One of the "sayas" examples in 2.4 is missing the "type" attribute, which is described as required.
· It seems odd to me that things like "paragraph" and "sentence" would be defined in the speech synthesis markup document. Aren't they relevant to other aspects of the overall system?
· Document is very clear and appears to map very well to some other standards such as JSAPI. The low-level element appears to be a very powerful element and should be useful. I wish other documents from W3C had a similar feature.

Reusable Dialogue Markup Language

· I would disambiguate the use of the word "dialog" in this document. It seems sometimes to be used in the UI sense (that is, a widget which gathers information from the user) and other times in the NL sense (an interaction in which a user engages with a system).
· While I understand that this specification is intended to address near-term support for existing technology, I'm very worried that emerging work in mixed-initiative dialogue will find it hard to migrate into the standards being developed here.
I encourage you to think hard about that issue.
· Should this section be titled Reusable Components? I am not sure how dialogue fits here.
· I had trouble reading this particular document; it contained words without definitions.
· 1.0 Please redo the first two paragraphs - they should address how a standards organization is pursuing reusable dialogs. Focus on the word standards. Please change the word "subgroup" to "document". Some fixes: remove "out-of-the-box"; remove "etc, e.g." - they both occur in the same sentence; define "behavior". Also change "which any proposed markup language" to "which a W3C Voice Browser compatible markup language".
· 1.1 First paragraph. I cannot understand the first sentence "Although desirable to standardize the interface to all dialogue components, this standardization is impractical for many dialogues." Please provide examples. Also be sure to mention how this statement does not contradict the mission of W3C Voice Browsers, which is attempting some form of "standardization".
· 1.1 First paragraph. Please define "call flow" and "the interface". For example, I would recommend "interface" be replaced with words that have to do with "input" and "output" types.
· 1.1 Second paragraph. I have no idea what the first sentence means.
· 2.1.1 Change "address" to "addresses".
· 2.2 The first sentence contains the phrase "… multiple components to be active simultaneously …". Isn't this obvious? Remove the word "simultaneously". I am not sure what you mean, however.
· 2.3.1 Title is "NL Format". I am not sure why reusable components focuses just on NL - shouldn't it have to deal with other components of the system? Change the "Natural Language Subgroup" to "W3C Voice Browsers Natural Language document".
· 2.3.3 Delete "also" from "must also".
· 2.4 I am not sure how the error/exception handling would work, especially in an environment where multiple servers are executed.
Please explain how these errors and exceptions can be used for different servers to communicate.
· 2.6 I have no idea what this sentence means: "Where reasonable, components will be built using other components to increase consistency in behavior across components." Ideally, what should be consistent is that servers have consistent behavior as they use parts of the reusable dialogs. Why do the components themselves have to be consistent?
· 3.1 Remove the word "appropriate" from the first sentence.
· 3.1 Remove the phrase "in one way over another" from the end of the first paragraph.
· 3.2 I have no idea what the paragraph under "Task vs. Template" means. They should define the terms, rather than saying "as-is".
· 3.3.2. Why is there no allowance for negative numbers?
· 3.3.3 (and other sections) There is a mention of obtaining an n-best list of results - however the speech recognition grammar does not allow an n-best list.
· 3.3.3 I would recommend the sub-bullet be modified to allow digit strings up to a certain length. The word "expected" does not really add any meaning - or what does it mean?
· 3.3.4 - 3.3.5 Please define a "fully-specified date" (and partially-specified date). If the components are to resort to "prompting" to disambiguate the date, why is there no specification on how this prompting is to be done? Wouldn't this section fit better in the semantics portion of the document? How would dates such as "This Friday" or "tomorrow" be handled?
· 3.3.6 Shouldn't there be a specification for the type of errors to process or watch for?
· 3.3.7 - same comments as 3.3.3
· 3.3.8 - same comments as 3.3.3
· 3.4.2 I am not sure what the phrase "plays a prompt" means. I would recommend that the component "output" a prompt to the user, and receive "input" from the user. Please remove the word "she".
· 3.4.5 - same comments as 3.3.3
· 3.4.6. If this document is about reusable components, why not create separate components for credit cards and SSNs?
I do not see why everything is "lumped" into this section.
· 3.4.7 -- same comment as 3.4.6, but for automobile plates and product codes.
· 3.4.8 Please explain what the "other pages" are.
· 3.4.9 Please define what "hear" is. Don't you want to say "output"?
· 3.4.10 - same comments as 3.4.9
· 3.4.11 Please change "valid postal code" to "valid international postal code".
· 3.4.14 How does this section differ from 3.4.6?
· 3.4.19 Please remove the word "physical".
· 4. Remove "per se" from the second sentence.
· 4.3 - I think the help component would be very beneficial and am confused as to why some variation of help does not exist. (Help is left for future work.)

Multimodal Requirements

· My major comment here is that if you're talking about the XML representation of coordinated multimodal data, you really ought to be looking at some of the work being done in annotation of such data, such as the European MATE effort and the ATLAS effort being pushed by MITRE, NIST and the Linguistic Data Consortium.
· This document defines the requirements of a multimodal language. Is there a way that this document could include examples in detailed format like the "Speech Recognition Grammar" document? Obviously this requires the markup language to be defined. Why not show examples in some markup language formats?
· 1.3 "Complimentary" should be "Complementary".
· 2.2. Take out the word "things" in the second paragraph. Moreover, change the second sentence of the second paragraph from "… only one mode of input will be available at that time … " to "at least one mode of input must be available at that time …".
· 2.3 Can you use another word besides "interpreted"? This section and following sections say that the "markup language" interprets the input. A recognizer or key-device interprets the input and passes it to the markup language for processing.
· 2.8 Spell out what a "UI" is.
· 2.13 Section is titled, "Support for conflicting input from different modalities".
I am not sure just how a markup language is going to solve this issue - isn't this part of an application? Please explain the intent with a better example.
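
The word-order comment in the Speech Recognition Grammar Specification section can be made concrete with a small sketch. It does not use the draft's ABNF or XML syntax; it assumes a toy representation in which each rule alternative is a plain list of tokens, and the names `ORDERED_RULES`, `matches_ordered` and `matches_unordered` are hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy grammar (not the W3C syntax): each alternative is an
# ordered token sequence. Covering two word orders takes two alternatives.
ORDERED_RULES = [
    "i want a flight tuesday to boston".split(),
    "i want a flight to boston on tuesday".split(),
]


def matches_ordered(utterance: str) -> bool:
    """Accept only if the tokens equal some alternative, in the same order."""
    tokens = utterance.lower().split()
    return any(tokens == rule for rule in ORDERED_RULES)


def matches_unordered(utterance: str) -> bool:
    """Accept if the tokens are any reordering of some alternative."""
    tokens = Counter(utterance.lower().split())
    return any(tokens == Counter(rule) for rule in ORDERED_RULES)


if __name__ == "__main__":
    for text in (
        "I want a flight Tuesday to Boston",
        "I want a flight to Boston on Tuesday",
        "want I a flight to Boston on Tuesday",
    ):
        print(text, matches_ordered(text), matches_unordered(text))
```

Under the ordered reading, both legitimate word orders are accepted only because each was written out as its own alternative. Under the unordered reading, the scrambled utterance is accepted as well, so some other component would have to reject it.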
Attachments
- application/msword attachment: MITREComments8-2000.doc
Received on Tuesday, 15 August 2000 20:06:57 UTC