Re: Comments from MITRE

I have a clarification question:  under General Comments, the term "Dialog" refers to the
Reusable Dialog Components Requirements Document, right?

-- dan

Alan Goldschen wrote:

> Hello Jim and W3C Voice Browsers team:
>
> Here are the comments gathered from members of the DARPA MITRE
> Communicator team.  I have placed each comment for each document
> together - and have not edited any comments.  (I have attached the Word
> Document to this e-mail).
>
> Respectfully submitted,
>
> Alan Goldschen for the MITRE DARPA Communicator team.
>
> General Comments
>
> · I read the Grammar, Dialog and Multimodal Requirements docs.
> Generally, I found the Grammar and Multimodal documents to be written
> much better than the Dialogue document, with many helpful examples. I
> just couldn't get anything informative out of the Dialog document. It
> was sketchy and had virtually no examples to illustrate the various
> requirements.
>
> Speech Recognition Grammar Specification
>
> · It wasn't clear to me what the relationship was between the SR markup
> language being proposed and some of the APIs out there for SR, like
> JSAPI. For instance, the voice seems to be controlled in the data string
> at the moment, while the APIs seem to support a separate call to change
> the voice. This contrast should be recognized and reconciled.
>
> · I didn't see anything in the Grammar document about word order. Maybe
> this is implicit in the definitions, but as I see it either the rule
> elements are ordered, in which case there need to be multiple rules for
> multiple word orders (e.g. I want a flight Tuesday to Boston vs. I want
> a flight to Boston on Tuesday) or they are unordered and some other
> component will check to be sure you don't get "want I a flight to
> Boston". In languages other than English this could be even more of an
> issue.
>
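
The word-order point above can be made concrete with a minimal sketch. It
assumes ordered rule elements and uses made-up names (PREFIX, CONSTITUENTS,
ordered_rules, matches) that do not come from the specification; the sample
phrases are adapted from the comment. With ordered elements, each
permutation of the movable constituents needs its own rule:

```python
from itertools import permutations

# Hypothetical fixed opening of the utterance (not from the specification).
PREFIX = ["i", "want", "a", "flight"]

# Hypothetical movable constituents; each may appear in any position
# relative to the others.
CONSTITUENTS = {
    "destination": ["to", "boston"],
    "date": ["on", "tuesday"],
}


def ordered_rules():
    """Expand the constituent set into one ordered rule per permutation."""
    rules = []
    for order in permutations(CONSTITUENTS):
        words = list(PREFIX)
        for name in order:
            words.extend(CONSTITUENTS[name])
        rules.append(words)
    return rules


def matches(utterance):
    """Return True if the utterance equals one of the ordered expansions."""
    return utterance.lower().split() in ordered_rules()
```

Two constituents already need two rules, and n constituents need n! rules,
which is why the choice between ordered and unordered elements matters.
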
> · It seems obvious to me that XML should be the basic file format,
> because the computing tools to manipulate it exist in almost all
> programming languages. The arguments for ABNF are presentation/editing
> arguments, and they can easily be solved by XSL (presentation) and a
> single command-line ABNF-to-XML translator (digestion of edited
> grammars); that way, you only need an ABNF parser in one programming
> language. This is a no-brainer to me.
>
> · There's no morphology support. The quick and dirty solution is to
> support regular expression matching in tokens. This may not be enough,
> because you may want to extend the "semantics" down into the word level
> (singular vs. plural is a standard example), and the tag mechanism won't
> be enough to help you here.
>
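
A minimal sketch of the "quick and dirty" option mentioned above, assuming
tokens may carry a regular expression. The pattern, the analyze function and
the "number" feature are illustrative assumptions, not part of any proposed
markup:

```python
import re

# Hypothetical token pattern: the stem "flight" plus an optional plural "s".
TOKEN = re.compile(r"^(?P<stem>flight)(?P<plural>s)?$", re.IGNORECASE)


def analyze(word):
    """Return stem and grammatical number for a word, or None if no match."""
    m = TOKEN.match(word)
    if m is None:
        return None
    number = "plural" if m.group("plural") else "singular"
    return {"stem": m.group("stem").lower(), "number": number}
```

The named groups stand in for word-level "semantics" such as singular vs.
plural. Irregular forms would still need their own entries, which is one
reason pattern matching alone may not be enough.
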
> · The examples are very nice and really added to the understanding
> (including the appendix).
>
> · How does this grammar support information about the grammar itself?
> For example, to retrieve the phonetic structure of words as defined, to
> obtain the name of the grammar, and other "state information" as
> supported by other recognizer grammars.
>
> · Does this document address how a recognizer (or application using the
> recognizer) obtains the name of the rules that triggered an output?  Is
> this supposed to be gathered from the 'TAGS' attribute?
>
> · 1.0 Does not give a strong argument for ABNF and XML.  The fact that
> there are two formats leads me to wonder if these specifications do not
> fully address problems identified by the working group.  That is, for
> one choice of problems, I would use one specification and the other
> specification for other types of problems.  Why doesn't W3C come up
> with an overall specification that combines ABNF and XML?  Of course, if
> both specifications satisfy both problems, then say so.
>
> · 1.0 In the paragraph that begins with "Section 5 outlines areas of
> future study … "  Is this needed, since it only refers to n-grams?  I
> would recommend that you make a general paragraph that lists all items up
> for future study or remove it from the introduction.   Now on the
> subject of n-grams, why are n-grams not part of the grammar?  Wouldn't it
> be wise to wait for the n-gram definitions to be defined before putting
> this document out, since n-grams may have a bearing on the grammar
> language?
>
> · 1.0   Paragraph begins with "The W3C Standard …"  Is this correct?  This
> should be "The W3C Grammar Standard" or something like that.
>
> · 2.1 Take out the words "For now" in the second sentence.  Why wouldn't
> the statement always be true?
>
> · 2.2 In the paragraph that begins with "Section 4.3 defines import
> declarations that act to bind a local alias …"   What does the term
> "local alias" mean?  Is this standard nomenclature? Why is the word
> "variable" not used?  (Note, the word "alias" is used throughout this
> document - so a change here affects all places in the document.)
>
> · 2.2 Under "special rules" section, first sentence, "The rules names
> are defined appropriately by the recognizer …" Exactly what does this
> sentence mean?
>
> · 2.3 Example has misspelling of "parenthesese".
>
> · 2.3 XML for paragraph.  Doesn't this whole paragraph belong in the
> choice section?
>
> · 2.4 General Comment:  Can the weights be variables? Can these weight
> variables be imported from other imported files?  What is the range of
> the weights?  Without the range, it is hard to know what an individual
> weight means.  BTW:  what exactly does "occurrence likelihood" mean?
>
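
To illustrate why the range question matters, here is a sketch of one
possible reading, in which weights are arbitrary positive numbers that only
have meaning relative to the other alternatives in the same choice. This
reading and the normalize function are assumptions, not something the
document states:

```python
def normalize(weights):
    """Convert raw positive weights into relative likelihoods summing to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}
```

Under this reading, normalize({"boston": 3, "denver": 1}) and
normalize({"boston": 300, "denver": 100}) give the same likelihoods, so an
individual weight means nothing until the range or the normalization rule
is specified.
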
> · 2.6 It is not obvious from the first paragraph what tags are,
> especially if they are used for "post-processing of speech recognition
> results".  What is this post-processing step called?  I would suspect
> that the tag nomenclature refers to something in another W3C document -
> just a hunch.
>
> · 3.2 There is no definition of private or public - the definition is to
> be assumed by the reader.  Define the terms.
>
> · 3.3 The example illustrates comments with ABNF, but not with XML.
>
> · 4  The first paragraph mentions "all must have unique names".  What
> does "all" refer to?
>
> · 4.2 Title is "Grammar Declaration and Locale" - is the word
> "Declaration" needed?
>
> · 4.3 Imports - The first sentence, "… for referencing externally
> defined grammars."  Shouldn't this be "for referencing externally
> defined PUBLIC grammars"?
>
> · 4.3 I do not understand the second paragraph "Note:  the import
> declarations does not copy … " Just what does the word "copy" mean?  Are
> you talking of grammar expansion as is done for macro expansion?  Isn't
> it  just included in the current namespace?
>
> · 4.3 Add a sentence explaining what $places.city means in both
> examples, since it is the punch line.
>
> · 4.4 Take out the word "we" from the second sentence.
>
> · 5.2 Take out the word "Technically" from the second sentence.
>
> · 5.3 Change "regular grammars and context-free grammars" to "regular
> and context-free grammars".
>
> · 5.3 Delete "technically, n-grams" from the second sentence.
>
> · 5.5 I do not know what a "fully-defined grammar" is, let alone a
> partially defined grammar.  Define "fully defined" first.
>
> Synthesis Markup Language
>
> · One of the "sayas" examples in 2.4 is missing the "type" attribute,
> which is described as required.
>
> · It seems odd to me that things like "paragraph" and "sentence" would
> be defined in the speech synthesis markup document. Aren't they relevant
> to other aspects of the overall system?
>
> · Document is very clear and appears to map very well to some other
> standards such as JSAPI.   The low-level element appears to be a very
> powerful element and should be useful.  I wish other documents from W3C
> had a similar feature.
>
> Reusable Dialogue Markup Language
>
> · I would disambiguate the use of the word "dialog" in this document. It
> seems sometimes to be used in the UI sense (that is, a widget which
> gathers information from the user) and other times in the NL sense (an
> interaction in which a user engages with a system).
>
> · While I understand that this specification is intended to address
> near-term support for existing technology, I'm very worried that
> emerging work in mixed-initiative dialogue will find it hard to migrate
> into the standards being developed here. I encourage you to think hard
> about that issue.
>
> · Should this section be "Reusable Components"?  I am not sure how
> dialogue fits here.
>
> · I had trouble reading this particular document; it contained words
> without definitions.
>
> · 1.0 Please redo the first two paragraphs - they should address how a
> standards organization is pursuing reusable dialogs.  Focus on the word
> standards. Please change the word "subgroup" to "document".   Some
> fixes: remove "out-of-the-box", remove "etc, e.g." - they both occur in
> the same sentence, define "behavior".    Also change "which any proposed
> markup language" to "which a W3C Voice Browser compatible markup
> language".
>
> · 1.1 First paragraph.  I cannot understand the first sentence "Although
> desirable to standardize the interface to all dialogue components, this
> standardization is impractical for many dialogues."  Please provide
> examples.  Also be sure to mention how this statement does not
> contradict the mission of W3C Voice Browsers which is attempting some
> form of "standardization".
>
> · 1.1  First paragraph. Please define "call flow" and "the interface".
> For example, I would recommend "interface" be replaced with words that
> have to do with "input" and "output" types.
>
> · 1.1 Second paragraph.  I have no idea what the first sentence means.
>
> · 2.1.1 Change "address" to "addresses".
>
> · 2.2 The first sentence contains the "… multiple components to be
> active simultaneously …"  Isn't this obvious?  Remove the word
> "simultaneously".  I am not sure what you mean, however.
>
> · 2.3.1 Title is "NL Format".  I am not sure why reusable components has
> to focus just on NL - shouldn't it have to deal with other components of
> the system?  Change the "Natural Language Subgroup" to "W3C Voice
> Browsers Natural Language document".
>
> · 2.3.3 Delete "also" from "must also".
>
> · 2.4 I am not sure how the error/exception handling would work,
> especially in an environment where multiple servers are executed. Please
> explain how these errors and exceptions can be used for different
> servers to communicate.
>
> · 2.6 I have no idea what this sentence means: "Where reasonable,
> components will be built using other components to increase consistency
> in behavior across components."  Ideally, what should be consistent is
> the behavior of servers as they use parts of the reusable dialogs.
> Why do the components themselves have to be consistent?
>
> · 3.1 Remove the word "appropriate" from the first sentence.
>
> · 3.1 Remove the phrase "in one way over another" from the end of the
> first paragraph.
>
> · 3.2  I have no idea what the paragraph under "Task vs. Template"
> means.  They should define the terms, rather than saying "as-is".
>
> · 3.3.2.  Why is there no allowance for negative numbers?
>
> · 3.3.3 (and other sections) There is a mention of obtaining an n-best
> list of results - however the speech recognition grammar does not allow
> an n-best list.
>
> · 3.3.3 I would recommend the sub-bullet be modified to allow digit
> strings up to a certain length.  The word "expected" does not really add
> any meaning - or what does it mean?
>
> · 3.3.4 - 3.3.5  Please define a "fully-specified date" (and
> partially-specified date).  If the components are to resort to
> "prompting" to disambiguate the date, why is there no specification on
> how this prompting is to be done?   Wouldn't this section fit better in
> the semantics portion of the document?  How would dates such as "This
> Friday" or "tomorrow" be handled?
>
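
As an illustration of the relative-date question, the sketch below resolves
a few phrases against a reference date. The resolve function, the phrase set
and the choice that "this Friday" spoken on a Friday means the same day are
all assumptions; the requirements document specifies none of this:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve(phrase, today):
    """Map a relative date phrase to a calendar date, or None if unknown."""
    text = phrase.lower().strip()
    if text == "today":
        return today
    if text == "tomorrow":
        return today + timedelta(days=1)
    if text.startswith("this "):
        day = text[len("this "):]
        if day in WEEKDAYS:
            # Days until the named weekday; 0 when today is that weekday.
            delta = (WEEKDAYS.index(day) - today.weekday()) % 7
            return today + timedelta(days=delta)
    return None  # Unresolved: the component would have to prompt the user.
```

A None result marks the point where a date component would fall back to
prompting, which is the behavior the comment asks to have specified.
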
> · 3.3.6 Shouldn't there be a specification for the type of errors to
> process or watch for?
>
> · 3.3.7 - same comments as 3.3.3
>
> · 3.3.8 - same comments as 3.3.3
>
> · 3.4.2 I am not sure what the phrase "plays a prompt" means.  I would
> recommend that the component "output" a prompt to the user, and receive
> "input" from the user.  Please remove the word "she".
>
> · 3.4.5 - same comments as 3.3.3
>
> · 3.4.6.  If this document is reusable components, why not create
> separate components for credit cards and SSN?  I do not see why
> everything is "lumped" into this section.
>
> · 3.4.7 -- same comment as 3.4.6, but for automobile plates and product
> codes.
>
> · 3.4.8  Please explain what the "other pages" are.
>
> · 3.4.9  Please define what "hear" is.  Don't you want to say "output"?
>
> · 3.4.10 - same comments as 3.4.9
>
> · 3.4.11  Please change "valid postal code" to "valid international
> postal code".
>
> · 3.4.14  How does this section differ from 3.4.6?
>
> · 3.4.19  Please remove the word "physical".
>
> · 4.  Remove  "per se" from the second sentence.
>
> · 4.3 - I think the help component would be very beneficial and am
> confused as to why some variation of help does not exist.  (Help is left
> for future work.)
>
> Multimodal Requirements
>
> · My major comment here is that if you're talking about the XML
> representation of coordinated multimodal data, you really ought to be
> looking at some of the work being done in annotation of such data, such
> as the European MATE effort and the ATLAS effort being pushed by MITRE,
> NIST and the Linguistic Data Consortium.
>
> · This document defines the requirements of a multimodal language.  Is
> there a way that this document could include examples in detailed format
> like the "Speech Recognition Grammar" document?   Obviously this
> requires the markup language to be defined.  Why not show examples in
> some markup language format?
>
> · 1.3 "Complimentary" should be "Complementary".
>
> · 2.2. Take out the word "things" in the second paragraph.  Moreover,
> change the second sentence of the second paragraph from "… only one mode
> of input will be available at that time … " to "at least one mode of
> input must be available at that time …".
>
> · 2.3  Can you use another word besides "interpreted"?  This section and
> following sections say that the "markup language" interprets the
> input.   A recognizer or key-device interprets the input and passes it
> to the markup language for processing.
>
> · 2.8 Spell out what a "UI" is.
>
> · 2.13 Section is titled, "Support for conflicting input from different
> modalities".  I am not sure just how a markup language is going to solve
> this issue - isn't this part of an application?  Please explain intent
> with a better example.
>
>   ------------------------------------------------------------------------
>   Attachment: MITRECOMMENTS8-2000.DOC
>   Type: Microsoft Word Document (APPLICATION/MSWORD)
>   Encoding: BASE64

Received on Friday, 18 August 2000 15:25:48 UTC