
Comments from MITRE

From: Alan Goldschen <alang@mitre.org>
Date: Tue, 15 Aug 2000 20:01:29 -0400
Message-ID: <3999D9D9.B67C59A2@mitre.org>
To: www-voice@w3.org
CC: "Larson, Jim A" <jim.a.larson@intel.com>
Hello Jim and W3C Voice Browsers team:

Here are the comments gathered from members of the DARPA MITRE
Communicator team.  I have grouped the comments by document and have
not edited them.  (I have attached the Word document to this e-mail.)

Respectfully submitted,

Alan Goldschen for the MITRE DARPA Communicator team.



General Comments

 I read the Grammar, Dialog and Multimodal Requirements docs.
Generally, I found the Grammar and Multimodal documents to be written
much better than the Dialogue document, with many helpful examples. I
just couldn't get anything informative out of the Dialog document. It
was sketchy and had virtually no examples to illustrate the various
requirements.



Speech Recognition Grammar Specification

 It wasn't clear to me what the relationship was between the SR markup
language being proposed and some of the APIs out there for SR, like
JSAPI. For instance, the voice seems to be controlled in the data string
at the moment, while the APIs seem to support a separate call to change
the voice. This contrast should be recognized and reconciled.

 I didn't see anything in the Grammar document about word order. Maybe
this is implicit in the definitions, but as I see it either the rule
elements are ordered, in which case there need to be multiple rules for
multiple word orders (e.g. I want a flight Tuesday to Boston vs. I want
a flight to Boston on Tuesday) or they are unordered and some other
component will check to be sure you don't get "want I a flight to
Boston". In languages other than English this could be even more of an
issue.
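
The ordered reading of this comment can be illustrated with a minimal
Python sketch.  The toy grammar, the rule names ($request, $day, $dest)
and the helper functions are hypothetical and are not taken from the
specification; the sketch only assumes that rule elements are matched
in the order written, so each word order needs its own alternative.

```python
from itertools import product

# Hypothetical toy grammar: each rule is a list of alternatives, and each
# alternative is an ordered sequence of tokens or $rule references.
GRAMMAR = {
    "$request": [
        ["i", "want", "a", "flight", "$day", "$dest"],
        ["i", "want", "a", "flight", "$dest", "on", "$day"],
    ],
    "$day": [["tuesday"]],
    "$dest": [["to", "boston"]],
}


def expand(symbol):
    """Return every token sequence the symbol can produce, as tuples."""
    if not symbol.startswith("$"):
        return [(symbol,)]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each element in order, then pick one choice per position.
        parts = [expand(element) for element in alternative]
        for combo in product(*parts):
            results.append(tuple(tok for part in combo for tok in part))
    return results


def accepts(sentence):
    """True if the lower-cased sentence is produced by $request."""
    return tuple(sentence.lower().split()) in expand("$request")
```

Under this assumption both example sentences are accepted only because
two alternatives were written out, and "want I a flight to Boston" is
rejected without any separate word-order check.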

 It seems obvious to me that XML should be the basic file format,
because the computing tools to manipulate it exist in almost all
programming languages. The arguments for ABNF are presentation/editing
arguments, and they can easily be solved by XSL (presentation) and a
single command-line ABNF-to-XML translator (digestion of edited
grammars); that way, you only need an ABNF parser in one programming
language. This is a no-brainer to me.  

 There's no morphology support. The quick and dirty solution is to
support regular expression matching in tokens. This may not be enough,
because you may want to extend the "semantics" down into the word level
(singular vs. plural is a standard example), and the tag mechanism won't
be enough to help you here. 
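
A minimal Python sketch of the "quick and dirty" option follows.  The
pattern and the match_token helper are hypothetical illustrations, not
part of the specification.  The sketch shows a regular expression
covering both word forms, and also that the singular/plural distinction
has to be recovered separately from the match.

```python
import re

# Hypothetical token pattern: one expression covers "flight" and "flights".
TOKEN = re.compile(r"flights?")


def match_token(word):
    """Return (matched, number), where number is a word-level feature."""
    if TOKEN.fullmatch(word) is None:
        return (False, None)
    # The regular expression alone says nothing about number; it is
    # derived here from the surface form as an extra step.
    number = "plural" if word.endswith("s") else "singular"
    return (True, number)
```

The extra step for the number feature is the part that a tag attached
to the whole token would not express.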

 The examples are very nice and really added to the understanding
(including the appendix).

 How does this grammar support information about the grammar itself? 
For example, to retrieve the phonetic structure of words as defined, to
obtain the name of the grammar, and other "state information" as
supported by other recognizer grammars.  

 Does this document address how a recognizer (or application using the
recognizer) obtains the name of the rules that triggered an output?  Is
this supposed to be gathered from the 'TAGS' attribute?

 1.0 The section does not give a strong argument for having both ABNF
and XML.  The fact that there are two formats leads me to wonder whether
each specification alone fully addresses the problems identified by the
working group.  That is, for one class of problems I would use one
specification, and the other specification for other types of problems.
Why doesn't W3C come up with an overall specification that combines ABNF
and XML?  Of course, if both specifications address both kinds of
problems, then say so.

 1.0 In the paragraph that begins with "Section 5 outlines areas of
future study … "  Is this needed, since it only refers to n-grams?  I
would recommend that you either write a general paragraph that lists all
items up for future study or remove it from the introduction.   Now on
the subject of n-grams, why are n-grams not part of the grammar?
Wouldn't it be wise to wait for the n-gram definitions before putting
this document out, since n-grams may have a bearing on the grammar
language?

 1.0 Paragraph begins with "The W3C Standard …"  Is this correct?  This
should be "The W3C Grammar Standard" or something like that.

 2.1 Take out the words "For now" in the second sentence.  Why wouldn't
the statement always be true?

 2.2 In the paragraph that begins with "Section 4.3 defines import
declarations that act to bind a local alias …"   What does the term
"local alias" mean?  Is this standard nomenclature? Why is the word
"variable" not used?  (Note, the word alias is used throughout this
document - so a change here affects all places in the document.)

 2.2 Under "special rules" section, first sentence, "The rules names
are defined appropriately by the recognizer …" Exactly what does this
sentence mean?

 2.3 Example has misspelling of "parenthesese".

 2.3 XML for paragraph.  Doesn't this whole paragraph belong in the
choice section?

 2.4 General Comment:  Can the weights be variables? Can these weight
variables be imported from other files?  What is the range of the
weights?  Without the range, it is hard to know what an individual
weight means.  BTW:  what exactly does "occurrence likelihood" mean?
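
One possible reading of the weights, offered only as an assumption and
not as what the specification defines, is that they are relative: the
weights on a set of alternatives are scaled to sum to 1, so only their
ratios matter.  A minimal Python sketch of that interpretation, with a
hypothetical normalize helper:

```python
def normalize(weights):
    """Scale non-negative weights on a set of alternatives to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {alternative: w / total for alternative, w in weights.items()}
```

Under this reading, weights of 3 and 1 mean the same thing as 30 and
10, which is why the comment asks for the range and meaning to be
stated explicitly.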

 2.6 It is not obvious from the first paragraph what tags are,
especially if they are used for "post-processing of speech recognition
results".  What is this post-processing step called?  I would suspect
that the tag nomenclature refers to something in another W3C document -
just a hunch.

 3.2 There is no definition of private or public - the definition is to
be assumed by the reader.  Define the terms.

 3.3 The example illustrates comments with ABNF, but not with XML.

 4  The first paragraph mentions "all must have unique names".  What
does "all" refer to?

 4.2 Title is "Grammar Declaration and Locale" - is the word
"Declaration" needed?

 4.3 Imports - The first sentence, "… for referencing externally
defined grammars."  Shouldn't this be "for referencing externally
defined PUBLIC grammars"?

 4.3 I do not understand the second paragraph "Note:  the import
declarations does not copy … " Just what does the word "copy" mean?  Are
you talking of grammar expansion as is done for macro expansion?  Isn't
it  just included in the current namespace?

 4.3 Add a sentence to what the $places.city means in both examples,
since it is the punch line.

 4.4 Take out the word "we" of the second sentence.

 5.2 Take out the word "Technically" of the second sentence.

 5.3 Change "regular grammars and context-free grammars" to "regular
and context-free grammars".

 5.3 Delete "technically, n-grams" from the second sentence.

 5.5 I do not know what a "fully-defined grammar" is, let alone a
partially defined grammar.  Define fully defined first.

Synthesis Markup Language


 One of the "sayas" examples in 2.4 is missing the "type" attribute,
which is described as required.

 It seems odd to me that things like "paragraph" and "sentence" would
be defined in the speech synthesis markup document. Aren't they relevant
to other aspects of the overall system?

 Document is very clear and appears to map very well to some other
standards such as JSAPI.   The low-level element appears to be a very
powerful element and should be useful.  I wish other documents from W3C
had a similar feature.



Reusable Dialogue Markup Language

 I would disambiguate the use of the word "dialog" in this document. It
seems sometimes to be used in the UI sense (that is, a widget which
gathers information from the user) and other times in the NL sense (an
interaction in which a user engages with a system). 

 While I understand that this specification is intended to address
near-term support for existing technology, I'm very worried that
emerging work in mixed-initiative dialogue will find it hard to migrate
into the standards being developed here. I encourage you to think hard
about that issue.

 Should this section be titled Reusable Components?  I am not sure how
dialogue fits here.

 I had trouble reading this particular document; it contained words
without definitions.

 1.0 Please redo the first two paragraphs - they should address how a
standards organization is pursuing reusable dialogs.  Focus on the word
standards. Please change the word "subgroup" to "document".   Some
fixes: remove "out-of-the-box"; remove "etc, e.g." (they both occur in
the same sentence); define "behavior".    Also change "which any proposed
markup language" to "which a W3C Voice Browser compatible markup
language".

 1.1 First paragraph.  I cannot understand the first sentence "Although
desirable to standardize the interface to all dialogue components, this
standardization is impractical for many dialogues."  Please provide
examples.  Also be sure to mention how this statement does not
contradict the mission of W3C Voice Browsers which is attempting some
form of "standardization".

 1.1  First paragraph. Please define "call flow" and "the interface". 
For example, I would recommend "interface" be replaced with words that
have to do with "input" and "output" types.

 1.1 Second paragraph.  I have no idea what the first sentence means.

 2.1.1 Change "address" to "addresses".

 2.2 The first sentence contains "… multiple components to be
active simultaneously …".  Isn't this obvious?  Remove the word
"simultaneously".  I am not sure what you mean, however.

 2.3.1 Title is "NL Format".  I am not sure why reusable components has
to focus just on NL - shouldn't it have to deal with other components of
the system?  Change "Natural Language Subgroup" to "W3C Voice
Browsers Natural Language document".

 2.3.3 Delete "also" from "must also".

 2.4 I am not sure how the error/exception handling would work,
especially in an environment where multiple servers are executed. Please
explain how these errors and exceptions can be used for different
servers to communicate. 

 2.6 I have no idea what this sentence means "Where reasonable,
components will be built using other components to increase consistency
in behavior across components."  Ideally, what should be consistent, is
that servers should have consistent behavior as they use parts of the
reusable dialogs.   Why do the components themselves have to be
consistent?

 3.1 Remove the word "appropriate" from the first sentence.

 3.1 Remove the phrase "in one way over another" from the end of the
first paragraph.

 3.2  I have no idea what the paragraph under "Task vs. Template"
means.  They should define the terms, rather than saying "as-is".

 3.3.2.  Why is there no allowance for negative numbers?

 3.3.3 (and other sections) There is a mention of obtaining an n-best
list of results - however the speech recognition grammar does not allow
an n-best list.  

 3.3.3 I would recommend the sub-bullet to be modified to allow digit
string up to a certain length.  The word "expected" does not really add
any meaning - or what does it mean?

 3.3.4 - 3.3.5  Please define a "fully-specified date" (and a
partially-specified date).  If the components are to resort to "prompting"
to disambiguate the date, why is there no specification of how this
prompting is to be done?   Wouldn't this section fit better in the
semantics portion of the document?  How would dates such as "This
Friday" or "tomorrow" be handled?
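
To make the question about relative dates concrete, here is a minimal
Python sketch of one way a date component could resolve such phrases
against a reference date.  The resolve function and its rule that "this
Friday" means the first Friday on or after the reference date are
assumptions for illustration; the requirements document does not define
this behavior.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve(phrase, today):
    """Resolve a relative date phrase against the reference date."""
    words = phrase.lower().split()
    if words == ["today"]:
        return today
    if words == ["tomorrow"]:
        return today + timedelta(days=1)
    if len(words) == 2 and words[0] == "this" and words[1] in WEEKDAYS:
        # Assumption: "this Friday" is the first Friday on or after today.
        offset = (WEEKDAYS.index(words[1]) - today.weekday()) % 7
        return today + timedelta(days=offset)
    # Unresolved: a real component would have to prompt the user.
    return None
```

Anything the sketch cannot resolve returns None, which is the point at
which the unspecified prompting behavior would have to take over.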

 3.3.6 Shouldn't there be a specification for the type of errors to
process or watch for?

 3.3.7 - same comments as 3.3.3

 3.3.8 - same comments as 3.3.3

 3.4.2 I am not sure what the phrase "plays a prompt" means.  I would
recommend that the component "output" a prompt to the user, and receive
"input" from the user.  Please remove the word "she".

 3.4.5 - same comments as 3.3.3

 3.4.6.  If this document is about reusable components, why not create
separate components for credit cards and SSNs?  I do not see why
everything is "lumped" into this section.

 3.4.7 -- same comment as 3.4.6, but for automobile plates and product
codes.

 3.4.8  Please explain what the "other pages" are.

 3.4.9  Please define what "hear" means.  Don't you want to say "output"?

 3.4.10 - same comments as 3.4.9

 3.4.11  Please change "valid postal code" to "valid international
postal code".

 3.4.14  How does this section differ from 3.4.6?

 3.4.19  Please remove the word "physical".

 4.  Remove  "per se" from the second sentence.

 4.3 - I think the help component would be very beneficial and am
confused as to why some variation of help does not exist.  (Help is left
for future work.)





Multimodal Requirements

 My major comment here is that if you're talking about the XML
representation of coordinated multimodal data, you really ought to be
looking at some of the work being done in annotation of such data, such
as the European MATE effort and the ATLAS effort being pushed by MITRE,
NIST and the Linguistic Data Consortium.

 This document defines the requirements of a multimodal language.  Is
there a way that this document could include examples in detailed format
like the "Speech Recognition Grammar" document?   Obviously this
requires the markup language to be defined.  Why not show examples in
some markup language formats?

 1.3 "Complimentary" should be "Complementary".

 2.2. Take out the word "things" in the second paragraph.  Moreover,
change the second sentence of the second paragraph from "… only one mode
of input will be available at that time … " to "at least one mode of
input must be available at that time …".

 2.3  Can you use another word besides "interpreted"?  This section and
following sections say that the "markup language" interprets the
input.   A recognizer or key-device interprets the input and passes it
to the markup language for processing.

 2.8 Spell out what a "UI" is.

 2.13 Section is titled, "Support for conflicting input from different
modalities".  I am not sure just how a markup language is going to solve
this issue - isn't this part of an application?  Please explain intent
with a better example.


Received on Tuesday, 15 August 2000 20:06:57 GMT
