- From: Dale Russell <Dale_Russell-ADR004@email.mot.com>
- Date: Mon, 15 Jan 2001 09:27:12 -0600
- To: <www-voice@w3.org>
- Cc: "Harry.Bliss" <Harry_Bliss-AHB001@email.mot.com>, "Ferrans James-jferran1" <James_Ferrans-JFERRAN1@email.mot.com>, "Dale Russell" <Dale_Russell-ADR004@email.mot.com>
My name is Dale Russell, and I'm a member of a Natural Language Processing research group within the Human Interface Labs at Motorola. I've been looking over the Working Draft of the Speech Recognition Grammar Specification document, the version of January 3, 2001, and had some comments I'd like to share. For convenience, I'll list my comments in the order in which they apply in the document, although this means that I'll be mixing comments on surface-level details with more substantive issues.

Section 1.1, second paragraph - "through-out" should be one word, "throughout", not hyphenated.

Section 1.2, second text paragraph, second sentence - I believe "word and patterns of words" should be "words and patterns of words".

Section 1.2, third text paragraph, second sentence - The colon after "invoked" should be changed to a semi-colon.

Section 1.2, last bullet point - You say that the grammar format does not address the loading of lexicons. I'm not sure whether this is a lexicon issue or not, but members of our group feel that it would be helpful to be able to optionally stipulate the pre-terminal rules of a grammar. This could be useful in applications where certain pre-terminal rules are constructed dynamically, as in the case of proper names being read in from a telephone book, or restaurant names or movie titles downloaded from a web page.

I realize that even defining pre-terminal rules is not always straightforward, but I would propose the following definition: A pre-terminal rule is one whose right-hand side consists of either a single terminal or a set of alternatives, where each alternative is a single terminal. While a quoted string is a terminal, a sequence is not.
Therefore the first example below is a pre-terminal rule, while the second isn't:

    $city = Boston | "New York";
    $city = Boston | ( New York ) ;

An alternative definition of a pre-terminal rule could be a rule whose right-hand side contains no non-terminals, only terminals, with any desired abbreviatory notation. Under this definition, both of the above examples would be pre-terminal rules.

Our group does not have a proposal for how pre-terminal rules should be notated in either ABNF or XML, but we wanted to propose this as a type of information that should be considered for coverage in this document.

Section 2.1 - The description of tokens in ABNF form could be much clearer. The first sentence says that "Any plain text is a token" but this is contradicted by the next sentence, which specifies that white space and special symbols are not part of a token. Instead, it would be better to exhaustively specify the set of characters that can occur inside a (non-quoted) token. Or, you could say that a token can contain all characters except for white space and special symbols, and then give a complete list of the special symbols. Is the list inside the parens supposed to be exhaustive? The "e.g." would seem to indicate that it isn't. Why not?

The last sentence of this paragraph says that "Tokens may be explicitly quoted if they contain white-space or special symbols." That would seem to indicate that it's illegal to quote tokens not containing white space or special symbols. Is that what was intended? If so, that's somewhat surprising; if not, this sentence is misleading. Either way, I'd like to see it spelled out more explicitly.

In the examples that follow that paragraph, 'this is a test' (without quotes) is indicated to be a sequence of four tokens, but there's no corresponding indication for 'bon voyage' (without quotes), which I believe is a sequence of two tokens. Is that correct?
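As a rough illustration of the second alternative suggested above (a token may contain any character except white space and special symbols), here is a minimal Python sketch. It is not the draft's definition: the contents of SPECIAL_SYMBOLS are purely an assumption made for the example, since the draft's parenthesized list is introduced with "e.g." and so may not be complete, and split_tokens is a hypothetical helper name. Double-quoted strings are treated as single tokens, and backslash escapes are deliberately left out.

```python
# ASSUMPTION: this set is illustrative only; the draft does not give an
# exhaustive list of special symbols.
SPECIAL_SYMBOLS = set(';=|*+<>()[]{}/$"')


def split_tokens(text):
    """Split text into tokens (hypothetical helper, not from the draft).

    An unquoted token is a maximal run of characters containing no white
    space and no special symbol. A double-quoted string is one token.
    Backslash escapes are not handled in this sketch.
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # White space only separates tokens.
            i += 1
        elif ch == '"':
            # Quoted token: everything up to the next double quote.
            end = text.find('"', i + 1)
            if end == -1:
                raise ValueError("quoted token lacks a closing quote")
            tokens.append(text[i + 1:end])
            i = end + 1
        elif ch in SPECIAL_SYMBOLS:
            # Any other special symbol is not legal inside a bare token.
            raise ValueError(f"special symbol {ch!r} outside quotes")
        else:
            # Unquoted token: scan to the next white space or special symbol.
            start = i
            while (i < len(text) and not text[i].isspace()
                   and text[i] not in SPECIAL_SYMBOLS):
                i += 1
            tokens.append(text[start:i])
    return tokens
```

Under these assumptions, the unquoted text this is a test splits into four tokens and the unquoted text bon voyage into two, while the quoted string "bon voyage" is a single token. The sketch also accepts a quoted token that contains no white space or special symbols; that is one possible reading of the sentence about quoting, chosen here only for simplicity.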
As is noted in the Issues, there's more to be said about quoted strings, especially the use of backslash as an escape character. Unless you know of a reason this wouldn't work, I would propose that any character, including white space, be allowed to occur within a quoted string, except for a backslash and a double quote itself. A backslash or a double quote can occur inside a quoted string, but only if it is backslashed. So "\"" and "\\" are quoted strings, while "\" is lacking a closing quote.

Section 2.2.1, first paragraph - "When referencing rules defined locally (...), always use a simple reference which consists of the local rulename only." Is this enforced, or just a wise convention to follow? If I use a full (unnecessary but well-formed) reference to the local grammar by means of a URI, should I expect to get an error message?

Section 2.2.2, first paragraph - I believe this is the first time the concept of a "root" rule is introduced. Please give me a pointer to the section of the document where this is defined. At this point, I'm wondering whether a root rule is unique, and how it's identified.

First text paragraph under ABNF Form - Change "a parentheses" to "parentheses". The fragment separator "#" first occurs in an example here, but it isn't explained until Section 2.2.3. Please move or copy that explanation to here.

Section 2.2.3 - The same symbol is referred to as "the fragment separator" under ABNF Form and "the hash separator" under XML Form. I'd like to see one term used consistently. ("Fragment" seems like an odd choice for this concept. Is it used because a rule is a fragment of a grammar? I like "hash separator" better.)

Section 2.2.4, first paragraph, second sentence - Change "Grammar" to either "Grammars" or "A grammar".

Section 2.5, under ABNF Form - I'd like you to say something about precedence here, or at least point me forward to Section 2.8, where it's all laid out. Under XML Form, I infer that "optional" and "?"
are synonymous, but this is never stated explicitly.

Section 2.6, second paragraph - The last sentence says "That language is likely to be contained within grammar tags." All and only that language? That is, when that language is contained within grammar tags, will it still be the case that they contain arbitrary strings?

Under ABNF Form, the last sentence says "Alternatively, contained closing braces may be escaped with a backslash." Why only closing braces? Why not opening? What if you want to use braces as string characters? Is that what the backslashing is for? This wasn't clear. I'd like to propose that both opening and closing braces can be backslashed, and when they are, they don't count toward the balancing of braces. Unbackslashed braces must balance. Here also, I'd like you to say something about precedence, or point me to Section 2.8.

Under XML Form - You list the rule expansion elements that a "tag" element may be attached to. Are there some relevant non-obvious expansion elements that a tag element CAN'T be attached to? Or is this a list of all the rule expansion elements? The first sentence of Section 6.7 says tags can be attached to any rule expansion, but I'm not sure that's right.

Section 2.7, second bullet point - change "Multiple language" to "Multiple languages".

First paragraph under the bullet points - The last sentence is very confusing. "The referenced rule" has no antecedent, and I'm not sure what "the referencing grammar" or "the local language" refers to. I don't really think this is a hard concept, and the behavior seems to be what you'd intuitively expect, but the explanation of it needs to be much clearer. An example might help.

Second paragraph under the bullet points - there's a number agreement error. I'd recommend changing "grammar constructs" to "a grammar construct".

Under ABNF Form, first paragraph - change "language" to "languages".

Under XML Form, first paragraph - The second sentence refers to "the scoping rules."
What scoping rules? The precedence rules that we're going to see in Section 2.8? If so, tell me that.

Section 2.8 - As I said earlier, don't wait for this section to introduce precedence. This section can stay as it is, but it should function as a handy summary of information that has been stated in various relevant sections previously.

Section 3, first paragraph - In the second sentence, change "The rule definition is also responsible for defining the scope of the rule definition" to "The rule definition is also responsible for defining its scope". The word "pragmatics" at the end of that paragraph has a technical meaning in the field of linguistics which I don't believe is intended here. Another word would be better.

Section 3.2, second paragraph, second sentence - Insert a comma after "That is".

Under Issues - "public" should be in double quotes.

Section 3.3, under XML Form, first paragraph - the phrase "the initial content" is unclear. This doesn't mention the documentation comment, or say that examples must come at the end of it, as the corresponding ABNF explanation does. Or is this not true for the XML Form?

Section 4.1, first paragraph - The last sentence implies that the character encoding is optional within the grammar. I'd like to see this stated explicitly.

Under ABNF Form - Under the text, there are what appear to be four examples; but they aren't - they're a template and three examples. This is very confusing. I think the template belongs up in the text paragraph, immediately after "of the style".

Section 4.2, first paragraph - Change "The Locale" to "The locale" or else capitalize "Locale" everywhere. Insert a comma after RFC 1766.

Under ABNF Form, first paragraph - I believe this is the first time the phrase "self-identifying header" is used in this document. Either define it here (or earlier), or change it to "Grammar header".
(Maybe this phrase is left over from an older version of the document, and should be globally replaced with "Grammar header".)

Under XML Form, first paragraph - insert a comma after "convention" and another after 'the "version" attribute'.

Section 4.3, first paragraph, second sentence - The phrase "an alternative and optional input mode" was confusing to me. What does it mean for an input mode to be optional as well as alternative? Would you lose anything by omitting the phrase "and optional"?

Under ABNF Form, first paragraph - The first sentence needs to be reworked. You could change "language declaration if present or otherwise be" to "language declaration, if one is present, or otherwise be". Even better would be "language declaration, if one is present. Otherwise it should be". As before, "self-identifying header" should be explained or changed to "Grammar header". Insert a comma after "dtmf".

Section 4.4, third paragraph - Insert commas after "forms" and "the root rule". This first sentence seems to imply that speech recognizers have the power to generate grammar rules. I found this surprising. Is that what was intended?

Fourth paragraph - Insert a comma after "a root rule".

Under ABNF Form - As before, "self-identifying header" should be explained or changed to "Grammar header".

Section 4.5, first paragraph, last sentence - insert the word "them" after "assign".

Section 5.1 - The three bullet points here don't flow, because the first and third are verb phrases, while the second is a sentence. I'd recommend inserting the word "it" at the beginning of the first and third, to make all three of these sentences. This is consistent with the bullet points in Section 5.2 below. Also, move the parenthetical in the first bullet point to the end - "it is well-formed (relative to XML)". Maybe that last phrase doesn't even need to be in parentheses.

In the second bullet point, you refer to the xmlns attributes. I'm new to XML, and wasn't familiar with this term.
I spent a fair amount of time looking through the document for a description of this, thinking that it must be part of the W3C Speech Recognition Grammar Specification that I'd read about and forgotten. Finally someone else here in the lab told me that this is a well-known part of XML. For those like me who may not be as familiar with XML, could you include a quick explanation of this, and a pointer to where the reader could learn more?

In the paragraph following the bullet points, I think "and" is a better conjunction than "or" to connect the language and criteria.

Section 5.4, fifth paragraph - you want a semi-colon before "that is".

Under Issues, second bullet point - change "shoud" to "should".

Third bullet point - delete the comma after "language/locale". Insert a comma after "one or many grammars".

Fourth bullet point - Insert a comma after "one or many grammars". Insert a comma after "single language and locale". Change "or at least on of" to "or at least one of". Insert a comma after "concurrently". Insert a comma after "processor".

Fifth bullet point - change the period after "behavior" to a semi-colon, and uncapitalize "for".

Section 5.6, third paragraph, last sentence - capitalize "Issues".

Fourth paragraph - change the comma after "recursive grammars" to a semi-colon.

Section 6.1, second paragraph - change "an Natural Language" to "a Natural Language".

Third paragraph, first sentence - change "for spoken input sentence" to either "for spoken input sentences" or "for a spoken input sentence".

Third bullet point, second sentence - insert "the" before "W3C". Next sentence - Do you need both "regarded and treated"? I'd recommend picking one or the other.

Section 6.2, second paragraph - "behaviour" is a British spelling, and "behavior" is used elsewhere in this document. I'd recommend using one or the other consistently.

I had a hard time figuring out what the example under the third paragraph is supposed to show.
I'm new to all this, but this example looks to me like a mix of ABNF and XML. Which is it? It wasn't even clear until the following paragraph that the first line is the right-hand side of some grammar rule. I think you need to tell more about what I should expect to see in this example before you throw me into it.

Next to last paragraph, last sentence - Insert commas after "reason" and "above".

Section 6.4, first paragraph, first sentence - change the comma after "documents" to a semi-colon.

In the line under "For the ABNF form," should an ABNF form really have </grammar> at the end? This looks like XML to me.

In the example under "For the XML form" - Same comment as above regarding an explanation of xmlns.

Section 6.5, first paragraph, second sentence - Insert a comma after "character set".

Second paragraph, last sentence - Insert commas after "e.g." and "alphabets".

In the example after "For XML", I believe you want to insert the word "phoneme" between "The" and "attribute".

Section 6.6, second paragraph, first sentence - Insert a comma after "developed".

Third paragraph - Insert a comma after "of a grammar".

Section 6.7, first paragraph - Is it true that a tag can be attached to any rule expansion? It can't be attached to an unparenthesized sequence, can it? Granted, that's ruled out by the precedence rules, not by restrictions on tag attachment, but even so...

First bullet point - Change "that" to "than".

That's all for now. Sorry if this was more verbose and picky than you were looking for.

Dale Russell
Received on Monday, 15 January 2001 10:28:41 UTC