- From: Chris King <chris.king@senet.com.au>
- Date: Mon, 3 Feb 2003 23:01:59 -0500 (EST)
- To: xml-editor@w3.org
Dear XML specification editors,

I have been looking through the XML 1.0 2nd edition recommendation (6-Oct-2000) and its associated errata (up to E41 as of 2002-09-18). I have come across an error in the EBNF production [65] and wish also to vote on some style changes to some other productions.

----------------------AAAA----------------------

The current production [65] states:

    Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)

...which is in error because it is equivalent to:

    Ignore ::= ( Char* - (Char* '<![' Char*) ) | ( Char* - (Char* ']]>' Char*) )

This represents all sequences of characters excluding those sequences that contain both the string '<![' AND the string ']]>'. I'm sure that you intended it to represent all sequences of characters excluding those containing the string '<![' OR the string ']]>' OR both.

My suggested replacement is:

    Ignore ::= Char* - (Char* '<![' Char*) - (Char* ']]>' Char*)

I could probably come up with some formal set-algebra to prove this really is an error if needs be.

----------------------BBBB----------------------

The current production [15] (well-supported by surrounding text) states:

    Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

But, if it were restated as:

    Comment ::= '<!--' ( (Char* - (Char* '--' Char*)) (Char - '-') )? '-->'

...then the expression patterning would be more consistent with other character strings that exclude double-character sequences, and the need for a non-hyphen character just before the '-->' terminator would be more easily deduced.

----------------------CCCC----------------------

Several productions make use of the notation for a set of characters with some forbidden members:

    [^abc]

...as described in section 6 - Notation. It is only in this section that there is a link between the `forbidden' notation's global set of characters (from which nominated characters are excluded) and the production [2] Char.
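The restated production [15] from section BBBB can be checked against the current one by brute force. The following is only a minimal sketch: it approximates Char with a tiny three-character alphabet (an assumption made for illustration, not the full range of production [2]), and the names `CURRENT`, `restated` and `disagreements` are hypothetical helpers rather than anything taken from the recommendation.

```python
import re
from itertools import product

# Production [15] as currently written:
# '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
CURRENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def restated(text: str) -> bool:
    """Restated form: '<!--' ( (no '--' anywhere) (one non-hyphen) )? '-->'."""
    # The two delimiters need 7 characters between them and must not overlap.
    if len(text) < 7 or not text.startswith("<!--") or not text.endswith("-->"):
        return False
    body = text[4:-3]
    if body == "":
        return True  # the optional group is absent
    # Everything but the last character must be free of '--',
    # and the last character must not be a hyphen.
    return "--" not in body[:-1] and body[-1] != "-"


def disagreements(alphabet: str = "-a>", max_len: int = 6) -> list:
    """Return every comment body on which the two forms give different answers."""
    found = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "<!--" + "".join(chars) + "-->"
            if bool(CURRENT.fullmatch(text)) != restated(text):
                found.append("".join(chars))
    return found


if __name__ == "__main__":
    print(disagreements())
```

Over this small alphabet the two forms accept exactly the same comment bodies, which is consistent with the restatement being a change of style only.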
I guess you could say this is an obvious association, but I feel that its use is unnecessary and inconsistent with other exclusion-style notations. Here's a vote for some editorial changes.

First up, the introduction of another symbol makes the definition of [14] CharData crystal clear:

    [14]  CharData ::= DataChar* - (DataChar* ']]>' DataChar*)
    [14a] DataChar ::= Char - '<' - '&'

...and to complete the edits:

    [9]  EntityValue   ::= '"' ( (Char - '%' - '&' - '"') | PEReference | Reference )* '"'
                         | "'" ( (Char - '%' - '&' - "'") | PEReference | Reference )* "'"
    [10] AttValue      ::= '"' ( (DataChar - '"') | Reference )* '"'
                         | "'" ( (DataChar - "'") | Reference )* "'"
    [11] SystemLiteral ::= '"' (Char - '"')* '"'
                         | "'" (Char - "'")* "'"

The documented notation [^a-z],[^#xN-#xN] is unnecessary as it isn't used (and neither should [^abc] be ;-).

----------------------DDDD----------------------

There is an extra set of parentheses in [20] CData that isn't doing anything, and two extra sets in [11] SystemLiteral that aren't doing much (adjacent productions seem to manage without them).

----------------------EEEE----------------------

There aren't any constraints given on [10] AttValue, but there are several in each of the two places that it is used ([41] Attribute and [60] DefaultDecl). Is it appropriate to tie common constraints onto [10]?

[WFC: No < in Attribute Values] is clearly in common; [VC: Attribute Value Type] and [VC: Attribute Default Legal] ALMOST say the same thing. (The former requires a declaration, which is already happening in the latter.) Why doesn't [WFC: No External Entity References] apply to [60]?

EVEN MORE GENERALLY...

Is it appropriate to make reference to [6] Names and [8] Nmtokens here? (Original definitions -- before the `re-application' erratum E20.) I really don't see the point of having [6] and [8] as the only unreferenced non-root productions in the grammar (where the other un-referenced roots are [1] document, [30] extSubset, and [78] extParsedEnt).
The rest of the grammar is a lexical description of what is allowed; all other restrictive semantics that are the responsibility of the XML processor are listed as constraints. As it is, [10] might be read as letting it all through anyway.

--------------------------------------------

Well, that's it. I'm sorry if I got a bit carried away. I realise that lots (if not all) of this is probably useless as far as needing any real action, but like we're always told, lobbying the politicians does eventually have some effect. Besides, whining alone isn't healthy.

With regards,
Chris King
(Sun Certified Programmer for Java 2 Platform 1.4)
Longwood, South Australia
Received on Tuesday, 4 February 2003 10:44:58 UTC