- From: Bob Foster <bob@objfac.com>
- Date: Wed, 07 Apr 2004 17:17:42 -0500
- To: www-xml-schema-comments@w3.org
I previously copied this address on the subject, but on 4/3/2004 Henry Thompson suggested I write a protest, even though the Errata seem to have been closed as of 3/16/2004. I take the latter as an indication that my previous mail didn't do the job.

The proposed change E2-18 unnecessarily introduces an incompatible change to the regular expression language accepted by patterns. This breaks a number of existing published schemas, including http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd and http://java.sun.com/dtd/jspxml.xsd.

The original problem reported is that the language in F.1, "The - character is a valid character range only at the beginning or end of a positive character group", contradicted the published grammar. The public record doesn't say so, but a further problem was that the published grammar was ambiguous in its treatment of patterns like "a-z", which could be interpreted as either one seRange or three XmlCharIncDash; in fact, the pattern "---" was allowed by the grammar (- could appear anywhere).

There is an issue, but it should not be resolved by an incompatible change. Instead, the issue could be resolved by an Error that simply struck out the offending sentence quoted above, amended the grammar as shown below (to remove the character references already handled by the parser) and added a Clarification along the following lines:

[17] charRange ::= seRange | XmlCharIncDash
[18] seRange ::= charOrEsc '-' charOrEsc
[20] charOrEsc ::= XmlChar | SingleCharEsc
[21] XmlChar ::= [^\#x2D#x5B#x5D]
[22] XmlCharIncDash ::= [^\#x5B#x5D]

"Clarification. The grammar for posCharGroup is ambiguous in that any seRange could also be interpreted as a sequence of three XmlCharIncDash. The ambiguity is to be resolved in favor of seRange, such that any three-character sequence where the first and third character are not one of #x2D, #x5B or #x5D ('-', '[' or ']') and the second character is a '-' is to be considered an seRange. This requires more than one token lookahead."

The result would not unduly tax processors, as this was the only sensible interpretation of the grammar prior to the errata, and it would not break any existing documents (either pre- or post-errata).

Bob Foster
http://xmlbuddy.com/
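
To illustrate the disambiguation rule in the proposed Clarification, here is a minimal Python sketch. It is not taken from any schema processor: the function name split_pos_char_group and the tuple labels are hypothetical, and it assumes the input is the body of a positive character group with no escapes (SingleCharEsc is ignored) and no '[' or ']'.

```python
# Hypothetical helper illustrating the proposed rule; not from any real processor.
# Assumption: `body` contains no escapes and no '[' or ']'.
EXCLUDED = {"-", "[", "]"}  # #x2D, #x5B, #x5D


def split_pos_char_group(body):
    """Split a positive character group body into seRange and single-char items."""
    items = []
    i = 0
    while i < len(body):
        # Look two characters ahead: X '-' Y, with X and Y outside EXCLUDED,
        # is always read as an seRange rather than three separate characters.
        if (
            i + 2 < len(body)
            and body[i + 1] == "-"
            and body[i] not in EXCLUDED
            and body[i + 2] not in EXCLUDED
        ):
            items.append(("seRange", body[i], body[i + 2]))
            i += 3
        else:
            # Anything else, including a '-' in any position, is a single character.
            items.append(("char", body[i]))
            i += 1
    return items
```

Under this sketch "a-z" is read as one seRange, while "---" is read as three literal characters, and a leading or trailing '-' (as in "-a-z" or "a-z-") remains a literal, so patterns written against the original grammar keep their meaning.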
Received on Wednesday, 7 April 2004 18:17:40 UTC