- From: Bob Foster <bob@objfac.com>
- Date: Wed, 07 Apr 2004 17:17:42 -0500
- To: www-xml-schema-comments@w3.org
I previously copied this address on the subject, but on 4/3/2004 Henry
Thompson suggested I write a protest, even though the Errata seem to
have been closed as of 3/16/2004. I take the latter as an indication that my
previous mail didn't do the job.
The proposed change E2-18 unnecessarily introduces an incompatible
change to the regular expression language accepted by patterns. This
breaks a number of existing published schemas, including
http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd and
http://java.sun.com/dtd/jspxml.xsd.
The original problem reported is that the language in F.1, "The -
character is a valid character range only at the beginning or end of a
positive character group", contradicted the published grammar. The
public record doesn't say so, but a further problem was that the
published grammar was ambiguous in its treatment of patterns like "a-z",
which could be interpreted as either one seRange or three
XmlCharIncDash. In fact, the grammar allowed the pattern "---" (a '-'
could appear anywhere).
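To make the ambiguity concrete, here is a small Python sketch that counts how many ways the body of a positive character group can be split into charRange items. The function names are hypothetical, and the sketch assumes the simplified productions restated in its comments (seRange built from plain XmlChar only); escapes, character references and class subtraction are deliberately left out.

```python
# Hypothetical helper: counts parses of the body of a positive character
# group as a sequence of charRange items, assuming
#   charRange      ::= seRange | XmlCharIncDash
#   seRange        ::= XmlChar '-' XmlChar
#   XmlChar        ::= [^\#x2D#x5B#x5D]   (anything but \ - [ ])
#   XmlCharIncDash ::= [^\#x5B#x5D]       (anything but \ [ ])
# Escapes, character references and subtraction are ignored.

def is_xml_char(c: str) -> bool:
    return c not in "\\-[]"


def is_xml_char_inc_dash(c: str) -> bool:
    return c not in "\\[]"


def count_parses(group: str) -> int:
    """Return the number of distinct ways `group` splits into charRange items."""
    ways = [0] * (len(group) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) parse
    for i in range(1, len(group) + 1):
        # Last item is a single XmlCharIncDash.
        if is_xml_char_inc_dash(group[i - 1]):
            ways[i] += ways[i - 1]
        # Last item is a three-character seRange such as "a-z".
        if (i >= 3 and is_xml_char(group[i - 3])
                and group[i - 2] == "-" and is_xml_char(group[i - 1])):
            ways[i] += ways[i - 3]
    return ways[len(group)]


print(count_parses("a-z"))     # 2: one seRange, or three XmlCharIncDash
print(count_parses("---"))     # 1: three XmlCharIncDash, so "---" is accepted
print(count_parses("a-z0-9"))  # 4
```

Any count above 1 marks an ambiguous group; "---" has exactly one parse, which is why the grammar accepts it.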
There is an issue, but it should not be resolved by an incompatible
change. Instead, it could be resolved by an Error that simply strikes
out the offending sentence quoted above, amends the grammar as shown
below (removing the character references, which the parser already
handles) and adds a Clarification along the following lines:
[17] charRange ::= seRange | XmlCharIncDash
[18] seRange ::= charOrEsc '-' charOrEsc
[20] charOrEsc ::= XmlChar | SingleCharEsc
[21] XmlChar ::= [^\#x2D#x5B#x5D]
[22] XmlCharIncDash ::= [^\#x5B#x5D]
"Clarification. The grammar for posCharGroup is ambiguous in that any
seRange could also be interpreted as a sequence of three XmlCharIncDash.
The ambiguity is to be resolved in favor of seRange, such that any
three-character sequence where the first and third characters are not
one of #x2D, #x5B or #x5D ('-', '[' or ']') and the second character is a
'-' is to be considered an seRange. This requires more than one token
lookahead."
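A minimal Python sketch of the rule in the Clarification follows; `tokenize` and `read_char_or_esc` are hypothetical names, not part of any processor. It reads a charOrEsc, looks ahead for '-' and a second charOrEsc, and emits an seRange when both are present, otherwise a single item. Assumptions: SingleCharEsc follows the published grammar; multi-character escapes such as \d, character class subtraction and range-order checks are omitted; and overlapping candidates are resolved left to right, which the Clarification as worded does not specify.

```python
# Hypothetical sketch of the Clarification's rule: inside a positive
# character group, charOrEsc '-' charOrEsc is read as one seRange
# rather than as three separate characters.

# Characters allowed after '\' in a SingleCharEsc.
SINGLE_CHAR_ESC = set("nrt\\|.?*+(){}-[]^")


def read_char_or_esc(group: str, i: int):
    """Return (text, next_index) if a charOrEsc starts at group[i], else None."""
    if i >= len(group):
        return None
    if group[i] == "\\":
        if i + 1 < len(group) and group[i + 1] in SINGLE_CHAR_ESC:
            return group[i:i + 2], i + 2
        return None
    if group[i] not in "-[]":  # XmlChar: anything but \ - [ ]
        return group[i], i + 1
    return None


def tokenize(group: str) -> list:
    """Split the body of a positive character group into items,
    preferring seRange whenever charOrEsc '-' charOrEsc is present."""
    items = []
    i = 0
    while i < len(group):
        first = read_char_or_esc(group, i)
        if first is not None:
            lo, j = first
            # Look ahead past the first charOrEsc for '-' and a second charOrEsc.
            if j < len(group) and group[j] == "-":
                last = read_char_or_esc(group, j + 1)
                if last is not None:
                    hi, k = last
                    items.append(("seRange", lo, hi))
                    i = k
                    continue
            # No range here: keep the character (or escape) as one item.
            items.append(("char", lo))
            i = j
        elif group[i] == "-":
            # '-' is an XmlCharIncDash but never an XmlChar.
            items.append(("char", "-"))
            i += 1
        else:
            raise ValueError(f"unexpected {group[i]!r} at offset {i}")
    return items


print(tokenize("a-z0-9_-"))
# [('seRange', 'a', 'z'), ('seRange', '0', '9'), ('char', '_'), ('char', '-')]
```

With this left-to-right choice, "a-b-c" comes out as [('seRange', 'a', 'b'), ('char', '-'), ('char', 'c')], and "---" still yields three '-' items, so groups accepted before the errata keep a single reading.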
The result would not unduly tax processors, as this was the only
sensible interpretation of the grammar prior to the errata, and it would
not break any existing documents (either pre- or post-errata).
Bob Foster
http://xmlbuddy.com/
Received on Wednesday, 7 April 2004 18:17:40 UTC