Final responses to 2e comments (was Re: Protest the \- change (E2-18))

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: 13 Jul 2004 19:06:07 -0600
To: Bob Foster <bob@objfac.com>
Cc: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
Message-Id: <1089767166.2569.271.camel@localhost>

Dear Bob Foster,

This is a (rather late) response to your notes of 7
April [1] and 24 March [2] regarding erratum E2-18 and its
incorporation into the Proposed Edited Recommendation
of XML Schema 1.0 Second Edition.


The Working Group has discussed the topic several times, 
weighing against each other factors of simplicity, grammatical
cleanliness, clarity, behavior of current implementations, 
and so on.

You will perhaps be pleased to know that the Working
Group has concluded these discussions by deciding to
roll back the part of E2-18 which caused the incompatibility
you objected to (while retaining the part of E2-18 which
fixed an unrelated problem).

Attached you will find the text of erratum E2-67, which
rolls back the part of E2-18 which deals with R-69 (hyphen 
rule), but not the part which deals with R-30 (by eliminating
character references from the grammar).  The result is that the
relevant bits of 2E will read something like this:

    Character Range

    [17] charRange ::= seRange 
                     | XmlCharRef 
                     | XmlCharIncDash 
    [18] seRange ::= charOrEsc '-' charOrEsc
    [19] XmlCharRef ::= ('&#' [0-9]+ ';') 
                      | ('&#x' [0-9a-fA-F]+ ';')
    [20] charOrEsc ::= XmlChar | SingleCharEsc
    [21] XmlChar ::= [^\#x2D#x5B#x5D]
    [22] XmlCharIncDash ::= [^\#x5B#x5D]
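
As an illustration of productions 21 and 22, the two character classes
can be sketched as Python predicates. The function names are
hypothetical; only the excluded characters are taken from the grammar
above.

```python
def is_xml_char(ch: str) -> bool:
    """Production 21 (XmlChar): any character except '\\', '-', '[' and ']'."""
    return len(ch) == 1 and ch not in "\\-[]"


def is_xml_char_inc_dash(ch: str) -> bool:
    """Production 22 (XmlCharIncDash): any character except '\\', '[' and ']'."""
    return len(ch) == 1 and ch not in "\\[]"
```

The only difference between the two is the hyphen, which is why
retaining production 22 keeps an unescaped '-' usable as a character
range of its own.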

And the bulleted list following the first paragraph after 
the table in which Productions 17 & 22 occur will look like 

    A single XML character is a ·character range· that identifies the
    set of characters containing only itself. All XML characters are
    valid character ranges, except as follows:

      * The [, ], - and \ characters are not valid character ranges;

      * The ^ character is only valid at the beginning of a positive
        character group if it is part of a ·negative character group·;

      * The - character is a valid character range only at the
        beginning or end of a ·positive character group·.

    NOTE: The grammar for charRange as given above is ambiguous, but
    the second and third bullets above together remove the ambiguity.
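
One way to read the note is sketched below in Python: prefer seRange
wherever it fits, and accept a lone '-' only at the beginning or end of
the group. The function name is hypothetical, and the sketch makes
simplifying assumptions that are not in the erratum text: it treats any
backslash plus one character as a SingleCharEsc, and it ignores '^',
multi-character escapes and character class subtraction.

```python
def split_pos_char_group(body: str) -> list[str]:
    """Split the text between '[' and ']' into character ranges (sketch)."""
    # Cut the body into atoms: single characters or two-character escapes.
    atoms = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            atoms.append(body[i:i + 2])
            i += 2
        elif body[i] in "[]":
            raise ValueError(f"unescaped {body[i]!r} is not a valid character range")
        else:
            atoms.append(body[i])
            i += 1

    ranges = []
    k = 0
    while k < len(atoms):
        first = atoms[k]
        # Prefer seRange: charOrEsc '-' charOrEsc, neither end a bare '-'.
        if (first != "-" and k + 2 < len(atoms)
                and atoms[k + 1] == "-" and atoms[k + 2] != "-"):
            ranges.append("".join(atoms[k:k + 3]))
            k += 3
        elif first == "-" and 0 < k < len(atoms) - 1:
            # Third bullet: a lone '-' is valid only at the beginning or end.
            raise ValueError("'-' is only valid at the beginning or end")
        else:
            ranges.append(first)
            k += 1
    return ranges
```

Under these assumptions "a-z" yields one range, "-a-z" and "a-z-" each
yield a lone hyphen plus the range a-z, and "---" is rejected.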

This means the following changes vis-a-vis erratum E2-18:

  (1) Do not insert XmlChar on the right hand side of 17. 
  (2) Do not delete XmlCharIncDash from the RHS of 17.
  (3) Do not delete 22.
  (4) Do not delete the third bullet item.
  (5) Insert the note shown above.

Please let us know, preferably within a week (i.e. by 20 July)
if this is a satisfactory resolution of your comment.

best regards,

-C. M. Sperberg-McQueen
 for the XML Schema WG

On Wed, 2004-04-07 at 16:17, Bob Foster wrote:
> I previously copied this address on the subject but on 4/3/2004 Henry 
> Thompson suggested I write a protest, even though the Errata seem to 
> have been closed as of 3/16/2004. I take the latter as an indication my 
> previous mail didn't do the job.
>
> The proposed change E2-18 unnecessarily introduces an incompatible 
> change to the regular expression language accepted by patterns. This 
> breaks a number of existing published schemas, including 
> http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd and 
> http://java.sun.com/dtd/jspxml.xsd.
>
> The original problem reported is that the language in F.1 "The - 
> character is a valid character range only at the beginning or end of a 
> ·positive character group·" contradicted the published grammar. The 
> public record doesn't say so, but a further problem was that the 
> published grammar was ambiguous in its treatment of patterns like "a-z", 
> which could be interpreted as either one seRange or three 
> XmlCharIncDash, and in fact, the pattern "---" was allowed by the 
> grammar (- could appear anywhere).
>
> There is an issue, but it should not be resolved by an incompatible 
> change. Instead, the issue could be resolved by an Error that simply 
> struck out the offending sentence quoted above, amended the grammar as 
> shown below (to remove the character references already handled by the 
> parser) and added a Clarification along the following lines:
>
> [17] charRange      ::= seRange | XmlCharIncDash
> [18] seRange        ::= charOrEsc '-' charOrEsc
> [20] charOrEsc      ::= XmlChar | SingleCharEsc
> [21] XmlChar        ::= [^\#x2D#x5B#x5D]
> [22] XmlCharIncDash ::= [^\#x5B#x5D]
>
> "Clarification. The grammar for posCharGroup is ambiguous in that any 
> seRange could also be interpreted as a sequence of three XmlCharIncDash. 
> The ambiguity is to be resolved in favor of seRange, such that any 
> three-character sequence where the first and third characters are not one 
> of #x2D, #x5B or #x5D ('-', '[' or ']') and the second character is a 
> '-' is to be considered an seRange. This requires more than one token 
> lookahead."
>
> The result would not unduly tax processors, as this was the only 
> sensible interpretation of the grammar prior to the errata, and it would 
> not break any existing documents (either pre- or post-errata).
>
> Bob Foster
> http://xmlbuddy.com/

Received on Tuesday, 13 July 2004 21:06:44 UTC
