RE: Numeric character references in equivalent character classes in CR-xmlschema-2-20001024 from Biron,Paul V on 2000-12-05 (www-xml-schema-comments@w3.org from October to December 2000)

From: Biron,Paul V <Paul.V.Biron@kp.org>
Date: Tue, 5 Dec 2000 12:18:54 -0800
To: "'Tony Graham'" <tgraham@mulberrytech.com>, www-xml-schema-comments@w3.org
Message-Id: <376E771642C1D2118DC300805FEAAF43014BA82F@pars-exch-1.ca.kp.org>

> -----Original Message-----
> From:	Tony Graham [SMTP:tgraham@mulberrytech.com]
> Sent:	Friday, November 24, 2000 12:41 PM
> To:	www-xml-schema-comments@w3.org
> Subject:	Numeric character references in equivalent character classes
> in CR-xmlschema-2-20001024
> 
> The equivalent character classes for two of the multi-character
> escapes in CR-xmlschema-2-20001024 use numeric character references
> that are not supported by the CR's regular expression syntax.  The
> character classes in the CR are '[#x20\t\n\r]' and
> '[#x0000-#x10FFFF]-[\p{P}\p{S}\p{C}]'.
> 
> The CR states that in a regular expression a normal character can be
> represented by itself or by a character reference, with a link to the
> character reference definition in XML 1.0 2ed.
> 
> Either the character references in the equivalent character classes
> should be correct character references or the regular expression
> syntax should be expanded to include character references of the form
> used in the equivalent character classes.
> 
This is a tough one.  The original intent of that "equiv char class" column
was not to give the actual schema regex that the multi-character escape
expands to, but rather to identify the set of UCS code points that it
matches (hence the entry "the set of characters matched by NameChar
in XML 1.0" for \c).  But it appears that many people are
interpreting the column as the syntactic equivalent (and that is very
understandable).

So I think you are correct.  I will change those code point references to
XML character references, hence

	\s		[&#x20;\t\n\r]
	\w		[&#x0;-&#x10ffff;-[\p{P}\p{S}\p{C}]]

(note the typo corrected in \w's expansion, as noted in my answer to James'
message to this list this morning [1]).
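
To illustrate the difference between the two notations, here is a minimal
Python sketch (not part of the CR, and not a schema regex processor).  It
assumes a hypothetical helper, resolve_char_refs, that expands only numeric
character references of the &#xHEX; or &#DEC; form; the bare #x20 notation is
left untouched because it is not a character reference at all.

```python
import re

# Matches numeric character references of the form &#xHEX; or &#DEC;
# (hypothetical helper for illustration; not taken from any schema processor).
_CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);")


def resolve_char_refs(text):
    """Replace each numeric character reference with the character it names."""
    def _sub(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        if hex_digits is not None:
            code_point = int(hex_digits, 16)
        else:
            code_point = int(dec_digits)
        return chr(code_point)
    return _CHAR_REF.sub(_sub, text)


old_form = r"[#x20\t\n\r]"    # notation used in the CR table
new_form = r"[&#x20;\t\n\r]"  # notation proposed in this message

# The old form passes through unchanged; the new form yields a real space.
print(repr(resolve_char_refs(old_form)))
print(repr(resolve_char_refs(new_form)))
```

Only the second string ends up containing an actual space character, which is
why the column reads as syntax only once real character references are used.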

This also suggests that I should write the expansion of \c in terms of
\p{L}, \p{Nl}, etc.  Do you agree?  I've never figured out exactly what
values for the general category correspond to the NameChar production of XML
1.0.  Has anyone already done so and can save me the trouble (note: the
bullets at the end of Appendix B in 1.0 are a start but not the final
solution).
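
As a rough starting point for that question, the sketch below looks up the
general category of a handful of characters that XML 1.0 allows as NameChar.
The sample list is my own small selection, not the full production, and
Python's unicodedata module reflects a current Unicode database rather than
the version XML 1.0 was written against, so treat the result as indicative
only.

```python
import unicodedata

# A few characters that XML 1.0 allows as NameChar (illustrative sample only,
# not the full production): letters, a digit, '.', '-', '_', ':' and the
# Extender U+00B7 MIDDLE DOT.
SAMPLE_NAME_CHARS = ["a", "Z", "7", ".", "-", "_", ":", "\u00B7"]


def general_categories(chars):
    """Map each character to its Unicode general category."""
    return {ch: unicodedata.category(ch) for ch in chars}


for ch, cat in general_categories(SAMPLE_NAME_CHARS).items():
    print(f"U+{ord(ch):04X} {cat}")
```

Even this small sample mixes letter and digit categories with several
punctuation categories (Pc, Pd, Po), which suggests that an expansion of \c
written purely as \p{...} categories would need individual characters added
alongside them.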

pvb

References
[1] http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000OctDec/0383.html

Received on Tuesday, 5 December 2000 15:39:50 UTC