RE: Numeric character references in equivalent character classes in CR-xmlschema-2-20001024 from Tony Graham on 2000-12-05 (www-xml-schema-comments@w3.org from October to December 2000)

From: Tony Graham <tgraham@mulberrytech.com>
Date: Tue, 5 Dec 2000 17:49:34 -0400 (EST)
To: <www-xml-schema-comments@w3.org>
Message-ID: <14893.25326.369000.721667@menteith.com>

At 5 Dec 2000 15:46 -0500, Matt Timmermans wrote:
 > > -----Original Message-----
 > > From: www-xml-schema-comments-request@w3.org
 > >
 > > So, I think you are correct, I will change those code point
 > > references to
 > > XML character references, hence
 > >
 > > 	\s		[&#x20;\t\n\r]
 > > 	\w		[&#x0;-&#x10ffff;-[\p{P}\p{S}\p{C}]]
 > >
 > > (note, the typo correct in \w's expansion, as noted in my
 > > answer to James'
 > > message to this list this morning [1]).
 > 
 > I believe the problem was that those character references aren't XML chars,
 > i.e.:
 > 
 > \w [&#x9;-&#x10FFFF;-[\p{P}\p{S}\p{C}]]

No, Paul is correct: my problem was that the equivalent character
classes did not use correct regular expression syntax.

The equivalent character class could also be:

\w [&#x20;-&#x10FFFF;-[\p{P}\p{S}\p{C}]]
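
As a rough illustration of what that class would match, here is a minimal Python sketch that emulates the subtraction with the standard unicodedata module. The function name in_equivalent_w is made up for this example, and the General Category data comes from whatever Unicode version ships with the Python build, which is an assumption and not the version the CR refers to.

```python
import unicodedata

def in_equivalent_w(ch: str) -> bool:
    # Emulates the class written above for a single character:
    # the range 0x20..0x10FFFF minus the General Category major
    # classes P (punctuation), S (symbol) and C (other).
    cp = ord(ch)
    if not 0x20 <= cp <= 0x10FFFF:
        return False
    # The first letter of the two-letter category is the major class.
    return unicodedata.category(ch)[0] not in ("P", "S", "C")
```

Under these assumptions letters and digits pass, "_" and "$" do not, and a space does pass, because \p{Z} is not subtracted in this form of the class.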

The interesting question about the equivalent character class is
whether it excludes code points from the Surrogate block.  The "Cs"
value of the General Category field of the Unicode Character Database
is not listed in the table of character classes in the CR, so does
\p{C} really include the Surrogate code points?  The answer is
probably that it doesn't have to, since surrogates 'do not occur at
the level of "character abstraction" that XML instance documents
operate on.'
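
To make the surrogate point concrete, here is a small Python sketch that checks the Surrogate block against the XML 1.0 Char production. The helper is_xml_char and the variable names are hypothetical, written only for this message; the category values come from the Unicode data bundled with Python, which I am assuming agrees with the UCD on this block.

```python
import unicodedata

def is_xml_char(cp: int) -> bool:
    # XML 1.0 Char production:
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# The Surrogate block runs from U+D800 through U+DFFF.
surrogates = range(0xD800, 0xE000)

# True if every surrogate code point has General Category "Cs".
all_cs = all(unicodedata.category(chr(cp)) == "Cs" for cp in surrogates)

# True if any surrogate code point is a legal XML character.
any_xml_char = any(is_xml_char(cp) for cp in surrogates)
```

If all_cs is true and any_xml_char is false, then whether or not \p{C} is taken to include Cs makes no difference to what the class can match in an XML instance document.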

Regards,

Tony Graham
======================================================================
Tony Graham                            mailto:tgraham@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

Received on Tuesday, 5 December 2000 17:54:22 UTC