Re: Regular expressions should support \x{....} escapes

> RL1.1 requires provisions to refer to any Unicode code point. XML Schema
> requires however to rely on external provisions to refer to characters,
> which in case of XML 1.0 means e.g. U+0001 cannot be referred to, and in
> case of XML 1.1 e.g. U+FFFE cannot be referred to. Other formats likely
> have similar restrictions.

Well, I think we decided that we satisfied the spirit of RL1.1: there is
really no need to refer to a code point in a schema regex if that code
point can't appear in an XML document. Five years on, I am still happy
with that decision.
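
To make "can't appear in an XML document" concrete, here is a minimal
Python sketch of the Char productions from XML 1.0 and XML 1.1. The names
XML10_CHAR, XML11_CHAR and is_xml_char are hypothetical and chosen only
for illustration. The sketch also ignores that XML 1.1 allows some
control characters (U+0001 among them) only as character references.

```python
# Char productions of XML 1.0 and XML 1.1 as inclusive code point ranges.
# The constant and function names are illustrative, not from any library.
XML10_CHAR = [
    (0x9, 0xA),          # tab and line feed
    (0xD, 0xD),          # carriage return
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]
XML11_CHAR = [
    (0x1, 0xD7FF),       # some of these are legal only as character references
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(code_point, version="1.0"):
    """Return True if code_point matches Char for the given XML version."""
    ranges = XML10_CHAR if version == "1.0" else XML11_CHAR
    return any(lo <= code_point <= hi for lo, hi in ranges)


if __name__ == "__main__":
    for cp in (0x1, 0xFFFE, 0xE000):
        print(f"U+{cp:04X}",
              "XML 1.0:", is_xml_char(cp, "1.0"),
              "XML 1.1:", is_xml_char(cp, "1.1"))
```

Under these productions U+0001 is excluded by XML 1.0 but not by XML 1.1,
U+FFFE is excluded by both, and the BMP private use code point U+E000 is
allowed by both.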

> To exclude e.g. code points designated for private use in Perl would be
> [^\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}]. To express
> this with the regular expression format in XML Schema 1.0 one would have
> to use private use code points which one should not per W3C's character
> model

I do not believe it would be a violation of the character model to refer 
to private use code points in this context. C073 [1] is written as a 
SHOULD NOT, not as a MUST NOT, and I believe that use of these code points 
in a schema regex is perfectly appropriate, especially given C040 [2].
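
As an illustration of a character class that excludes the private use
code points, here is a small Python sketch. It assumes the three private
use ranges are U+E000..U+F8FF, U+F0000..U+FFFFD and U+100000..U+10FFFD,
and it writes them with Python's \U escapes, whereas a Schema 1.0 regex
would need the literal characters or XML character references. The names
PRIVATE_USE_RANGES, build_exclusion_class and has_no_private_use are
hypothetical.

```python
import re

# Private use ranges: BMP private use area plus planes 15 and 16.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def build_exclusion_class(ranges):
    """Build a negated character class such as [^\\U0000E000-\\U0000F8FF...]."""
    parts = "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in ranges)
    return f"[^{parts}]"


# Zero or more characters, none of them in a private use range.
NO_PRIVATE_USE = re.compile(build_exclusion_class(PRIVATE_USE_RANGES) + "*")


def has_no_private_use(text):
    """Return True if text contains no private use code points."""
    return NO_PRIVATE_USE.fullmatch(text) is not None


if __name__ == "__main__":
    print(has_no_private_use("plain text"))     # no private use characters
    print(has_no_private_use("a\uE000b"))       # contains U+E000
    print(has_no_private_use("\U00010000"))     # U+10000 is not private use
```

The last call is a quick check on the plane 16 range: U+10000 is an
ordinary supplementary character, so a class that excludes it has one of
its range endpoints wrong.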

However, I think you raise some interesting questions about XML 1.0 vs.
XML 1.1. Of course, Schema 1.0 applies only to XML 1.0, so in that
respect those issues don't come up for current processors. As the WG (of
which I am no longer a member) addresses the XML 1.0 vs. XML 1.1 issue
while moving forward with Schema 1.1, I hope they take some of your
inter-version issues to heart; I know they are struggling with other
XML 1.1 issues.

pvb

[1] http://www.w3.org/TR/charmod/#C073
[2] http://www.w3.org/TR/charmod/#C040

Received on Friday, 20 January 2006 23:29:06 UTC