Re: Regular expressions should support \x{....} escapes

At 08:28 06/01/21, wrote:

[Bjoern wrote:]

 >> To exclude e.g. code points designated for private use in Perl would be
 >> [^\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}]. To express
 >> this with the regular expression format in XML Schema 1.0 one would have
 >> to use private use code points which one should not per W3C's character
 >> model
 >I do not believe it would be a violation of the character model to refer
 >to private use code points in this context.

Definitely not. Even the Character Model itself uses the relevant
numbers to say exactly what is excluded. The Character Model makes
it perfectly clear why private use code points SHOULD NOT be used;
using the boundary codepoints in an expression explicitly to make
sure private codepoints are excluded is the most benign and positive
use of private codepoints one can imagine. The boundaries
of the private use areas do not change by private agreement.
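
To illustrate, here is a minimal sketch in Python, which is not one of
the languages discussed in this thread. The names PRIVATE_USE_RANGES,
private_use_class and has_no_private_use are invented for the example.
It assumes the three private use ranges as Unicode defines them and
writes the boundaries with numeric escapes, so no private use
character appears literally in the expression.

```python
import re

# Private use ranges as defined by Unicode: the BMP Private Use Area
# plus the private use ranges of planes 15 and 16.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def private_use_class(negate=True):
    """Build a character class whose boundaries are \\UXXXXXXXX escapes."""
    body = "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in PRIVATE_USE_RANGES)
    return f"[{'^' if negate else ''}{body}]"


# Matches strings made up only of characters outside the private use areas.
NO_PRIVATE_USE = re.compile(private_use_class() + "*")


def has_no_private_use(text):
    """Return True if text contains no private use code point."""
    return NO_PRIVATE_USE.fullmatch(text) is not None
```

Because the boundaries are plain numbers, the same list of ranges could
be rendered in whatever escape syntax a given regex flavour offers.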

 >C073 [1] is written as a
 >SHOULD NOT, not as a MUST NOT, and I believe that use of these code points
 >in a schema regex is perfectly appropriate, especially given C040 [2].

C040 is not very relevant here. Its purpose is to make sure that
W3C technology does not disallow the exchange of private use
codepoints in truly private exchanges, i.e. with private agreements.
For the regular expression above no private agreement is needed.

 >>The effect is that this design discourages sharing regular expressions,
 >>developers have to be aware of these subtle problems and convert between
 >>them by adding and subtracting character ranges, which is not unlikely
 >>to either introduce errors or persuade schema authors to use incorrect
 >>expressions so as to not depend on XML 1.1 support of schema validators
 >>(or query processors, or whatever invokes the engine).

I think Bjoern has a very valid point here.

Regards,    Martin.


Received on Tuesday, 24 January 2006 00:45:09 UTC