RE: C0 control characters and Regular expressions

> 
> Do regular expressions of W3C XML Schema Part 2 allow C0 
> control characters?  For example, does every C0 control 
> character match \p{IsBasicLatin}  ?
> 

I think the regular expressions are well-defined over the whole of Unicode,
but other specifications restrict which C0 characters can appear in XML.
XML 1.0 allows only TAB, LF, and CR, while XML 1.1 allows all of C0 except
NUL. It looks fairly clear to me that those C0 characters that you can
actually get into the system will match \p{IsBasicLatin}, since that block
covers U+0000 through U+007F; for the others, the question is academic.
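
To make that concrete, here is a rough sketch in Python. It is only an
illustration: the function names are made up for this example, the ranges
are transcribed from the Char productions of XML 1.0 and XML 1.1, and the
block test is a plain range check, because Python's standard re module
does not support \p{IsBasicLatin}.

```python
# Illustrative sketch only; all names here are hypothetical helpers.

C0_CONTROLS = range(0x00, 0x20)   # U+0000..U+001F
BASIC_LATIN = range(0x00, 0x80)   # the BasicLatin block, U+0000..U+007F


def in_basic_latin(cp: int) -> bool:
    """Stand-in for matching \\p{IsBasicLatin} against one code point."""
    return cp in BASIC_LATIN


def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """Char production of XML 1.1 (everything except NUL, surrogates,
    U+FFFE and U+FFFF)."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# C0 characters that each XML version admits at all.
c0_in_xml10 = [cp for cp in C0_CONTROLS if is_xml10_char(cp)]
c0_not_in_xml11 = [cp for cp in C0_CONTROLS if not is_xml11_char(cp)]

# Every C0 character, admitted or not, lies inside the BasicLatin block.
all_c0_in_block = all(in_basic_latin(cp) for cp in C0_CONTROLS)
```

Note that in XML 1.1 the C0 characters other than TAB, LF, and CR may
appear only as character references, not literally, which the sketch does
not model.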

My interpretation.

Michael Kay
http://www.saxonica.com/

Received on Thursday, 15 November 2007 11:08:10 UTC