- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 14 Jan 2010 12:38:59 +0000
- To: www-xml-schema-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=8744

           Summary: Regex characters classes C, L, M, etc
           Product: XML Schema
           Version: 1.0/1.1 both
          Platform: PC
        OS/Version: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Datatypes: XSD Part 2
        AssignedTo: David_E3@VERIFONE.com
        ReportedBy: mike@saxonica.com
         QAContact: www-xml-schema-comments@w3.org
                CC: cmsmcq@blackmesatech.com

The specification states:

<quote>
[Definition:] [Unicode Database] specifies a number of possible values for the
"General Category" property and provides mappings from code points to specific
character properties. The set containing all characters that have property X,
can be identified with a category escape \p{X} . The complement of this set is
specified with the category escape \P{X} . ( [\P{X}] = [^\p{X}] ).
</quote>

It then gives a table purporting to show the values of "General Category" that
occur in Unicode 5.1. This includes single-character categories such as "C",
"L", and "M".

As far as I can see, however, Unicode only defines the two-character categories
such as Ll, Lu, Mc and so on. The single-character categories are an invention
of the regex language, and therefore need to be described in our specification,
rather than by reference to Unicode.

There are two possible definitions of these categories, which give different
results. At least one XML Schema implementation has interpreted the
single-character category X to be the union of all two-character categories
starting with X, for example C is the union of (Cc, Cf, Co, and Cn). However,
another interpretation (the one used by the Java regex library) is that it is
the set of all characters listed in the Unicode database as belonging to a
category starting with that letter. This gives a different result in the case
of category C, since Cn is the set of characters that are not listed in the
relevant section of the Unicode database.
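The difference between the two readings of \p{C} can be sketched in a few lines of Python. This is only an illustration, not the behaviour of any particular schema processor or regex library: the function names in_c_union and in_c_listed are made up for this sketch, Python's unicodedata module stands in for the Unicode database, and surrogates (Cs) are left out because the report lists only Cc, Cf, Co and Cn.

```python
import unicodedata

# The two-letter categories the report enumerates for "C".
# Assumption: Cs (surrogates) is omitted, following the report's list.
C_SUBCATEGORIES = {"Cc", "Cf", "Co", "Cn"}


def in_c_union(ch: str) -> bool:
    """Reading 1: C is the union of Cc, Cf, Co and Cn."""
    return unicodedata.category(ch) in C_SUBCATEGORIES


def in_c_listed(ch: str) -> bool:
    """Reading 2: C holds only characters that the database lists under a
    category starting with "C". Code points in Cn are by definition not
    listed, so they fall outside the set."""
    return unicodedata.category(ch) in C_SUBCATEGORIES - {"Cn"}


# Hypothetical sample code points, chosen to cover each category.
samples = {
    "U+0000 (Cc)": "\u0000",
    "U+00AD (Cf)": "\u00ad",
    "U+E000 (Co)": "\ue000",
    "U+FFFF (Cn)": "\uffff",
    "U+0041 (Lu)": "A",
}

for label, ch in samples.items():
    print(label, in_c_union(ch), in_c_listed(ch))
```

The two functions agree on every sample except the Cn code point, which is exactly the case where the two definitions give different results.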
Received on Thursday, 14 January 2010 12:39:00 UTC