[Bug 5818] New: Unicode Database: shifting sands

From: <bugzilla@wiggum.w3.org>
Date: Fri, 27 Jun 2008 20:21:23 +0000
To: www-xml-schema-comments@w3.org
Message-ID: <bug-5818-703@http.www.w3.org/Bugs/Public/>


           Summary: Unicode Database: shifting sands
           Product: XML Schema
           Version: 1.1 only
          Platform: PC
        OS/Version: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Datatypes: XSD Part 2
        AssignedTo: cmsmcq@w3.org
        ReportedBy: mike@saxonica.com
         QAContact: www-xml-schema-comments@w3.org

There is a Note in G.1.1:

Note: [Unicode Database] is subject to future revision.  For example, the
mapping from code points to character properties might be updated. All
·minimally conforming· processors ·must· support the character properties
defined in the version of [Unicode Database] cited in the normative references
(Normative (§K.1)).  However, implementors are encouraged to support the
character properties defined in any future version.

I'm not sure that it is possible to do both. In Unicode 3.1, and therefore in
XML Schema 1.0, the Ethiopic digits x1369-x1371 were in group Nd (and therefore
matched \d). In Unicode 4.1 they have been moved to group No (so they no longer
match \d). A given processor, unless it has configuration options to put this
under user control -- which seems unduly onerous -- is either going to support
the new version or the old. In one case, x1369 will match \d, in the other case
it won't. In practice, it's quite likely to depend on which version of Java or
.NET you are using. So I think we should either pin things down so processors
are required to support Unicode version 4.1 and no other, or we should remove
the "must" from the above note, and make it implementation-defined which
version of Unicode is used. 
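
The version dependence described above can be observed with a small
sketch. It is an illustration only, not an XSD processor: it uses Python's
unicodedata module, which exposes both the Unicode tables bundled with
the running interpreter and a frozen Unicode 3.2.0 table. Using 3.2.0 as a
stand-in for the 3.1 version cited in this report is an assumption, and
the helper name digit_status is hypothetical. Python's \d is used here
only as an analogue of the XSD \d escape.

```python
import re
import unicodedata

# First of the Ethiopic digits discussed in the report (x1369-x1371).
ETHIOPIC_DIGIT_ONE = "\u1369"


def digit_status(ch):
    """Report how one character is classified under two Unicode versions.

    Hypothetical helper for illustration; not part of any schema processor.
    """
    return {
        # Unicode version bundled with this Python build (varies by release).
        "current_version": unicodedata.unidata_version,
        "current_category": unicodedata.category(ch),
        # Frozen Unicode 3.2.0 table shipped with Python; used here as a
        # stand-in for the older database, which is an assumption.
        "old_version": unicodedata.ucd_3_2_0.unidata_version,
        "old_category": unicodedata.ucd_3_2_0.category(ch),
        # Python's re module, not an XSD regex engine: for str patterns,
        # \d matches Unicode decimal digits per the bundled tables.
        "matches_backslash_d": re.fullmatch(r"\d", ch) is not None,
    }


if __name__ == "__main__":
    for key, value in digit_status(ETHIOPIC_DIGIT_ONE).items():
        print(f"{key}: {value}")
```

The regex result follows whichever table the runtime bundles, which is
the situation a processor built on a given Java or .NET release is in:
without a configuration switch, only one of the two answers is available.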

(In any case, what is a "must" doing in a Note?)

Test case reS17 in the Microsoft regex test suite is relevant: its results
depend on which version of Unicode you believe in.

Received on Friday, 27 June 2008 20:22:00 UTC
