[Bug 7935] New: [F&O] normalize-unicode on codepoints that are not characters.

From: <bugzilla@wiggum.w3.org>
Date: Fri, 16 Oct 2009 16:53:47 +0000
To: public-qt-comments@w3.org
Message-ID: <bug-7935-523@http.www.w3.org/Bugs/Public/>

           Summary: [F&O] normalize-unicode on codepoints that are not
                    characters.
           Product: XPath / XQuery / XSLT
           Version: 2nd Edition Recommendation
          Platform: PC
        OS/Version: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators
        AssignedTo: mike@saxonica.com
        ReportedBy: oliver@cbcl.co.uk
         QAContact: public-qt-comments@w3.org

The behaviour of normalize-unicode is defined by the Unicode normalization

Based on my (somewhat woolly) understanding of the Unicode specification, there
are 66 codepoints that do not map to characters, and Unicode normalization is
only defined on strings of characters.  Although use of these is not
recommended, they are valid XML characters.
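
As a rough check of the count above, the noncharacters can be enumerated
directly. This is a minimal sketch in Python, not part of the original report.
It assumes the usual Unicode definition (U+FDD0..U+FDEF plus the last two
codepoints of each of the 17 planes) and the XML 1.0 Char production. The names
noncharacters, is_xml10_char, NONCHARS and XML_VALID_NONCHARS are hypothetical.

```python
def noncharacters():
    """Return the codepoints that Unicode designates as noncharacters."""
    # The contiguous block U+FDD0..U+FDEF (32 codepoints).
    cps = list(range(0xFDD0, 0xFDF0))
    # The last two codepoints of each of the 17 planes (34 codepoints).
    for plane in range(17):
        base = plane << 16
        cps.extend((base | 0xFFFE, base | 0xFFFF))
    return cps


def is_xml10_char(cp):
    """True if cp matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


NONCHARS = noncharacters()
XML_VALID_NONCHARS = [cp for cp in NONCHARS if is_xml10_char(cp)]
```

Under these assumptions the list has 66 entries, and all of them except U+FFFE
and U+FFFF fall inside the XML 1.0 Char production, so U+FDD0 can indeed appear
in an xs:string.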

xs:string contains a string of codepoints, which can quite happily include

For example what should happen with the following query?

normalize-unicode("&#xfdd0;", "NFC")

It is worth noting that in .NET, the following expression throws an exception:


I am somewhat loath to catch this exception and add a workaround when it
is clear that these characters are a bad thing.

Perhaps it is worth allowing implementations to raise an error if these
characters appear in a string that is to be normalized, as the result is not a
valid Unicode string.
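
The behaviour proposed above might look roughly like the following sketch. It
is written in Python rather than in any XQuery implementation's language, and
it is an illustration only: normalize_unicode_strict and is_noncharacter are
hypothetical names, ValueError stands in for whatever error code the
specification might assign, and only the four forms supported by Python's
unicodedata module are handled.

```python
import unicodedata


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for any codepoint ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def normalize_unicode_strict(value, form="NFC"):
    """Normalize value, rejecting strings that contain noncharacters.

    Assumption: raising an error is the behaviour suggested in this report,
    not something the current specification requires.
    """
    for ch in value:
        if is_noncharacter(ord(ch)):
            raise ValueError(
                "cannot normalize: U+%04X is a noncharacter" % ord(ch))
    return unicodedata.normalize(form, value)
```

With this sketch, a string such as "e" followed by U+0301 is composed to
U+00E9 under NFC as usual, while a string containing U+FDD0 is rejected before
normalization is attempted.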

On a similar note, Constr-cont-document-3 has some of these characters in its
expected result, and I believe that canonicalization is not defined on these
characters for a similar reason.

Received on Friday, 16 October 2009 16:53:48 UTC