- From: <bugzilla@wiggum.w3.org>
- Date: Fri, 16 Oct 2009 16:53:47 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7935
Summary: [F&O] normalize-unicode on codepoints that are not
characters.
Product: XPath / XQuery / XSLT
Version: 2nd Edition Recommendation
Platform: PC
OS/Version: Windows NT
Status: NEW
Severity: normal
Priority: P2
Component: Functions and Operators
AssignedTo: mike@saxonica.com
ReportedBy: oliver@cbcl.co.uk
QAContact: public-qt-comments@w3.org
The behaviour of normalize-unicode is defined by the Unicode normalization
specification.
Based on my (somewhat woolly) understanding of the Unicode specification, there
are 66 codepoints that do not map to characters, and Unicode normalization is
only defined on strings of characters. Although their use is not recommended,
all of them except U+FFFE and U+FFFF are valid XML characters.
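As a quick check of the figure of 66, here is a minimal Python sketch (not part of the original report). It assumes the usual definition of noncharacters: the block U+FDD0 to U+FDEF, plus the last two codepoints of each of the 17 planes. The helper name is_noncharacter is made up for illustration.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if codepoint cp is a Unicode noncharacter."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # U+FDD0..U+FDEF: a contiguous block of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF: the last two codepoints of each of the 17 planes.
    return (cp & 0xFFFE) == 0xFFFE


# Enumerate every noncharacter in the Unicode codespace and count them.
noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))
```

The count is 32 for the U+FDD0 block plus 2 for each of 17 planes, which gives 66.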
xs:string contains a string of codepoints, which can quite happily include
noncharacters.
For example, what should happen with the following query?
normalize-unicode("", "NFC")
It is worth noting that in .NET, the following expression throws an exception:
"\ufdd0".Normalize(NormalizationForm.FormC)
I am somewhat loath to catch this exception and add a workaround when it
is clear that these characters are a bad thing.
Perhaps it is worth allowing implementations to raise an error if these
characters appear in a string that is to be normalized, as the result is not a
valid Unicode string.
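To illustrate what such an allowance might look like, here is a rough Python sketch of a normalization wrapper that rejects noncharacters before normalizing. The names strict_normalize and has_noncharacter are hypothetical, and the choice of ValueError is an assumption, not something F&O specifies. Python's own unicodedata.normalize, as far as I know, passes noncharacters through without complaint, so the check is done by hand.

```python
import unicodedata


def has_noncharacter(s: str) -> bool:
    """Return True if s contains any Unicode noncharacter (hypothetical helper)."""
    for ch in s:
        cp = ord(ch)
        # U+FDD0..U+FDEF, or the last two codepoints of any plane.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            return True
    return False


def strict_normalize(s: str, form: str = "NFC") -> str:
    """Normalize s, raising an error if it contains a noncharacter.

    This mirrors the proposal that an implementation may raise an error;
    the error type here is an illustrative assumption.
    """
    if has_noncharacter(s):
        raise ValueError("string contains a Unicode noncharacter")
    return unicodedata.normalize(form, s)


# Ordinary text is normalized as usual: "e" + combining acute composes to U+00E9.
print(strict_normalize("e\u0301") == "\u00e9")

# A string containing U+FDD0 is rejected instead of being normalized.
try:
    strict_normalize("\ufdd0")
except ValueError as err:
    print("rejected:", err)
```

An implementation that prefers to pass noncharacters through unchanged would simply skip the check; the sketch only shows the stricter of the two options.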
On a similar note, Constr-cont-document-3 has some of these characters in its
expected result, and I believe that canonicalization is not defined on these
characters for a similar reason.
Received on Friday, 16 October 2009 16:53:48 UTC