- From: <bugzilla@wiggum.w3.org>
- Date: Fri, 16 Oct 2009 16:53:47 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7935

Summary: [F&O] normalize-unicode on codepoints that are not characters.
Product: XPath / XQuery / XSLT
Version: 2nd Edition Recommendation
Platform: PC
OS/Version: Windows NT
Status: NEW
Severity: normal
Priority: P2
Component: Functions and Operators
AssignedTo: mike@saxonica.com
ReportedBy: oliver@cbcl.co.uk
QAContact: public-qt-comments@w3.org

The behaviour of normalize-unicode is defined by the Unicode normalization specification.

Based on my (somewhat woolly) understanding of the Unicode specification, there are 66 codepoints that do not map to characters, and Unicode normalization is only defined on strings of characters. Although use of these codepoints is not recommended, they are valid XML characters. An xs:string is a sequence of codepoints, so it can quite happily include noncharacters.

For example, what should happen with the following query?

    normalize-unicode("", "NFC")

It is worth noting that in .NET, the following expression throws an exception:

    "\ufdd0".Normalize(NormalizationForm.FormC)

I am somewhat loath to catch this exception and add a workaround when it is clear that these characters are a bad thing.

Perhaps it is worth allowing implementations to raise an error if these characters appear in a string that is to be normalized, as the result is not a valid Unicode string.

On a similar note, Constr-cont-document-3 has some of these characters in its expected result, and I believe that canonicalization is not defined on these characters for a similar reason.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
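To make the figure of 66 concrete, here is a minimal Python sketch that enumerates the Unicode noncharacters (U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes) and checks each one against the Char production of XML 1.0. The helper names `noncharacters`, `is_xml10_char` and `contains_noncharacter` are hypothetical and exist only for this illustration. The final line shows what CPython's `unicodedata.normalize` happens to do with U+FDD0; that is one implementation's behaviour and says nothing about what F&O should require.

```python
import unicodedata


def noncharacters():
    """Return the code points that Unicode designates as noncharacters."""
    # U+FDD0..U+FDEF: a contiguous block of 32 noncharacters.
    points = list(range(0xFDD0, 0xFDF0))
    # The last two code points of each of the 17 planes
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    for plane in range(0x11):
        points.append((plane << 16) | 0xFFFE)
        points.append((plane << 16) | 0xFFFF)
    return points


def is_xml10_char(cp):
    """True if the code point matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def contains_noncharacter(s):
    """True if the string holds at least one noncharacter code point.

    Hypothetical pre-check an implementation could run before normalizing.
    """
    nonchars = set(noncharacters())
    return any(ord(ch) in nonchars for ch in s)


NONCHARS = noncharacters()
print(len(NONCHARS))                                # 66
print(sum(is_xml10_char(cp) for cp in NONCHARS))    # 64
# CPython returns the input unchanged rather than raising an exception.
print(unicodedata.normalize("NFC", "\ufdd0") == "\ufdd0")  # True
```

Under these definitions, 64 of the 66 noncharacters match the XML 1.0 Char production. U+FFFE and U+FFFF fall outside it, so those two cannot appear in an XML 1.0 document in the first place. A check such as `contains_noncharacter` is one way an implementation could decide whether to raise an error before calling its platform's normalization routine.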
Received on Friday, 16 October 2009 16:53:48 UTC