- From: John Cowan <cowan@mercury.ccil.org>
- Date: Thu, 21 Feb 2013 15:46:05 -0500
- To: Norman Walsh <ndw@nwalsh.com>
- Cc: public-xml-core-wg@w3.org
Norman Walsh scripsit:

> This element name is encoded with four code points: càn
>
> That's a "c", an "a", a "combining grave accent, U+0300", and an "n".
>
> Saxon reports string-length(local-name(.)) is 4. Rightly so.
> MarkLogic reports 3 because the two-code-point version of the accented
> a is replaced with a single U+00E0 character.

When you pull the whole document out, does it have the combining character? If so, the MarkLogic XSLT implementation is broken. If not, then the storage subsystem is not disgorging the same XML document that it swallowed, but one that is -- in the Unicode sense -- canonically equivalent to it.

Of course, there is nothing prohibiting this in either the XML or the XSLT spec.

--
"The serene chaos that is Courage, and the phenomenon of Unopened Consciousness have been known to the Great World eons longer than Extaboulism." "Why is that?" the woman inquired. "Because I just made that word up", the Master said wisely. --Kehlog Albran, The Profit
John Cowan   cowan@ccil.org   http://www.ccil.org/~cowan
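
For readers who want to reproduce the 4-versus-3 discrepancy outside either product, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration of canonical equivalence; the variable names are made up for this example, and it assumes (without confirmation from the thread) that the effect seen in MarkLogic corresponds to NFC normalization.

```python
import unicodedata

# The element name as described above: "c", "a", U+0300 COMBINING GRAVE ACCENT, "n".
decomposed = "ca\u0300n"

# Assumption for illustration: the store applies something like NFC,
# which composes "a" + U+0300 into the single code point U+00E0.
composed = unicodedata.normalize("NFC", decomposed)


def code_points(s):
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


print(len(decomposed), code_points(decomposed))
print(len(composed), code_points(composed))

# The two strings differ code point by code point ...
print(decomposed == composed)

# ... yet they are canonically equivalent: decomposing the composed form
# (NFD) gives back the original sequence.
print(unicodedata.normalize("NFD", composed) == decomposed)
```

A string-length that counts code points, as XPath's does, will report different values for the two forms even though Unicode treats them as the same text.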
Received on Thursday, 21 February 2013 20:46:32 UTC