Re: Fwd: Is there a tool which tells me if my XML is "fully normalized"?

Norman Walsh scripsit:

> This element name is encoded with four code points: càn
>
> That's a "c", an "a", a "combining grave accent, U+0300", and an "n".
>
> Saxon reports string-length(local-name(.)) is 4.

Rightly so.

> MarkLogic reports 3 because the two-code-point version of the accented
> a is replaced with a single U+00E0 character.

When you pull the whole document out, does it have the combining
character?  If so, the MarkLogic XSLT implementation is broken.  If not,
then the storage subsystem is not disgorging the same XML document that
it swallowed, but one that is -- in the Unicode sense -- canonically
equivalent to it.  Of course, there is nothing prohibiting this in either
the XML or the XSLT spec.
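
To make the two counts concrete, here is a minimal Python sketch. It is
not Saxon or MarkLogic code: it assumes that Python's standard
unicodedata module can stand in for whatever the storage layer does, and
that the replacement described above amounts to NFC normalization. The
variable names are mine, for illustration only.

```python
import unicodedata

# The element name as originally written:
# "c", "a", U+0300 COMBINING GRAVE ACCENT, "n".
decomposed = "ca\u0300n"

# NFC composes "a" + U+0300 into the single code point U+00E0.
# Assumption: this mirrors what the store does on ingestion.
composed = unicodedata.normalize("NFC", decomposed)

# len() on a Python str counts code points, like XPath string-length().
print(len(decomposed))   # 4
print(len(composed))     # 3

# The strings differ code point by code point ...
print(decomposed == composed)   # False

# ... but they are canonically equivalent: normalizing both to the
# same form (NFD here) yields identical strings.
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```

So a processor that sees the name as written reports 4, and one that
sees the composed form reports 3. Which one is "right" depends only on
which of the two sequences it was actually handed.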

-- 
"The serene chaos that is Courage, and the phenomenon   cowan@ccil.org
of Unopened Consciousness have been known to the        John Cowan
Great World eons longer than Extaboulism."
"Why is that?" the woman inquired.
"Because I just made that word up", the Master said wisely.
        --Kehlog Albran, The Profit             http://www.ccil.org/~cowan

Received on Thursday, 21 February 2013 20:46:32 UTC