Re: Unicode (compatibility) normalization and MicroXML processors

I am reluctant to say James Clark is wrong, but I agree with you.  Please
tell the Schemers I had emergency foot surgery on Friday, with follow-up
surgery on Tuesday; prospects are good.  Thanks.

On Mon, Jan 24, 2022 at 3:47 AM Daphne Preston-Kendal <dpk@nonceword.org>
wrote:

> The first example of a µXML document given in the spec is
>
> <comment lang="en" date="2012-09-11">
> I <em>love</em> &#xB5;<!-- MICRO SIGN -->XML!<br/>
> It's so clean &amp; simple.</comment>
>
> with the JSON equivalent
>
> [ "comment",
>   {  "date": "2012-09-11", "lang": "en" },
>   [ "\nI ",
>     ["em", {}, ["love"]],
>     " \u03BCXML!",
>     ["br", {}, []],
>     "\nIt's so clean & simple."
>   ]
> ]
>
> The mapping of U+00B5 to U+03BC implies that µXML processors
> can or should do compatibility normalization of their input,
> but this is not actually explicitly stated anywhere. In fact,
> it appears to contradict the recommendation
>
> > [Unicode] says that canonically equivalent sequences of characters ought
> > to be treated as identical. However, documents that are canonically
> > equivalent according to Unicode but that use distinct code point sequences
> > are considered distinct by MicroXML parsers. This gives rise to the
> > possibility that the user might unintentionally create sequences of
> > characters that are canonically equivalent but are treated as distinct by
> > MicroXML parsers. To avoid this possibility, all documents SHOULD be in
> > Normalization Form C as described by [Unicode].
>
> which seems to say that parsers should *not* do any normalization.
> (Also consider that U+00B5 is unaffected by non-compatibility
> normalization.)
>
> Is this an error in the spec (in that example)?
>
> --
> dpk (Daphne Preston-Kendal) ·· 12107 Berlin, Germany ·· http://dpk.io/
> ‘What’s the good of Mercator’s North Poles and Equators,
>    Tropics, Zones, and Meridian Lines?’
>  So the Bellman would cry: and the crew would reply
>   ‘They are merely conventional signs!’ — Carroll, Hunting of the Snark
>
>
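
The distinction drawn above between canonical and compatibility
normalization can be checked directly. Here is a minimal sketch in Python,
assuming only the standard unicodedata module; the names MICRO_SIGN,
GREEK_MU, and normalize_all are illustrative and do not come from the
MicroXML spec.

```python
import unicodedata

MICRO_SIGN = "\u00B5"  # MICRO SIGN, as written in the XML example
GREEK_MU = "\u03BC"    # GREEK SMALL LETTER MU, as it appears in the JSON


def normalize_all(text):
    """Return the text in each of the four Unicode normalization forms."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


forms = normalize_all(MICRO_SIGN)
for form, result in forms.items():
    # Print the code points so the two look-alike characters can be told apart.
    print(form, " ".join("U+%04X" % ord(ch) for ch in result))

# Canonical forms leave U+00B5 alone; only compatibility forms map it to U+03BC.
canonical_unchanged = forms["NFC"] == MICRO_SIGN and forms["NFD"] == MICRO_SIGN
compat_maps_to_mu = forms["NFKC"] == GREEK_MU and forms["NFKD"] == GREEK_MU

# The case the spec's recommendation is about: canonically equivalent
# strings that differ code point by code point until they are put into NFC.
composed = "\u00E9"     # e with acute, precomposed
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT
distinct_as_given = composed != decomposed
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

So a processor that applied only Normalization Form C would still emit
U+00B5 in the JSON; producing U+03BC requires a compatibility form (NFKC
or NFKD), which the quoted recommendation does not mention.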

Received on Monday, 24 January 2022 14:58:56 UTC