Unicode (compatibility) normalization and MicroXML processors

The first example of a µXML document given in the spec is

<comment lang="en" date="2012-09-11">
I <em>love</em> &#xB5;<!-- MICRO SIGN -->XML!<br/>
It's so clean &amp; simple.</comment>

with the JSON equivalent

[ "comment",
  {  "date": "2012-09-11", "lang": "en" },
  [ "\nI ",
    ["em", {}, ["love"]],
    " \u03BCXML!",
    ["br", {}, []],
    "\nIt's so clean & simple."
  ]
]

The mapping of U+00B5 (MICRO SIGN) to U+03BC (GREEK SMALL LETTER MU)
implies that µXML processors may, or even should, apply compatibility
normalization to their input, but the spec never states this
explicitly. In fact, it appears to contradict the spec's recommendation

> [Unicode] says that canonically equivalent sequences of characters ought to be treated as identical. However, documents that are canonically equivalent according to Unicode but that use distinct code point sequences are considered distinct by MicroXML parsers. This gives rise to the possibility that the user might unintentionally create sequences of characters that are canonically equivalent but are treated as distinct by MicroXML parsers. To avoid this possibility, all documents SHOULD be in Normalization Form C as described by [Unicode].

which seems to say that parsers should *not* do any normalization.
(Note also that U+00B5 is unaffected by canonical normalization: it is
already in NFC, so following that SHOULD would not turn it into U+03BC
either.)
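To make the discrepancy concrete, here is a small check using Python's standard `unicodedata` module (chosen only for illustration; nothing in the spec prescribes it, and the variable names are made up for this sketch). It compares the character data following `<em>love</em>` in the XML example with the corresponding JSON string under NFKC and NFC, then shows the kind of canonically equivalent pair that the NFC recommendation is actually aimed at.

```python
import unicodedata

MICRO_SIGN = "\u00B5"  # what &#xB5; in the XML example denotes
GREEK_MU = "\u03BC"    # what the \u03BC escape in the JSON equivalent denotes

# The character data following <em>love</em> in the example
xml_text = " " + MICRO_SIGN + "XML!"
json_text = " " + GREEK_MU + "XML!"

# MICRO SIGN has only a compatibility decomposition, to GREEK SMALL LETTER MU
print(unicodedata.decomposition(MICRO_SIGN))  # <compat> 03BC

# Compatibility normalization turns the XML text into the JSON text...
print(unicodedata.normalize("NFKC", xml_text) == json_text)  # True
# ...but canonical normalization (NFC, as recommended) leaves it unchanged
print(unicodedata.normalize("NFC", xml_text) == xml_text)    # True
print(unicodedata.normalize("NFC", xml_text) == json_text)   # False

# What NFC is meant to unify: canonically equivalent but distinct sequences
precomposed = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Only NFKC reproduces the JSON string; NFC, which is what the quoted SHOULD refers to, leaves the MICRO SIGN as it is.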

Is this an error in the spec (in that example)?

-- 
dpk (Daphne Preston-Kendal) ·· 12107 Berlin, Germany ·· http://dpk.io/
‘What’s the good of Mercator’s North Poles and Equators,
   Tropics, Zones, and Meridian Lines?’
 So the Bellman would cry: and the crew would reply
  ‘They are merely conventional signs!’ — Carroll, Hunting of the Snark

Received on Monday, 24 January 2022 08:46:57 UTC