Re: XML character sets: the hard-minimalist manifesto
I have no problem with a canonical form, so long as it doesn't stop
people using other forms.
>a) Once you've parsed an XML document once, all further parses must produce
>an absolutely byte-identical copy of the XML document that came back from
>that first parse. I believe the right term for this is "idempotence". This
>makes parser conformance testing trivial.
Testing this would necessarily take two passes:
  1. Resolve entity references etc., convert to UTF-8, generate a document.
  2. Parse that document, generate a document again, and compare the results.
Though things like empty elements and RE/RS complicate this no end.
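
For what it's worth, here is a rough sketch of that two-pass check in Python. It assumes the C14N 2.0 canonicalizer in the standard library (xml.etree.ElementTree.canonicalize, Python 3.8+) as a stand-in for "the form the first parse gives back"; that is my substitution, not the canonical form proposed in this thread. The helper names first_pass and is_idempotent are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize


def first_pass(xml_text: str) -> bytes:
    """Pass 1: parse, resolve character references etc., and emit UTF-8.

    Assumption: C14N 2.0 stands in for the canonical form under discussion.
    """
    return canonicalize(xml_data=xml_text).encode("utf-8")


def is_idempotent(xml_text: str) -> bool:
    """Pass 2: re-parse the generated document and compare byte for byte."""
    once = first_pass(xml_text)
    twice = first_pass(once.decode("utf-8"))
    return once == twice


# Attribute order, a character reference and an empty element all get
# normalised on the first pass; the second pass should change nothing.
sample = '<doc b="2" a="1">&#65;<empty/></doc>'
print(first_pass(sample))
print(is_idempotent(sample))
```

The empty-element problem shows up right away: this particular canonicalizer writes <empty/> out as a start-tag/end-tag pair, so the two spellings become indistinguishable after the first pass. Whether that is acceptable is exactly the sort of thing a canonical form has to decide.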