
Re: XML character sets: the hard-minimalist manifesto



I have no problem with a canonical form, so long as it doesn't stop
people using other forms.

>a) Once you've parsed an XML document once, all further parses must produce
>an absolutely byte-identical copy of the XML document that came back from
>that first parse. I believe the right term for this is "idempotence". This
>makes parser conformance testing trivial.

This would necessarily take 2 passes for validation:

Pass 1:
  Resolve entity references etc., convert to UTF-8, generate document.

Pass 2: 
  Parse document, generate document, compare results.

Things like empty elements and RE/RS (record end/record start) handling
complicate this no end, though.
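
For what it's worth, here is a rough sketch of the two passes in Python.
It assumes the standard-library xml.etree.ElementTree stands in for the
parser under test; the helper names regenerate and is_idempotent are made
up for illustration and are not part of any proposal.

```python
import xml.etree.ElementTree as ET


def regenerate(document: bytes) -> bytes:
    """Parse a document and write it back out as UTF-8 bytes."""
    # Parsing resolves character references and predefined entities.
    root = ET.fromstring(document)
    # Serialising produces this parser's own idea of the document.
    return ET.tostring(root, encoding="utf-8")


def is_idempotent(source: bytes) -> bool:
    """Pass 1 generates a document; pass 2 must reproduce it exactly."""
    first = regenerate(source)   # Pass 1: resolve, convert, generate
    second = regenerate(first)   # Pass 2: parse, generate
    return first == second       # byte-for-byte comparison


# Hypothetical sample input: a character reference and an empty element
# written with separate start and end tags.
SAMPLE = b"<doc>caf&#233; &amp; <br></br></doc>"

if __name__ == "__main__":
    print(is_idempotent(SAMPLE))
```

Note that the comparison is between the output of pass 1 and the output
of pass 2, not against the original input: with this serialiser the
<br></br> pair is collapsed to an empty-element tag in pass 1, so the
first generated document already differs from the source.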

