
Re: XML character sets: the hard-minimalist manifesto

From: Gavin Nicol <gtn@ebt.com>
Date: Mon, 16 Sep 1996 12:36:45 GMT
Message-Id: <199609161236.MAA11092@wiley.EBT.COM>
To: sjd@ebt-inc.ebt.com
CC: w3c-sgml-wg@w3.org

I have no problem with a canonical form, so long as it doesn't stop
people from using other forms.

>a) Once you've parsed an XML document once, all further parses must produce
>an absolutely byte-identical copy of the XML document that came back from
>that first parse. I believe the right term for this is "idempotence". This
>makes parser conformance testing trivial.

This would necessarily take 2 passes for validation:

Pass 1:
  Resolve entity references etc., convert to UTF-8, and generate the document.

Pass 2: 
  Parse the generated document, generate a document from it again, and
  compare the two results.

That said, things like empty elements and RE/RS (record end/record start)
handling complicate this no end.

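For illustration only, the two-pass check described above can be sketched in Python, using the standard library's xml.etree.ElementTree as a stand-in for an XML parser and generator. The function names generate and is_idempotent are hypothetical. ElementTree does no DTD validation, so "validation" here means only the idempotence comparison. The example also shows the empty-element wrinkle: <br></br> comes out of pass 1 as <br />, so the original input is not reproduced, but every later pass reproduces the pass-1 bytes exactly.

```python
import xml.etree.ElementTree as ET


def generate(document: bytes) -> bytes:
    """Parse a document and serialize it again as UTF-8 (hypothetical helper).

    Stand-in for "resolve entity references, convert to UTF-8, generate the
    document": ElementTree expands character and predefined entity references
    while parsing, and tostring() writes UTF-8 bytes with no XML declaration.
    """
    root = ET.fromstring(document)
    return ET.tostring(root, encoding="utf-8")


def is_idempotent(document: bytes) -> bool:
    """Two-pass check: a second parse/generate must reproduce pass 1 exactly."""
    pass1 = generate(document)  # Pass 1: normalize the original input
    pass2 = generate(pass1)     # Pass 2: parse the generated document again
    return pass1 == pass2       # compare the results byte for byte


if __name__ == "__main__":
    original = b"<doc><p>caf&#233; &amp; tea</p><br></br></doc>"
    print(generate(original))  # b'<doc><p>caf\xc3\xa9 &amp; tea</p><br /></doc>'
    print(is_idempotent(original))  # True
```

Note that ElementTree passes whitespace through verbatim, so this sketch sidesteps the RE/RS question rather than answering it.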
Received on Monday, 16 September 1996 08:38:24 EDT