Re: some ERB decisions from Gavin Nicol on 1996-10-17 (w3c-sgml-wg@w3.org from October 1996)

From: Gavin Nicol <gtn@ebt.com>
Date: Thu, 17 Oct 1996 14:40:06 -0400
To: U35395@UICVM.CC.UIC.EDU
CC: w3c-sgml-wg@w3.org
Message-Id: <199610171840.OAA13000@nathaniel.ebt>

>  - the character repertoire of XML documents is that of ISO 10646

Good.

>  - conforming XML documents may be in UTF-8 or UCS-2 form

Good.

>  - all XML processors must accept documents in UTF-8 and UCS-2 (or
>    optionally UTF-16) form

I'm not a great fan of UTF-16, and I'm worried about the connotations
of "accept". Does it mean parse, process, or just accept the input and die?
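One reading of "accept" is "decode into characters". The sketch below shows that reading for the 16-bit form. It assumes the document begins with a byte order mark (FE FF or FF FE). It reads the bytes with Python's UTF-16 codecs because Python ships no separate UCS-2 codec. UCS-2 stops at U+FFFF, so any character above that must have come from a UTF-16 surrogate pair. The function name `decode_sixteen_bit` and its `allow_utf16` flag, which models the "optionally UTF-16" clause, are hypothetical.

```python
def decode_sixteen_bit(data: bytes, allow_utf16: bool = False) -> str:
    """Decode a BOM-marked 16-bit document (hypothetical helper).

    Assumption: UCS-2 input always starts with a byte order mark.
    """
    if data.startswith(b"\xfe\xff"):
        text = data[2:].decode("utf-16-be")   # big-endian mark
    elif data.startswith(b"\xff\xfe"):
        text = data[2:].decode("utf-16-le")   # little-endian mark
    else:
        raise ValueError("no byte order mark; not treated as UCS-2 here")
    # UCS-2 covers only the Basic Multilingual Plane. A code point above
    # U+FFFF means the input used UTF-16 surrogate pairs.
    if not allow_utf16 and any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("surrogate pairs found: UTF-16, not UCS-2")
    return text
```

With `allow_utf16=False`, a processor that "accepts" UCS-2 still rejects UTF-16 input outright instead of misreading it. This is one possible answer to the "accept and die" question.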

>  - XML processor may provide a user option which causes them to accept
>    documents in other coded character sets (e.g. ISO 8859 or JIS 0208)
>    or other encodings of 10646 or other coded character sets (e.g.
>    Extended Unix Code) -- this behavior must be optional (i.e. the user
>    must be able to turn it off, so that documents not in UTF-8 or
>    UCS-2 raise errors).

OK. I can live with this, but I am not overly happy about the "must be
optional" clause.
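The sketch below shows how the user option in the last item might work. It assumes the encoding has already been signaled externally and arrives as a name string. `decode_document`, `EncodingNotAllowed` and the `extra_encodings` parameter are hypothetical names. Python's UTF-16 codecs stand in for UCS-2. With the default empty set, the processor is strict: a document in, say, ISO 8859-1 raises an error. That models the requirement that the user be able to turn the option off.

```python
import codecs
from typing import Iterable

# Forms every processor must accept, as canonical Python codec names.
# Assumption: UCS-2 is approximated by Python's UTF-16 codecs.
REQUIRED_FORMS = {"utf-8", "utf-16", "utf-16-be", "utf-16-le"}


class EncodingNotAllowed(ValueError):
    """Raised when a document is outside the required forms and no option enables it."""


def decode_document(data: bytes, declared: str,
                    extra_encodings: Iterable[str] = frozenset()) -> str:
    # codecs.lookup normalises aliases ("UTF8", "latin-1", "ISO-8859-1")
    # to one canonical name, so the comparison below is alias-safe.
    # An unknown name raises LookupError.
    name = codecs.lookup(declared).name
    allowed = REQUIRED_FORMS | {codecs.lookup(e).name for e in extra_encodings}
    if name not in allowed:
        raise EncodingNotAllowed(
            f"{declared!r} is not accepted unless enabled as a user option")
    return data.decode(name)
```

For example, `decode_document(raw, "ISO-8859-1", {"latin-1"})` decodes the input. The same call without the third argument raises `EncodingNotAllowed`.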

>Still open:  details of the mechanism to be used for signaling the
>encoding and/or coded character set in use.

Three methods for signaling it:
   1) MIME headers for HTTP, email, and the filesystem (via *.mim files)
   2) FSI (formal system identifier) attributes
   3) Catalog parameters
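The sketch below shows how a processor might combine these signals. It reads the standard `charset` parameter of a MIME Content-Type header using Python's `email.message.Message`. The FSI and catalog values are taken as plain strings the caller has already extracted, because their syntax is not spelled out here. The message does not rank the three methods. The precedence in the hypothetical `resolve_encoding` (MIME, then FSI, then catalog, then UTF-8) is purely an assumption for illustration.

```python
from email.message import Message
from typing import Optional


def charset_from_mime(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # unquotes and lower-cases the value


def resolve_encoding(mime_charset: Optional[str] = None,
                     fsi_charset: Optional[str] = None,
                     catalog_charset: Optional[str] = None,
                     default: str = "utf-8") -> str:
    # Hypothetical precedence: the first signal present wins.
    for candidate in (mime_charset, fsi_charset, catalog_charset):
        if candidate:
            return candidate
    return default
```

Here a header such as `text/plain; charset="ISO-8859-1"` yields `iso-8859-1`, which takes priority over any FSI or catalog value under the assumed ordering.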

Received on Thursday, 17 October 1996 14:41:51 UTC