
Re: some ERB decisions

From: Gavin Nicol <gtn@ebt.com>
Date: Thu, 17 Oct 1996 14:40:06 -0400
Message-Id: <199610171840.OAA13000@nathaniel.ebt>
CC: w3c-sgml-wg@w3.org
>  - the character repertoire of XML documents is that of ISO 10646


>  - conforming XML documents may be in UTF-8 or UCS-2 form


>  - all XML processors must accept documents in UTF-8 and UCS-2 (or
>    optionally UTF-16) form

I'm not a great fan of UTF-16, and am worried about the connotations
of "accept". Does that mean parse, process, or just accept and die?

>  - XML processor may provide a user option which causes them to accept
>    documents in other coded character sets (e.g. ISO 8859 or JIS 0208)
>    or other encodings of 10646 or other coded character sets (e.g.
>    Extended Unix Code) -- this behavior must be optional (i.e. the user
>    must be able to turn it off, so that documents not in UTF-8 or
>    UCS-2 raise errors).

OK. I can live with this, but am not overly happy about the "must be
optional" clause. 
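
To make the "must be optional" behaviour concrete, here is a minimal Python sketch of what a user-switchable strict mode might look like. The function `decode_document`, the flag `allow_other_encodings`, and the exception name are all hypothetical and do not come from the ERB text. Using Python's `utf-16` codec as a stand-in for UCS-2 is also an assumption made only for illustration.

```python
import codecs

# Hypothetical sketch: all names here are illustrative assumptions,
# not part of the ERB decisions.
REQUIRED_ENCODINGS = {"utf-8", "utf-16"}  # "utf-16" stands in for UCS-2


class EncodingNotAllowed(ValueError):
    """Raised when the declared encoding is rejected in strict mode."""


def decode_document(data: bytes, declared: str,
                    allow_other_encodings: bool = False) -> str:
    # Normalise aliases such as "UTF8" to Python's canonical codec name.
    name = codecs.lookup(declared).name
    # Strict mode (the default here): only the required encodings pass.
    if name not in REQUIRED_ENCODINGS and not allow_other_encodings:
        raise EncodingNotAllowed(
            f"encoding {declared!r} rejected in strict mode")
    return data.decode(name)
```

With the flag off, a document declared as ISO 8859-1 raises an error. With it on, the same bytes decode normally.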

>Still open:  details of the mechanism to be used for signaling the
>encoding and/or coded character set in use.

Three candidate methods:
   1) MIME headers for HTTP/email/filesystem (via *.mim)
   2) FSI attributes
   3) Catalog parameters
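
As a rough illustration of method 1 only, the following Python sketch reads the `charset` parameter from a MIME `Content-Type` header value using the standard library's `email.message.Message`. The helper name `charset_from_content_type`, the `text/xml` media type in the usage line, and the UTF-8 fallback are assumptions for the example, not something settled in this thread. FSI attributes and catalog parameters are not covered.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    # Hypothetical helper: let the stdlib parse the header parameters.
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lower-cased charset, or None
    # when the header carries no charset parameter.
    return msg.get_content_charset() or default


# Example: yields "iso-8859-1"
label = charset_from_content_type("text/xml; charset=ISO-8859-1")
```

When the header carries no charset parameter, the sketch falls back to the assumed default rather than guessing from the bytes.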

Received on Thursday, 17 October 1996 14:41:51 UTC
