
Re: some ERB decisions



>  - the character repertoire of XML documents is that of ISO 10646

Good.

>  - conforming XML documents may be in UTF-8 or UCS-2 form

Good.

>  - all XML processors must accept documents in UTF-8 and UCS-2 (or
>    optionally UTF-16) form

I'm not a great fan of UTF-16, and am worried about the connotations
of "accept". Does that mean parse, process, or just accept and die?
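One possible reading of "accept" is "decode the bytes into a sequence of 10646 characters, or raise an error". Below is a minimal sketch of that reading. It assumes a byte order mark is what distinguishes the 16-bit form from UTF-8, and `decode_required_forms` and `allow_utf16` are hypothetical names, not anything the ERB has specified.

```python
import codecs

def decode_required_forms(data: bytes, allow_utf16: bool = False) -> str:
    """Decode bytes in UTF-8 or UCS-2 form into 10646 characters (hypothetical helper)."""
    if data[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE):
        # Assumption: a byte order mark signals the 16-bit form.
        # Python's "utf-16" codec reads the BOM, picks the byte order, and drops it.
        text = data.decode("utf-16")
        # UCS-2 cannot represent characters beyond U+FFFF; only UTF-16 can
        # (via surrogate pairs), so reject them unless UTF-16 is allowed.
        if not allow_utf16 and any(ord(ch) > 0xFFFF for ch in text):
            raise ValueError("surrogate pair found: UTF-16, not UCS-2")
        return text
    # No BOM: treat as UTF-8. Strict decoding raises UnicodeDecodeError
    # (a ValueError subclass) on malformed input instead of guessing.
    return data.decode("utf-8")
```

Under this reading a processor "accepts and dies" only on malformed input, and the UCS-2/UTF-16 difference reduces to whether characters outside the Basic Multilingual Plane are allowed.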

>  - XML processors may provide a user option which causes them to accept
>    documents in other coded character sets (e.g. ISO 8859 or JIS 0208)
>    or other encodings of 10646 or other coded character sets (e.g.
>    Extended Unix Code) -- this behavior must be optional (i.e. the user
>    must be able to turn it off, so that documents not in UTF-8 or
>    UCS-2 raise errors).

OK. I can live with this, but am not overly happy about the "must be
optional" clause. 
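A minimal sketch of what the "must be optional" clause asks for, assuming the user option is a single parameter naming the alternative encoding (the hypothetical `other_encoding`), with `None` meaning the option is off. Codec names such as `latin-1` or `euc_jp` are Python's own, not names the proposal defines.

```python
import codecs
from typing import Optional

def decode_document(data: bytes, other_encoding: Optional[str] = None) -> str:
    """Decode a document; other encodings are used only if the user turns them on."""
    if other_encoding is not None:
        # Option on: the user has named another coded character set or encoding.
        return data.decode(other_encoding)
    # Option off: only the required forms are accepted.
    if data[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE):
        # 16-bit form, signalled here by a byte order mark; Python's UTF-16
        # codec also covers UCS-2.
        return data.decode("utf-16")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Anything else raises an error, as the proposal requires.
        raise ValueError("not UTF-8 or UCS-2, and no other encoding enabled") from exc
```

For example, `decode_document("café".encode("latin-1"))` raises an error, while the same call with `other_encoding="latin-1"` returns `"café"`.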

>Still open:  details of the mechanism to be used for signaling the
>encoding and/or coded character set in use.

Three possible methods:
   1) MIME headers for HTTP/email/filesystem (via *.mim)
   2) FSI attributes
   3) Catalog parameters
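For method 1, the signal would be the `charset` parameter of a MIME `Content-Type` header. A minimal sketch using Python's standard `email.message.Message`, assuming the header value has already been obtained (from HTTP, email, or a filesystem sidecar); `charset_from_mime` is a hypothetical name, and FSI attributes and catalog parameters are not covered here.

```python
from email.message import Message
from typing import Optional

def charset_from_mime(content_type: str) -> Optional[str]:
    """Return the charset named in a Content-Type value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter coerced to lower
    # case, or None when the header has no charset parameter.
    return msg.get_content_charset()
```

For example, `charset_from_mime('text/xml; charset="UTF-8"')` gives `"utf-8"`, and `charset_from_mime("text/xml")` gives `None`, leaving the processor to fall back on the required forms.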

