Re: Comments on Part 1: Encoding declaration from Rick Jelliffe on 1997-06-03 (w3c-sgml-wg@w3.org from June 1997)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Tue, 3 Jun 1997 18:37:41 +1000
To: "Murata Makoto" <murata@apsdc.ksp.fujixerox.co.jp>, <w3c-sgml-wg@w3.org>
Message-Id: <199706030842.SAA15520@jawa.chilli.net.au>

> From: Murata Makoto <murata@apsdc.ksp.fujixerox.co.jp>

> (4) Proposed changes
> 
> If an external text entity does not begin with a Byte Order
> Mark or an encoding declaration, XML processors may assume 
> that this entity is in the same encoding as the entity 
> that references it.
> 
> If a document entity does not begin with a Byte Order
> Mark or an encoding declaration, XML processors may assume 
> that this entity is in the UTF-8 encoding.
> 
> XML processors may use other information to detect the
> actual encoding method, but are not required to do so.

I agree. I think the first method is a preferable default, and it can be added to the list in the annex.
The important thing is that some definite strategy is in place:

* the default is UTF-8 for the top-level (document) entity, or inherited from the referencing entity for an external entity reference (note: not a link)
* but a BOM overrides this: the entity is Unicode
* but an encoding PI overrides this
* else, any kind of autodetection, user preference list, or locale setting
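
As a rough illustration of the precedence above, here is a minimal Python sketch. It is not taken from any draft or implementation: the names resolve_encoding, parent_encoding, and autodetect are hypothetical, only the two 16-bit BOMs are recognised, the pattern for the encoding PI is deliberately loose, and placing the autodetect hook ahead of the inherited/UTF-8 default is just one reading of the last bullet.

```python
import re
from typing import Callable, Optional

# Only the 16-bit Unicode byte order marks are handled in this sketch.
_BOMS = {b"\xfe\xff": "utf-16-be", b"\xff\xfe": "utf-16-le"}

# Loose, case-insensitive match for an encoding declaration at the very
# start of an entity (an assumption, not the exact draft grammar).
_DECL = re.compile(
    r"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""",
    re.IGNORECASE,
)


def resolve_encoding(
    data: bytes,
    parent_encoding: Optional[str] = None,
    autodetect: Optional[Callable[[bytes], Optional[str]]] = None,
) -> str:
    """Pick an encoding name for an entity (hypothetical helper).

    parent_encoding is the encoding of the referencing entity, or None
    for the document entity. autodetect is an optional hook standing in
    for "any kind of autodetection, user preference, or locale setting".
    """
    bom_encoding = _BOMS.get(data[:2])

    # Decode just enough of the start to look for the declaration.
    if bom_encoding:
        head = data[2:202].decode(bom_encoding, errors="ignore")
    else:
        head = data[:100].decode("ascii", errors="ignore")

    # 1. An encoding declaration overrides everything else.
    match = _DECL.match(head)
    if match:
        return match.group(1).lower()

    # 2. Otherwise a BOM means the entity is Unicode.
    if bom_encoding:
        return bom_encoding

    # 3. Otherwise give the optional hook a chance (assumed ordering).
    if autodetect is not None:
        guess = autodetect(data)
        if guess:
            return guess

    # 4. Default: inherit from the referencing entity, else UTF-8.
    return parent_encoding or "utf-8"
```

For example, calling resolve_encoding(entity_bytes, parent_encoding="euc-jp") on an external entity with neither a BOM nor a declaration falls through to the inherited encoding, while the same call without parent_encoding (a document entity) falls through to UTF-8.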

Received on Tuesday, 3 June 1997 04:41:50 UTC