Comments on Part 1: Encoding declaration

The current draft requires that all text entities have
encoding declarations unless they are encoded in UTF-8.

(1) Internal entities

Does this requirement apply to internal entities?  Is it
possible for internal entities to use different encodings?
The current draft is not clear about this.

(2) Duplication of encoding declarations

Suppose that we have a document comprising one document
entity and one hundred external text entities.  I believe
that these external text entities should not be required
to duplicate the same encoding declaration.

The encoding specified for the root entity or an external
text entity should be inherited by directly-referenced
external entities, unless they have encoding declarations or
they begin with a Byte Order Mark.

(3) Autodetection

Encoding information might be given by Internet protocols
(HTTP, SMTP, etc.).

In Japan, there are many programs that automatically detect
EUC-JP, Shift_JIS, and ISO-2022-JP.  Some future
implementations of XML will make even more educated
guesses, since XML documents contain many occurrences of
<!, >, </, and />.
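
As a rough illustration of this kind of detection, the sketch below
narrows down which of the three Japanese encodings could have produced
a byte sequence by attempting a strict decode with each one.  This is
only an assumption about how a simple detector might start; real
programs use more elaborate heuristics.  The function name and the
candidate list are hypothetical, and the codec names are the ones
Python uses for these encodings.

```python
from typing import List

# Python codec names for ISO-2022-JP, EUC-JP, and Shift_JIS.
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def plausible_encodings(data: bytes) -> List[str]:
    """Return the candidate encodings under which `data` decodes cleanly.

    A strict decode only rules encodings out; several candidates may
    remain, so this narrows the guess rather than settling it.
    """
    plausible = []
    for name in CANDIDATES:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        plausible.append(name)
    return plausible
```

The result may contain more than one name, because a byte sequence can
be valid in several of these encodings at once.  A processor would
still need further evidence (protocol headers, markup patterns) to
choose among the remaining candidates.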

(4) Proposed changes

If an external text entity does not begin with a Byte Order
Mark or an encoding declaration, XML processors may assume
that this entity is in the same encoding as the entity
that references it.

If a document entity does not begin with a Byte Order
Mark or an encoding declaration, XML processors may assume 
that this entity is in the UTF-8 encoding.

XML processors may use other information to detect the
actual encoding method, but are not required to do so.
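
The proposed rules can be sketched as a small decision function.  This
is a minimal sketch under stated assumptions, not a definitive
implementation: the function name, the regular expression, and the
parent_encoding parameter are all hypothetical; the Byte Order Mark is
assumed to be the UTF-16 one; and the declaration is assumed to be
readable as ASCII, which does not hold for every encoding.

```python
import re
from typing import Optional

# UTF-16 Byte Order Marks, big-endian and little-endian (assumption:
# these are the only BOMs the processor looks for).
_BOMS = (b"\xfe\xff", b"\xff\xfe")

# Matches an encoding declaration at the very start of an entity, e.g.
#   <?xml version="1.0" encoding="EUC-JP"?>
# Assumption: the declaration is in an ASCII-compatible encoding.
_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def resolve_encoding(data: bytes, parent_encoding: Optional[str] = None) -> str:
    """Pick an encoding for an entity following the proposed rules.

    parent_encoding is the encoding of the referencing entity, or None
    when `data` is the document entity.
    """
    if data.startswith(_BOMS):
        return "utf-16"
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    if parent_encoding is not None:
        # External entity with no BOM and no declaration: inherit.
        return parent_encoding
    # Document entity with no BOM and no declaration: assume UTF-8.
    return "utf-8"
```

For example, resolve_encoding(b"<chapter/>", "EUC-JP") yields "EUC-JP",
so an undeclared external entity inherits from the entity that
references it, while resolve_encoding(b"<doc/>") yields "utf-8" for an
undeclared document entity.  Information from other sources, such as
protocol headers, is deliberately left out of this sketch.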


Makoto
 
Fuji Xerox Information Systems
 
Tel: 044-812-7230   Fax: 044-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

Received on Wednesday, 28 May 1997 22:55:57 UTC