Determination of Encoding (Re: Invasion of the pseudo-people ...)

Internet experts at Keio University (W3C host) and SGML experts 
in Japan have discussed the encoding detection issue.  Here is 
my (personal) summary of our agreement.

We should use BOM and encoding declarations only.  If a
document entity or an external entity does not have BOM or 
an encoding declaration, it is in UTF-8.  Period.

Other information or huristics such as "Metadata provided 
by the native OS file system or by document management 
software" (4.3.3, Part 1) should not be used.   Encoding 
inheritance should not be introduced.  There should be 
nothing similar to the CHARSET parameter of the element 
type A (HTML-I18N).

If HTTP or MIME headers provide encoding information, it 
should be identical to the encoding specified in the 
transmitted document (possibly implicitly by the XML default, 
which is UTF-8).  If not identical, the system is in error.

Is this agreeable?  I think this is very clear.  This 
is not always very convenient, but nobody or no systems 
will be confused.


Note:  Some of you may think this is very different from 
what I wrote in my mail 
<9706070915.AA00089@lute.apsdc.ksp.fujixerox.co.jp>.  
Actually, I was merely suggesting all possiblilities 
rather than proposing them.  However, I have changed 
my mind in that encoding inheritance should not be 
introduced.

Makoto
 
Fuji Xerox Information Systems
 
Tel: 044-812-7230   Fax: 044-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

Received on Saturday, 14 June 1997 08:06:08 UTC