Determination of Encoding (Re: Invasion of the pseudo-people ...)

From: Murata Makoto <murata@apsdc.ksp.fujixerox.co.jp>
Date: Sat, 14 Jun 1997 21:07:25 +0900
Message-Id: <9706141207.AA00265@lute.apsdc.ksp.fujixerox.co.jp>
To: w3c-sgml-wg@w3.org
Internet experts at Keio University (W3C host) and SGML experts 
in Japan have discussed the encoding detection issue.  Here is 
my (personal) summary of our agreement.

We should use the BOM and encoding declarations only.  If a
document entity or an external entity has neither a BOM nor
an encoding declaration, it is in UTF-8.  Period.
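
As an illustration only, the rule above might be sketched in
Python as follows.  The function name detect_encoding and the
regular expression are hypothetical, not part of any draft.
The sketch assumes that the BOM signals UTF-16 and that an
encoding declaration can be read as ASCII bytes at the very
start of the entity.

```python
import codecs
import re

# Hypothetical pattern for an encoding declaration at the start of an
# entity. It only works for encodings whose leading bytes are
# ASCII-compatible; that is an assumption of this sketch.
_DECLARATION = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_encoding(data: bytes) -> str:
    """Return the encoding of an entity using only the BOM and the
    encoding declaration, falling back to UTF-8."""
    # 1. A byte order mark is taken to mean UTF-16 (assumption).
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"
    # 2. Otherwise, look for an explicit encoding declaration.
    match = _DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 3. Neither is present: the entity is in UTF-8. No other
    #    information or heuristics are consulted.
    return "UTF-8"
```

Note that no file-system metadata or inherited encoding appears
anywhere in the function; the bytes of the entity are its only
input.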

Other information or heuristics, such as "Metadata provided 
by the native OS file system or by document management 
software" (4.3.3, Part 1), should not be used.  Encoding 
inheritance should not be introduced.  There should be 
nothing similar to the CHARSET parameter of the element 
type A (HTML-I18N).

If HTTP or MIME headers provide encoding information, it 
should be identical to the encoding specified in the 
transmitted document (possibly implicitly by the XML default, 
which is UTF-8).  If not identical, the system is in error.
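
A minimal sketch of this consistency check, again in Python and
again only an illustration: the names effective_encoding and
EncodingMismatch are hypothetical, and comparing the names
case-insensitively (without resolving aliases) is an assumption
of the sketch rather than part of the agreement.

```python
from typing import Optional

XML_DEFAULT = "UTF-8"  # encoding assumed when the document specifies none


class EncodingMismatch(Exception):
    """Raised when the transport header and the document disagree."""


def effective_encoding(header_charset: Optional[str],
                       document_encoding: Optional[str]) -> str:
    """Return the document's encoding, or raise if an HTTP or MIME
    header names a different one.

    document_encoding is what the BOM or encoding declaration says,
    or None when the document has neither.
    """
    document = document_encoding or XML_DEFAULT
    # A header is optional, but if present it must agree with the
    # document. Names are compared case-insensitively (assumption).
    if header_charset is not None and header_charset.lower() != document.lower():
        raise EncodingMismatch(
            f"header says {header_charset!r}, document says {document!r}"
        )
    return document
```

In this sketch the header never overrides the document; it can
only confirm it or expose an error.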

Is this agreeable?  I think this is very clear.  It is 
not always very convenient, but neither people nor systems 
will be confused.

Note:  Some of you may think this is very different from 
what I wrote in my mail 
Actually, I was merely suggesting all possibilities 
rather than proposing them.  However, I have changed 
my mind in that encoding inheritance should not be 

Fuji Xerox Information Systems
Tel: 044-812-7230   Fax: 044-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
Received on Saturday, 14 June 1997 08:06:08 UTC