Internationalization from Tim Bray on 1996-10-19 (w3c-sgml-wg@w3.org from October 1996)

From: Tim Bray <tbray@textuality.com>
Date: Sat, 19 Oct 1996 08:19:57 +0000
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <3.0b33.32.19961018224756.00b131c4@pop.intergate.bc.ca>

The ERB has received a certain amount of grousing from people disappointed
at XML's not using MIME or FSI machinery to specify language encoding.  The
reasons for this are twofold:

1. Non-network access

[This is obvious] a lot of XML files are going to be stored in files
and accessed by means other than HTTP, and some encoding signalling
mechanism is obviously required.  Also, we wanted something that was
simple, easy to explain, and could be done by anyone with whatever
editor they were using.  

2. Distrust

There is, on the ERB, a pervasive lack of trust in the reliability
of Mime and other external/metadata encoding signaling devices.  This
is felt particularly intensely by me, based on my experience in processing
many millions of HTML pages for the Open Text Index, and by Michael
Sperberg-McQueen, based on his years of work in a highly heterogeneous
networked computing environment.  I, and I'm sure many of the rest of us,
have enjoyed the beneficial effects of Mime technologies in recent years;
but if you're doing anything large-scale on the Web, if you rely on
external metadata/signalling in general, you're dead - or more formally,
your receiving application will suffer an unreasonable number of failures
due to various botches and bogosities.  The reasons for this are, in my
experience, a combination of incompetence in website administrators and
lack of education in document authors.  Personally, I think that with
a *combination* of mime, metadata, and careful examination of the first
few bytes of the file to see if any reasonable interpretation of the
bit patterns generates something like '<?XML', we have the tools in
place for XML processors to offer more reliable access to heterogeneous
multilingual distributed data than anything else I'm aware of.

We're just specifying the language.  Although the ERB hasn't voted on this,
it seems self-evident that a good XML processor should make use of whatever
information might be available external to the language [be it a resource
fork or an HTTP header or whatever] to do the right thing.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

Received on Saturday, 19 October 1996 11:23:39 UTC