Re: Internationalization

At 4:19 AM 10/19/96, Tim Bray wrote:
>1. Non-network access
>[This is obvious] a lot of XML files are going to be stored in files
>and accessed by means other than HTTP, and some encoding signalling
>mechanism is obviously required.  Also, we wanted something that was
>simple, easy to explain, and could be done by anyone with whatever
>editor they were using.

MIME headers would meet these requirements, if they are allowed in the file.
See my earlier posting for an elaboration of this proposal.
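
As a rough sketch of the idea (not a worked-out design): assume the file
starts with a MIME-style header block, then a blank line, then the document
bytes. The name split_internal_headers and the sample document are made up
for illustration; the header parsing uses Python's standard email module.

```python
import email
import re


def split_internal_headers(data: bytes):
    """Return (charset, body) for a file that begins with MIME-style headers.

    Assumption: every file starts with a header block terminated by a
    blank line. If no blank line is found, the data is returned untouched.
    """
    parts = re.split(rb"\r?\n\r?\n", data, maxsplit=1)
    if len(parts) != 2:
        return None, data
    head, body = parts
    # Parse only the header block; the body is never interpreted here.
    msg = email.message_from_bytes(head + b"\n")
    # charset parameter of Content-Type, lowercased, or None if absent
    return msg.get_content_charset(), body


# Hypothetical sample: a Latin-1 document labelled inside the file itself.
sample = (
    b"Content-Type: text/xml; charset=ISO-8859-1\n"
    b"\n"
    b"<doc>caf\xe9</doc>"
)
charset, body = split_internal_headers(sample)
text = body.decode(charset or "utf-8")  # the utf-8 fallback is an assumption
```

Anyone can type such a header in whatever editor they use, and the label
travels with the file whether or not HTTP is involved.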

>2. Distrust
>There is, on the ERB, a pervasive lack of trust in the reliability
>of Mime and other external/metadata encoding signaling devices.  This
>is felt particularly intensely by me, based on my experience in processing
>many millions of HTML pages for the Open Text Index, and by Michael
>Sperberg-McQueen, based on his years of work in a highly heterogeneous
>networked computing environment.  I, and I'm sure many of the rest of us,
>have enjoyed the beneficial effects of Mime technologies in recent years;
>but if you're doing anything large-scale on the Web, if you rely on
>external metadata/signalling in general, you're dead - or more formally,
>your receiving application will suffer an unreasonable number of failures
>due to various botches and bogosities.  The reasons for this are, in my
>experience, a combination of incompetence in website administrators and
>lack of education in document authors.  Personally, I think that with
>a *combination* of mime, metadata, and careful examination of the first
>few bytes of the file to see if any reasonable interpretation of the
>bit patterns generates something like '<?XML', we have the tools in
>place for XML processors to offer more reliable access to heterogeneous
>multilingual distributed data than anything else I'm aware of.
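
The byte-examination step Tim describes might look roughly like the sketch
below. The candidate list, the BOM handling, and the name sniff_encoding are
my assumptions for illustration; a real processor would need a longer list,
and "ascii" here only means "some ASCII-compatible encoding".

```python
import codecs

# Byte-order marks that may precede the document; stripped before matching.
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# Encodings to try, in order. cp037 is an EBCDIC code page.
CANDIDATES = ("ascii", "utf-16-be", "utf-16-le", "cp037")


def sniff_encoding(data: bytes, marker: str = "<?XML"):
    """Return the first candidate under which data starts with marker."""
    for bom in BOMS:
        if data.startswith(bom):
            data = data[len(bom):]
            break
    for name in CANDIDATES:
        if data.startswith(marker.encode(name)):
            return name
    return None  # no reasonable interpretation found
```

This only narrows things down to an encoding family; the exact character
set still has to be stated somewhere.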

We can make the metadata in the document supersede the data in the HTTP
header, if that will improve reliability. Then, once HTTP implementation
and practice improve to the point where the internal headers are no longer
needed to make things work, people will stop using them for HTTP delivery.
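
Stated as code, the precedence rule is tiny. This is a sketch only: the name
effective_charset is hypothetical, and refusing to guess when neither source
gives a label is my assumption, not something anyone has agreed on.

```python
def effective_charset(internal, http_charset):
    """Pick the charset to decode with; the in-document label wins.

    internal     -- charset named inside the document, or None
    http_charset -- charset from the HTTP Content-Type header, or None
    Returns (charset, source), where source is "document" or "http".
    """
    if internal:
        return internal.lower(), "document"
    if http_charset:
        return http_charset.lower(), "http"
    # Assumption: with no label at all, fail loudly rather than guess.
    raise ValueError("no encoding information available")
```

For example, effective_charset("ISO-8859-1", "utf-8") picks the document's
own label, and effective_charset(None, "UTF-8") falls back to the HTTP one.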

>We're just specifying the language.  Although the ERB hasn't voted on this,
>it seems self-evident that a good XML processor should make use of whatever
>information might be available external to the language [be it a resource
>fork or an HTTP header or whatever] to do the right thing.

I no longer trust in self-evidence. Let's make a (simple) rule. Good sense
and character set issues never seem to go together. I'm sure people will
implement naming conventions for XML file suffixes and the like, and we
need not worry about that, but we're talking about data that is _essential_
for the document to be processed. That cannot be left to the implementers'
ingenuity. (They'll be far too ingenious for their own good!)

Neither of these arguments will fly in the Web technical community, unless
they can see that we are not tromping on the (sensible) infrastructure that
is already defined. A brand new character set syntax and
self-identification technique tromps, in my opinion.

>Cheers, Tim Bray
>tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________