- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Fri, 13 Dec 1996 22:13:33 +0100 (MET)
- To: Klaus Weide <kweide@tezcat.com>
- cc: Francois Yergeau <yergeau@alis.com>, www-international@w3.org
On Fri, 13 Dec 1996, Klaus Weide wrote:
> It is my understanding that the folks who would be using 8859-2 haven't
> yet agreed on whether to use that or windows-1250 or cp852 or ...,
> and those who might use 8859-5 are also split (KOI8-R, "alt", ...).
> There are several encodings in use for Japanese. That Russian DOS box
> either already has learnt to speak several charsets, or it will not even
> be able to understand its neighbors in the same region.

Yes, exactly. They missed the benefit that western Europe got from agreeing
on ISO-8859-1 for HTML. The Japanese can be excused a little, because in
their case the software usually converts between encodings automatically,
and they simply expected a web browser to do the same (and implemented it
that way). Anyway, I guess the only solution for Eastern Europe, Russia,
and so on will be Unicode.

> Even if it seems right now that nearly everybody around the world does
> pretty much what they want (let the client guess what we are sending,
> after all it works most of the time or we just don't know any better)
> --- there is a history in the drafts and specs of Web protocols that
> said "iso-8859-1 is default". One would think the world joined the
> World-Wide Web under those conditions...

The others joined the WWW with one big wish: "It's so cool, we want it
too." Everything else came second. They didn't mind the iso-8859-1
default, because they worked around it. They knew the workarounds wouldn't
work in all cases, but that was not very important to them. One important
lesson here: the more you think that what you are doing has a chance of
becoming really cool, the more you should care about serious i18n.

> From some responses it seemed the NC in everybody's hands is right
> around the corner, which would then make all-you-can-eat of fonts
> appear on the screen via some Java magic (presumably with negligible
> cost and delay)... but I rather like your definition of "supporting
> UTF-8". There's nothing wrong with displaying _U1234_ if necessary,
> I suppose.

Not for HTML, anyway. We took care of that.

> I should clarify that above I am referring to charsets for entity bodies.
> The part of the HTTP draft about charsets in Warning headers seemed,
> uhmm, antiquated when I first read it (some months ago). I can agree
> that iso-8859-1 in a special role doesn't seem to belong there.

"Antiquated" is a very good word. Unfortunately, that part hasn't changed
since. I am glad you wrote this; in private mail, somebody told me that
nobody would agree with me that iso-8859-1 does not need any special place
in warnings :-).

Anyway, with regard to warnings, I have a little proposal: let's collect
the six warnings in the draft in many languages, e.g. in the form:

    en.10 Response is stale
    de.14 Umwandlung angewendet

and so on. An initial file with English and German (not yet really
perfect) is available as:

    ftp://www.ifi.unizh.ch/pub/multilingual/http.warnings.utf8

Any improvements or additions are very welcome; just send them to me and
I'll integrate them. To start, you can get the file with English only
(still UTF-8, but also plain ASCII) as:

    ftp://www.ifi.unizh.ch/pub/multilingual/http.warnings.ascii

It's only six short messages at the moment, so translating them takes very
little time. For submissions, you don't need to use UTF-8; I can integrate
text in quite a few other encodings. And of course I have an editor that
can handle UTF-8 :-).

Then let's make this file (and a little bit of code to extract the desired
warning) available to implementors, and ask them to just send the strings
out as is, silently ignoring the antiquated ISO-8859-1 default for warnings
and silently changing it to UTF-8.

Interestingly enough, with this solution the server side does not have to
worry AT ALL about what encoding the warnings are in, how to convert that
encoding to something allowed, how to implement RFC1522, or whatever. I
guess implementors will love this.
Those who don't care won't do anything other than English anyway. Let's
make the Internet principles of implementation priority and independent
creativity work for decent, non-antiquated internationalization.

Regards,
Martin.
Received on Friday, 13 December 1996 16:13:31 UTC