- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Thu, 5 Dec 1996 12:05:16 +0100 (MET)
- To: Drazen Kacar <Drazen.Kacar@public.srce.hr>
- cc: erik@netscape.com, Alan_Barrett/DUB/Lotus.LOTUSINT@crd.lotus.com, www-international@w3.org, bobj@netscape.com, wjs@netscape.com, Chris.Lilley@sophia.inria.fr, Ed_Batutis/CAM/Lotus@crd.lotus.com
On Thu, 5 Dec 1996, Drazen Kacar wrote:

> Erik van der Poel wrote:
>
> > > (1) If the user, through the UI, says they want to "Request
> > > Multi-Lingual Documents" then the browser should send:-
> >
> > I don't think we should have a UI for the Accept-Charset. Think about
> > novice users. Will they understand it?
>
> Yes. Perhaps people who live in the Latin 1 world won't, but everything
> works for them anyway. I live in the Latin 2 world and I have a
> reasonable technical background, so I can hardly be called a novice
> user. But I can tell you how it looks to novice users.

[Drazen Kacar describing various not very user-friendly situations.]

I know lots of such situations, and I guess most on this list are
familiar with them. They are examples of broken software, and it is
usually easy to see how such software can and should be fixed. To decide
whether a UI for Accept-Charset is needed, we mainly have to examine
software that does the right thing; adding a UI to broken software is
usually not the right solution.

What I would imagine a good browser or similar software doing is to
examine the available resources, in particular the fonts and perhaps
translation tables. On X11, this would mean doing something like
xlsfonts, checking the encoding part of the long font names, and
concluding from that that e.g. a font for iso-8859-2 is available, or,
for a more sophisticated browser, whether the display resources for
iso-8859-2 can be patched together somehow. If a browser finds such a
font, it can safely add iso-8859-2 to its "Accept-Charset" list.

> My native language needs 5 letters from the Latin 2 code page; the rest
> is in US-ASCII. Latin 2 on Unix means ISO 8859-2, and Unix hosts were
> connected before anything else.

Just a small comment: Latin 2 is a synonym of ISO 8859-2, and should not
mean anything else on any platform.

> ISO 8859-2 won on Usenet.
>
> On the web, CP1250 won the majority of pages.
> Largely because there were no authoring tools which knew about the
> difference between ISO 8859-2 and CP1250.

Very unfortunately, the good example set at the beginning of the Web,
with unification on iso-8859-1 for Western Europe, has not been taken up
for the rest of the world.

> Now, why is accept-charset needed?

There is certainly much need for accept-charset itself, as opposed to a
UI for it, where I don't see the need, because the software should be
able to decide on its own what it can accept. However, what the above
example shows is that it is important to reach a similar (or hopefully
even better) situation with respect to "charset"s in other parts of the
world as we have for Western Europe. The main point is that we try to
keep the number of encodings used really, really low.

At the Sevilla workshop, I was rather depressed than impressed when Erik
showed the list of "charset"s that Netscape currently accepts (or will
accept). Sorry I didn't speak up at that moment. The list was long, and
it contained lots of what I would call garbage. I cannot blame Netscape
for trying to accept whatever servers seem to send. But in any case,
please show this in a two-column layout:

First column: recommended and widely used encodings (e.g. iso-8859-1).
Second column: encodings we also accept, but which authors or tools
should not create and servers should not send (e.g. CP1252).

Regards,    Martin.
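The font-scanning idea in the message (look at xlsfonts output, read the
encoding part of the long XLFD font names, and advertise the matching
charsets) can be sketched roughly as follows in Python. The sample font
names and the small XLFD-to-MIME name mapping are assumptions of the
sketch, not real xlsfonts output or a complete registry.

```python
# Sketch: derive an Accept-Charset list from X11 font availability.
# An XLFD font name has 14 '-'-separated fields after a leading '-';
# the last two are the charset registry and encoding.

def charset_of(xlfd):
    """Return the charset of an XLFD font name, or None if malformed."""
    fields = xlfd.split("-")
    if len(fields) < 15:          # leading '' plus 14 XLFD fields
        return None
    registry, encoding = fields[-2], fields[-1]
    return "%s-%s" % (registry, encoding)

def accept_charset(xlfd_names):
    """Collect distinct charsets, mapping XLFD spellings to the MIME
    names used on the wire (e.g. iso8859-2 -> iso-8859-2)."""
    mime = {"iso8859-%d" % n: "iso-8859-%d" % n for n in range(1, 11)}
    seen = []
    for name in xlfd_names:
        cs = charset_of(name)
        if cs is None:
            continue
        cs = mime.get(cs, cs)
        if cs not in seen:
            seen.append(cs)
    return ", ".join(seen)

# Illustrative sample data, not real xlsfonts output.
fonts = [
    "-adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1",
    "-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso8859-2",
]
print("Accept-Charset:", accept_charset(fonts))
# prints: Accept-Charset: iso-8859-1, iso-8859-2
```

A real browser would query the X server directly rather than parse
xlsfonts text, but the decision logic is the same: no UI, just an
inventory of what the display can actually render.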
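The two-column layout proposed at the end amounts to a small policy
table: encodings to recommend and emit, versus encodings to merely
tolerate on input. A minimal sketch, with membership taken only from the
examples in the message (CP1252 and CP1250 as tolerated stand-ins for
the ISO sets) and not meant as a definitive list:

```python
# Sketch of the proposed two-column charset policy. Membership below
# follows the message's examples and is illustrative only.

RECOMMENDED = {"iso-8859-1", "iso-8859-2"}   # advertise and emit
TOLERATED = {                                # accept on input, never emit
    "cp1252": "iso-8859-1",   # windows-1252, superset of Latin 1
    "cp1250": "iso-8859-2",   # windows-1250, differs from ISO 8859-2
}

def classify(charset):
    """Say which column of the two-column layout a charset falls in."""
    cs = charset.lower()
    if cs in RECOMMENDED:
        return "recommended"
    if cs in TOLERATED:
        return "tolerated (use %s instead)" % TOLERATED[cs]
    return "unknown"

print(classify("ISO-8859-2"))   # prints: recommended
print(classify("cp1250"))       # prints: tolerated (use iso-8859-2 instead)
```

The point of keeping the first column short is exactly the message's
point: the fewer encodings in active use, the easier interchange stays.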
Received on Thursday, 5 December 1996 06:08:30 UTC