Re: Accept-Charset support from Martin J. Duerst on 1996-12-11 (www-international@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Wed, 11 Dec 1996 17:40:32 +0100 (MET)
To: Drazen Kacar <Drazen.Kacar@public.srce.hr>
cc: Larry Masinter <masinter@parc.xerox.com>, Chris.Lilley@sophia.inria.fr, www-international@w3.org, Alan_Barrett/DUB/Lotus.LOTUSINT@crd.lotus.com, bobj@netscape.com, wjs@netscape.com, erik@netscape.com, Ed_Batutis/CAM/Lotus@crd.lotus.com
Message-ID: <Pine.SUN.3.95.961211172543.245L-100000@enoshima>

On Fri, 6 Dec 1996, Drazen Kacar wrote:

> Larry Masinter wrote:
> > # That implies that sending
> > # 	Accept-Charset: utf-8
> > # Should generate a 406 response if the document is only available in, say,
> > # Latin-1 and the server cannot convert that to UTF-8.
> > 
> > I think Latin-1 is a special case. From
> > draft-ietf-http-v11-spec-07.txt:
> > 
> > # The ISO-8859-1 character set can be assumed to be acceptable to all
> > # user agents.
> 
> Come on, that was political compromise. ISO 8859-5 terminal can't
> represent iso-8859-1 with q=1.0. User agent can do necessary translations,
> but what actually gets displayed is not the same as on ISO 8859-1
> terminal.

That wasn't a political compromise; it is a historical coincidence.
At some point, if you wanted to join the Internet, your computer
had to understand ASCII in some way or another.
The Web, because it started at CERN in Geneva, was ISO-8859-1 from
the beginning, and so to join the Web, you have to understand
ISO-8859-1. The Web wasn't really designed for dumb terminals anyway.

This historical coincidence is something I can accept. What is
impossible for me to accept is such <FLAME>brainless stupidities</FLAME>
as specifying that ISO-8859-1 can be used in HTTP 1.1 warnings
in raw form, but anything else has to be encoded according to RFC 1522.

RFC 1522 is designed for 7-bit channels. If you have an 8-bit
channel, there is no reason to use it. If you are using RFC 1522
anyway, there is no reason to give special preference to ISO-8859-1.
If you have 8 bits available, there is no reason to use them up
entirely for ISO-8859-1, removing any extensibility. To everybody in
the i18n business, it is clear that if you are going to use 8 bits,
you had better use them for UTF-8.
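
To make the contrast concrete, here is a minimal sketch in Python of
the two options for a short warning text: a 7-bit RFC 1522 encoded-word
versus raw 8-bit UTF-8. It assumes Python's standard email.header
module, which implements RFC 2047 (the successor to RFC 1522, with the
same encoded-word syntax). The helper names and the sample text are
made up for illustration.

```python
from email.header import Header, decode_header


def to_encoded_word(text, charset):
    """Encode text as a 7-bit-safe encoded-word (=?charset?enc?data?=)."""
    return Header(text, charset).encode()


def from_encoded_word(value):
    """Decode a header value that may contain encoded-words."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded-word payload: decode with the charset it declares.
            parts.append(data.decode(charset or "ascii"))
        else:
            # Plain, unencoded header text comes back as str.
            parts.append(data)
    return "".join(parts)


text = "Grüße"  # hypothetical warning text containing non-ASCII characters

# Option A: encoded-word, safe on a 7-bit channel, with the charset labeled.
encoded = to_encoded_word(text, "iso-8859-1")

# Option B: raw UTF-8 octets, which need an 8-bit channel but no wrapper.
raw_utf8 = text.encode("utf-8")
```

The encoded-word carries its own charset label, so ISO-8859-1 gets no
special treatment there; the raw form only works if both sides agree
on a single charset in advance, which is the argument for UTF-8.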

I see four ways (in rough order of preference)
to get out of this problem:

(1) Specify UTF-8 as the only thing to be used
(2) Specify RFC 1522 for everything outside UTF-8 (UTF-8 includes ASCII)
(3) Specify RFC 1522 for everything outside ASCII
(4) Specify RFC 1522 for everything outside Latin-1 and UTF-8

Comment on (4): Latin-1 and UTF-8, strictly speaking, are not
compatible. However, in practice, and for string lengths that
will typically appear in warnings, they can be distinguished
easily.
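
One way to sketch the distinction in (4), in Python: try a strict UTF-8
decode first and fall back to Latin-1 only if that fails. This is a
heuristic under the assumption that the octets are one of the two; the
function name is hypothetical, and nothing like it is specified in the
HTTP draft.

```python
from typing import Tuple


def decode_warning_text(raw: bytes) -> Tuple[str, str]:
    """Guess whether raw octets are UTF-8 or Latin-1 and decode them.

    Returns (decoded text, name of the charset that was assumed).
    """
    try:
        # Non-ASCII text in Latin-1 rarely forms valid UTF-8 sequences,
        # so a successful strict decode is strong evidence for UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every octet sequence is valid Latin-1, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"
```

Pure ASCII input decodes identically either way. The guess can go wrong
only when Latin-1 text happens to form valid UTF-8 sequences, for
example the two octets C3 A9, which are "é" in UTF-8 but two separate
characters in Latin-1; that is the sense in which the two are not
strictly compatible.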

The above problem is a very clear example of bad design.
I hope the HTTP 1.1 draft can still be changed. If not,
it would be a very clear reason for raising objections
directly with the IETF or whoever is responsible.

Regards,	Martin.

Received on Wednesday, 11 December 1996 11:41:17 UTC