Re: proposed HTTP changes for charset from Francois Yergeau on 1996-07-08 (ietf-http-wg@w3.org from July to September 1996)

From: Francois Yergeau <yergeau@alis.ca>
Date: Sun, 7 Jul 1996 21:32:42 -0500
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199607080135.VAA06201@genstar.alis.ca>
> Date:          Fri, 05 Jul 1996 17:25:38 -0700
> From:          "Roy T. Fielding" <fielding@liege.ICS.UCI.EDU>
> 
> I have already covered these questions ad nauseam.
> 
>   1) HTTP has *always* used a default charset value of ISO-8859-1.

This is wrong.  Repeating a falsehood ad nauseam doesn't make it any 
truer.  The default stopped being ISO 8859-1 the moment the very 
first non-Latin-1 document was transmitted unlabelled.

Servers have never sent charset, and thus have always "defaulted" to 
whatever was in the document being sent, Latin-1 or not.

Clients have always defaulted to whatever was (generally) the only 
encoding they could deal with, Latin-1 or not.

Proxies have never paid any attention, and have never defaulted to 
anything.

>      All implementations to the contrary had KNOWN failure conditions
>      and did not work as intended except within locally controlled
>      environments.

All implementations that assume Latin-1 have KNOWN failure conditions 
and do not work as intended when presented with unlabelled 
non-Latin-1 documents, that is outside of locally controlled Latin-1 
environments.  ISO 8859-1 is just a locally useful charset, whose 
relative importance is shrinking daily.

>   2) The HTTP version defines the communication capability of the
>      immediately adjacent client or server -- it NEVER indicates the
>      feature capabilities of the user agent.

Who said anything else?  The version number indicates what protocol 
features can be used.  A server receiving 1.0 knows it can reply with 
MIME-like headers followed by a blank line and an entity, which it 
cannot do with 0.9.
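
A minimal sketch of that distinction, assuming a hypothetical helper 
named build_response and a request version passed in as a plain 
string ("0.9" standing for a request line that carries no version).  
It is an illustration only, not code from any actual server:

```python
def build_response(request_version: str, body: bytes,
                   content_type: str = "text/html") -> bytes:
    """Return raw response bytes suited to the request's HTTP version.

    Hypothetical helper for illustration. "0.9" means the request line
    carried no version at all; anything else is treated as 1.0 or later.
    """
    if request_version == "0.9":
        # HTTP/0.9: no status line and no headers, only the entity.
        return body

    # HTTP/1.0 and later: status line, MIME-like headers, a blank line,
    # then the entity. The status line is fixed at HTTP/1.0 here purely
    # to keep the sketch short.
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

The version tells the server which response framing is safe.  It says 
nothing by itself about what the client does with a given header.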

My point was that a client sending 1.1 should guarantee that it can 
deal gracefully with a charset parameter (behave no worse than 
without charset), something that is not true today with 1.0, despite 
the language in RFC 1945.

>   3) None of the issues you have raised involve a technical problem
>      with the HTTP/1.1 protocol -- they are POLITICAL problems that
>      are an artifact of historical reality, a reality which the IETF
>      is not capable of changing.

By attempting to enforce a default charset with no justification in 
current practice or on technical grounds, the IETF would be making a 
political statement: the western hemisphere wants a free ride, let 
the rest of the world deal with the charset labelling problem.  You 
are right that Latin-1 as a default is an historical artefact; it 
*was* the default when it was the only encoding, but it has not been 
since.

>   4) Labelling the charset with its real value if it is different than
>      iso-8859-1 *always* works, both in old and new practice,...

You're seeing only one side of the coin.  The world does not revolve 
around ISO 8859-1.  Today, lots of people *need* to set their browser 
to a default other than Latin-1, because most of the stuff they read 
is in languages not representable in Latin-1 and is unlabelled.  When 
they get a document - unlabelled - in Latin-1 (or anything other than 
their current default), they see garbage.  No, it doesn't always 
work.
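
The failure is easy to reproduce.  The sketch below is an 
illustration only: the function name render is hypothetical, and 
KOI8-R is picked merely as one example of a non-Latin-1 browser 
default.  It decodes the same unlabelled bytes under two different 
defaults, using Python's standard codecs:

```python
def render(wire_bytes: bytes, browser_default: str) -> str:
    """Decode an unlabelled entity the way a browser with the given
    default charset would (hypothetical helper for illustration)."""
    return wire_bytes.decode(browser_default)


# A Latin-1 document sent with no charset label.
latin1_text = "d\u00e9j\u00e0 vu"            # "deja vu" with accents
latin1_bytes = latin1_text.encode("iso-8859-1")

# A Russian document in KOI8-R, also sent with no label.
russian_text = "\u043f\u0440\u0438\u0432\u0435\u0442"  # "privet" in Cyrillic
koi8_bytes = russian_text.encode("koi8_r")

# Each reader sees their usual documents correctly ...
assert render(latin1_bytes, "iso-8859-1") == latin1_text
assert render(koi8_bytes, "koi8_r") == russian_text

# ... and garbage for the other one, with no error raised.
garbled_for_koi8_reader = render(latin1_bytes, "koi8_r")
garbled_for_latin1_reader = render(koi8_bytes, "iso-8859-1")
```

Neither decode fails: both charsets assign a character to every byte 
value, so the wrong guess silently produces wrong text rather than an 
error.  Only a label can tell the two cases apart.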

>   5) Whether or not a client is capable of understanding the charset
>      parameter is NOT a function of the protocol version -- ALL HTTP/1.0
>      clients MUST understand charset, even if HTTP/1.0 is not a "standard",
>      because that is part of the HTTP/1.0 definition (see RFC 1945).

See above.  True in theory, but not (yet) in practice.  Hence a 
server cannot count on the 1.0 label to mean that the client will 
grok charset; I know, I've tried, got burned and had to hack 
something based on User-Agent:.
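
A workaround of that kind might look like the sketch below.  It is a 
guess at the general shape and not the code actually used: the 
function name and the product tokens in CHARSET_UNSAFE_AGENTS are 
hypothetical placeholders, since no specific browsers are named here.

```python
from typing import Optional

# Hypothetical placeholder tokens for clients assumed to mishandle a
# charset parameter. These are not real browser names.
CHARSET_UNSAFE_AGENTS = ("ExampleBrowser/1.", "OldClient/0.")


def content_type_for(user_agent: Optional[str], charset: str,
                     media_type: str = "text/html") -> str:
    """Build a Content-Type value, dropping the charset parameter for
    clients listed as unable to cope with it."""
    if user_agent and user_agent.startswith(CHARSET_UNSAFE_AGENTS):
        # Listed client: send the bare media type, unlabelled.
        return media_type
    # Everyone else, including clients that send no User-Agent at all,
    # gets the label.
    return f"{media_type}; charset={charset}"
```

For example, content_type_for("Other/2.0", "koi8-r") yields 
"text/html; charset=koi8-r", while a listed agent gets plain 
"text/html".  Sending the label when User-Agent is absent is a design 
choice of this sketch; the opposite default is equally defensible.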

> I see no point in continuing this discussion unless you can demonstrate
> a real problem that needs to be solved and can be solved within the
> constraints of HTTP/1.1.

I have described the problem above.  It lies outside of a 
Latin-1-centric world, so perhaps you have not encountered it before, 
but it nevertheless exists.  It really makes interoperability a 
pipe dream in places where Latin-1 is irrelevant, and even more so 
where multiple encodings are used for one language.

As for the constraints of HTTP/1.1, there has been a lot of loose 
talk about backward compatibility, but no definite problem case 
cited, except for a potential one with proxies.  The latter is, I 
think, fixed if only origin servers are required to send charset.  If 
that is not the case, please say so.  I'd rather hear this than 
"Let's not discuss it".

Finally, why is HTTP/1.1 constrained to mandate English and ISO 
8859-1 as defaults in the new Warning: header?  I have not seen any 
justification, so let's please fix that.  Has anyone noticed that 
mandating a default language amounts to telling those who do not 
speak that language that they cannot write complete 1.1 servers 
unless they hire a translator?

Regards,
-- 
Francois Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montreal
Tel : +1 (514) 747-2547
Fax : +1 (514) 747-2561
Received on Sunday, 7 July 1996 18:41:28 UTC