charset bugs (was: proposed HTTP changes for charset) from Paul Leach on 1996-07-08 (ietf-http-wg@w3.org from July to September 1996)

From: Paul Leach <paulle@microsoft.com>
Date: Mon, 8 Jul 1996 12:17:03 -0700
To: "'Roy T. Fielding'" <fielding@liege.ICS.UCI.EDU>, 'Benjamin Franz' <snowhare@netimages.com>
Cc: 'Francois Yergeau' <yergeau@alis.ca>, "'http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com'" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>, "'http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com'" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>
Message-Id: <c=US%a=_%p=msft%l=RED-77-MSG-960708191703Z-11088@tide21.microsoft.com>

(Not intended for HTTP/1.1 spec consumption since it's closed)

No charset labelling scheme will work if the browsers have bugs that
choke them when it's used. The solution is to fix the bugs. The
existence of bugs does not disprove Roy's statements about old practice
-- the HTTP/1.0 spec, while only "de-facto" and "post-facto", does
constitute a reasonable way of deciding whether or not an HTTP/1.0
application has a bug in it by virtue of it departing far enough from
community norms.

I've reported the bug discussed below and the issue with ISO-8859-2 and
code pages to our dev team. Problems with other companies' products
should be brought to their attention.

If we get to the point where all browsers understand charset labels when
present, then we can argue whether we have to go farther -- but I can't
think of anything more that could be added to the 1.1 spec itself that
will make bugs any less likely. And, in fact, making the spec more
stringent than what the consumers really want will only lead to
deliberate violations.

>----------
>From: 	Benjamin Franz[SMTP:snowhare@netimages.com]
>Sent: 	Monday, July 08, 1996 7:09 AM
>To: 	Roy T. Fielding
>Cc: 	Francois Yergeau; http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com;
>http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
>Subject: 	Re: proposed HTTP changes for charset 
>
>On Fri, 5 Jul 1996, Roy T. Fielding wrote:
>
>> I have already covered these questions ad-nauseum.
>>   4) Labelling the charset with its real value if it is different than
>>      iso-8859-1 *always* works, both in old an new practice, because
>>      any user agent incapable of handling a charset value is also
>>      incapable of handling a charset other than iso-8859-1.  The only
>>      time problems occur is when iso-8859-1 data is labelled as such
>>      and then delivered to an older client.
>
>This isn't true. I was recently writing a chat CGI program and tried
>labelling something as ISO-2022-JP. It caused the otherwise Japanese 
>display capable browser client (MSIE 3.0b1) to choke. It refused to
>display the charset labelled file, instead attempting to download to a
>file. If I *didn't* label it I was fine. The issue of charset labelling
>breaking browsers cannot be neatly shoved off that way. It breaks
>non-latin1 1.0 browsers just as badly as latin1  1.0 browsers. If it is
>unacceptable to mandate charset labelling becasue it breaks latin1
>browsers - it is equally unacceptable to break non-latin 1 browsers.
>
>I am trying to figure out why charset being a MUST for 1.1 is even an
>issue at all.  Let's see if I have this right.
>
>Case 1) A client *issues* a 1.1 request to a 1.0 server.
>
>The server chokes on the 1.1 level and returns a 400 error.  Ok. No
>problem the client can now try to back off to 1.0 - which won't be
>labelled with a charset most likely. 
>
>Case 2) A client issues a 1.1 request to a 1.1 server.
>
>It gets a charset *always*. No problem - since there *ARE* no HTTP/1.1
>browsers in existence today there is no compatiblity issue.
>
>Case 3) A client issues a 1.0 request to a 1.1 server
>
>Server serves up as a 1.0 server without charset labelling (same as
>today's servers). No problem.
>
>Case 4) A client issues a 1.1 request to a 1.1 server.
>
>It gets a 1.1 responese including charset *always*. No problem.
>Since there are no 1.1 browsers today, you can mandate charset
>safely as far as browsers are concerned.
>
>Ok. All of these cases work ok. So the problem has got to be when you
>stick a proxy in the line. How does mandating charsets break proxies?
>I don't see it.
>
>-- 
>Benjamin Franz
>
>

Received on Monday, 8 July 1996 12:23:00 UTC