Re: proposed HTTP changes for charset from Francois Yergeau on 1996-07-03 (ietf-http-wg@w3.org from July to September 1996)

From: Francois Yergeau <yergeau@alis.ca>
Date: Wed, 3 Jul 1996 10:08:03 -0500
To: Larry Masinter <masinter@parc.xerox.com>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199607031410.KAA25307@genstar.alis.ca>

> From:          Larry Masinter <masinter@parc.xerox.com>
> Date:          Tue, 2 Jul 1996 18:38:02 PDT

> I suggest making the following change, which is less controversial
> than the "charset=unknown" proposal:
> 
> Current HTTP/1.1 spec:
> ...
>
> < The "charset" parameter is used with some media types to define the
> < character set (section 3.4) of the data. Origin servers SHOULD
> < include an appropriate charset parameter for those media types which
> < allow one (including text/html and text/plain) to avoid ambiguity.
> < In the absence of a charset parameter, the default charset value MAY
> < be assumed to be "ISO-8859-1" when received from a HTTP/1.1 server.

Not good enough, I'm afraid.  For one, charset can still be ignored, 
and the problem we have now (its absence in most cases) will not be 
solved.  Further, ISO-8859-1 is still in, with no justification 
whatsoever.  If there is to be a default, it should be UTF-8, not a
"local derivative" like Latin-1.

There is a problem with charset=x-unknown, but this was proposed by 
Keith only for 1.1 proxies that would have to label unlabelled content 
received from a 1.0 server. The language above (with SHOULD 
appropriately replaced by MUST) would require only origin servers to 
label, so the problem disappears. Proxies receiving unlabelled 
content can just leave it alone, but we may go as far as permitting 
them ("MAY") to label it if they happen to know the charset.
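
The rule suggested above can be sketched in a few lines of Python. This
is only an illustration of the proposed wording, not text from any
draft; the function name, the "origin"/"proxy" role strings and the
known_charset argument are all hypothetical.

```python
from typing import Optional


def label_content_type(content_type: str, role: str,
                       known_charset: Optional[str] = None) -> str:
    """Apply the proposed labelling rule to a Content-Type value.

    role is "origin" or "proxy" (hypothetical names for this sketch).
    """
    # Content that already carries a charset parameter is never relabelled.
    if "charset=" in content_type.lower():
        return content_type

    if role == "origin":
        # Origin servers MUST label, so not knowing the charset is an error.
        if known_charset is None:
            raise ValueError("origin server must supply a charset")
        return f"{content_type}; charset={known_charset}"

    # Proxies MAY label if they happen to know the charset;
    # otherwise they leave the unlabelled content alone.
    if known_charset is not None:
        return f"{content_type}; charset={known_charset}"
    return content_type
```

With this shape a proxy never needs an "unknown" value: it either adds
a charset it actually knows or passes the header through untouched.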


The same ISO-8859-1 is also present in section 14.45 about the 
Warning header.  The second paragraph after the BNF ends with:

 The default language is English and the default character set is
 ISO-8859-1.
 If a character set other than ISO-8859-1 is used, it MUST be encoded
 in the warn-text using the method described in RFC 1522 [14].

This should be replaced with:

 The default character encoding is the UTF-8 encoding of ISO-10646.
 If a character encoding other than UTF-8 is used, it MUST be encoded
 in the warn-text using the method described in RFC 1522 [14].

Please note that ASCII text qualifies as UTF-8, but ISO-8859-1 text 
in general does not.
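
The relationship between ASCII, UTF-8 and ISO-8859-1 is easy to check
with a short Python sketch. The sample strings and the helper name
is_valid_utf8 are made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical warn-text samples, chosen only for illustration.
ascii_bytes = "Warning: stale response".encode("ascii")
latin1_bytes = "Réponse périmée".encode("iso-8859-1")

# Pure ASCII bytes are already valid UTF-8.
ascii_ok = is_valid_utf8(ascii_bytes)

# Latin-1 bytes above 0x7F (here 0xE9 for "é") are not valid UTF-8
# unless they happen to form a well-formed multi-byte sequence.
latin1_ok = is_valid_utf8(latin1_bytes)
```

Here ascii_ok comes out True and latin1_ok comes out False, so existing
ASCII warn-text would need no change under a UTF-8 default.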
-- 
Francois Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montreal
Tel : +1 (514) 747-2547
Fax : +1 (514) 747-2561

Received on Wednesday, 3 July 1996 07:19:08 UTC