- From: Francois Yergeau <yergeau@alis.ca>
- Date: Wed, 3 Jul 1996 10:08:03 -0500
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> From: Larry Masinter <masinter@parc.xerox.com>
> Date: Tue, 2 Jul 1996 18:38:02 PDT
>
> I suggest making the following change, which is less controversial
> than the "charset=unknown" proposal:
>
> Current HTTP/1.1 spec:
> ...
>
> < The "charset" parameter is used with some media types to define the
> < character set (section 3.4) of the data. Origin servers SHOULD
> < include an appropriate charset parameter for those media types which
> < allow one (including text/html and text/plain) to avoid ambiguity.
> < In the absence of a charset parameter, the default charset value MAY
> < be assumed to be "ISO-8859-1" when received from a HTTP/1.1 server.

Not good enough, I'm afraid. For one, charset can still be ignored, and
the problem we have now (its absence in most cases) will not be solved.
Further, ISO-8859-1 is still in, with no justification whatsoever. If
there is to be a default, it should be UTF-8, not a "local derivative"
like Latin-1.

There is a problem with charset=x-unknown, but this was proposed by
Keith only for 1.1 proxies that would have to label unlabelled content
received from a 1.0 server. The language above (with SHOULD appropriately
replaced by MUST) would require only origin servers to label, so the
problem disappears. Proxies receiving unlabelled content can just leave
it alone, but we may go as far as permitting them ("MAY") to label it if
they happen to know the charset.

The same ISO-8859-1 is also present in section 14.45 about the Warning
header. The second paragraph after the BNF ends with:

   The default language is English and the default character set is
   ISO-8859-1. If a character set other than ISO-8859-1 is used, it MUST
   be encoded in the warn-text using the method described in RFC 1522
   [14].

This should be replaced with:

   The default character encoding is the UTF-8 encoding of ISO-10646.
   If a character encoding other than UTF-8 is used, it MUST be encoded
   in the warn-text using the method described in RFC 1522 [14].
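
To illustrate why the choice of default matters for unlabelled content, here is a minimal Python sketch. The helper names `charset_of` and `decode_body` and the sample word are hypothetical, chosen only for illustration; the sketch does not reproduce any HTTP implementation. It reads the charset parameter from a Content-Type value, falls back to a caller-supplied default when the parameter is absent, and shows that the same unlabelled bytes are read differently (or rejected) depending on which default the receiver assumes.

```python
from email.message import Message


def charset_of(content_type, default):
    """Return the charset parameter of a Content-Type value.

    Falls back to `default` when no charset parameter is present.
    (Hypothetical helper, for illustration only.)
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


def decode_body(raw, content_type, default):
    """Decode raw bytes using the labelled charset, else the assumed default."""
    return raw.decode(charset_of(content_type, default))


utf8_bytes = "café".encode("utf-8")          # two bytes for the accented letter
latin1_bytes = "café".encode("iso-8859-1")   # one byte for the accented letter

# Labelled content is unambiguous, whatever default the receiver assumes.
labelled = decode_body(utf8_bytes, "text/plain; charset=UTF-8", "iso-8859-1")

# Unlabelled UTF-8 content read with an ISO-8859-1 default is silently garbled.
garbled = decode_body(utf8_bytes, "text/plain", "iso-8859-1")

# Unlabelled ISO-8859-1 content read with a UTF-8 default is rejected outright.
try:
    decode_body(latin1_bytes, "text/plain", "utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True

# Pure ASCII content decodes identically under either default.
ascii_same = (
    decode_body(b"plain ASCII", "text/plain", "utf-8")
    == decode_body(b"plain ASCII", "text/plain", "iso-8859-1")
)
```

Only the labelled case is robust: the other outcomes depend entirely on what the receiver guesses, which is the ambiguity the charset parameter is meant to remove.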

Please note that ASCII text qualifies as UTF-8, but ISO-8859-1 text in
general does not.

--
Francois Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montreal
Tel : +1 (514) 747-2547
Fax : +1 (514) 747-2561
Received on Wednesday, 3 July 1996 07:19:08 UTC