- From: Francois Yergeau <yergeau@alis.ca>
- Date: Wed, 3 Jul 1996 10:08:03 -0500
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> From: Larry Masinter <masinter@parc.xerox.com>
> Date: Tue, 2 Jul 1996 18:38:02 PDT
> I suggest making the following change, which is less controversial
> than the "charset=unknown" proposal:
>
> Current HTTP/1.1 spec:
> ...
>
> < The "charset" parameter is used with some media types to define the
> < character set (section 3.4) of the data. Origin servers SHOULD
> < include an appropriate charset parameter for those media types which
> < allow one (including text/html and text/plain) to avoid ambiguity.
> < In the absence of a charset parameter, the default charset value MAY
> < be assumed to be "ISO-8859-1" when received from a HTTP/1.1 server.
Not good enough, I'm afraid. For one, with only a SHOULD the charset
parameter can still be omitted, so the problem we have now (its absence
in most cases) will not be solved. Further, ISO-8859-1 is still in, with
no justification whatsoever. If there is to be a default, it should be
UTF-8, not a "local derivative" like Latin-1.
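
To make the labelling question concrete, here is a minimal sketch of how a recipient might read the charset parameter from a Content-Type value and fall back to a default when it is missing. The function name `effective_charset` and the choice of fallback are assumptions for illustration only; nothing here is taken from the HTTP/1.1 draft.

```python
from email.message import Message


def effective_charset(content_type, default=None):
    """Return the lowercased charset parameter, or `default` if it is absent.

    `default` is deliberately left to the caller: the draft says
    ISO-8859-1 MAY be assumed, while this message argues for UTF-8.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns its argument when no charset is present.
    return msg.get_content_charset(default)


if __name__ == "__main__":
    print(effective_charset("text/html; charset=ISO-8859-1"))
    print(effective_charset("text/html"))
    print(effective_charset("text/plain", default="utf-8"))
```

With a MUST on origin servers, the `default` branch would only ever be reached for content that came from an older server.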
There is a problem with charset=x-unknown, but this was proposed by
Keith only for 1.1 proxies that would have to label unlabelled content
received from a 1.0 server. The language above (with SHOULD replaced
by MUST) would require only origin servers to label, so the problem
disappears. Proxies receiving unlabelled content can just leave it
alone, though we could go as far as permitting them ("MAY") to label it
if they happen to know the charset.
The same ISO-8859-1 is also present in section 14.45 about the
Warning header. The second paragraph after the BNF ends with:
    The default language is English and the default character set is
    ISO-8859-1. If a character set other than ISO-8859-1 is used, it
    MUST be encoded in the warn-text using the method described in
    RFC 1522 [14].
This should be replaced with:
    The default character encoding is the UTF-8 encoding of ISO-10646.
    If a character encoding other than UTF-8 is used, it MUST be encoded
    in the warn-text using the method described in RFC 1522 [14].
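
As an illustration of what "the method described in RFC 1522" looks like in practice, the sketch below wraps a short non-UTF-8 warn-text in an encoded-word and decodes it again. It relies on Python's `email.header` module, which implements RFC 2047 (the successor to RFC 1522, with the same encoded-word syntax). The helper names and the sample text are assumptions, and the sketch only targets short strings that fit in one encoded-word.

```python
from email.header import Header, decode_header


def encode_warn_text(text, charset="iso-8859-1"):
    """Wrap `text` in an encoded-word labelled with `charset`."""
    return Header(text, charset).encode()


def decode_warn_text(encoded):
    """Decode a warn-text that may contain encoded-words back to str."""
    pieces = []
    for chunk, charset in decode_header(encoded):
        if isinstance(chunk, bytes):
            # Unlabelled byte chunks are treated as ASCII in this sketch.
            pieces.append(chunk.decode(charset or "ascii"))
        else:
            pieces.append(chunk)
    return "".join(pieces)


if __name__ == "__main__":
    wire = encode_warn_text("Réponse périmée")
    print(wire)
    print(decode_warn_text(wire))
```

Under the replacement text proposed above, only strings in an encoding other than UTF-8 would need this wrapping.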
Please note that plain ASCII text already qualifies as UTF-8, whereas
ISO-8859-1 text that uses characters above 0x7F does not.
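
The point about ASCII can be checked directly. The following is a small sketch: the helper name `is_valid_utf8` and the sample strings are made up for the demonstration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII: every byte is below 0x80.
ascii_bytes = "Response is stale".encode("ascii")
# Latin-1 with accented letters: contains bytes above 0x7F.
latin1_bytes = "Réponse périmée".encode("iso-8859-1")

if __name__ == "__main__":
    print(is_valid_utf8(ascii_bytes))
    print(is_valid_utf8(latin1_bytes))
```

The ASCII bytes decode to the same string under either charset, which is why a UTF-8 default costs nothing for existing English-only warn-texts.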
--
Francois Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montreal
Tel : +1 (514) 747-2547
Fax : +1 (514) 747-2561
Received on Wednesday, 3 July 1996 07:19:08 UTC