
Re: charset parameter

From: Martin Duerst <duerst@w3.org>
Date: Fri, 27 Jul 2001 11:30:20 +0900
Message-Id: <4.2.0.58.J.20010727112249.009d9a90@sh.w3.mag.keio.ac.jp>
To: Terje Bless <link@pobox.com>, W3C Validator <www-validator@w3.org>
At 05:47 01/07/26 +0200, Terje Bless wrote:

>No they don't! The transport is not dictated by the content; you can send
>HTML over FTP, HTTP, SMTP, NNTP, IMAP, MAPI, etc.; and all those transports
>can and do transport much, much more than just HTML.
>
>When we get to a "Conforming Application", we no longer know whether
>"ISO-8859-1" was explicit or implied (by the HTTP 1.1 defaulting rules); we
>just know that after all the rules of the Transport have been applied, the
>result was "ISO-8859-1" (for HTTP 1.1), "US-ASCII" (NNTP), "UTF-8" (for NNTP
>"updated" by USEFOR), "EUC-JP" (for FTP combined with locale info), "KOI-8"
>(by local policy), "Windows-1252" (MAPI), etc.

Hello Terje,

This assumes that the protocol information (whether explicit
or implicit) is passed to the application (the browser). That is not
in accordance with widespread current practice. Current practice
is that the header information is passed on as is, i.e. the
application has to deal with the defaults itself. And for the
iso-8859-1 'default' in HTTP, all applications I know of ignore it
in favor of a general default set in the browser (and changeable by
the user). [In some parts of the world (yours), those two are the
same, but in many others they are not.]
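
The practice described above can be sketched in a few lines of Python.
This is only an illustration under assumptions: the function name
resolve_charset and the user_default parameter are hypothetical, and a
real browser also consults other sources (such as markup inside the
document) that are left out here.

```python
from email.message import Message


def resolve_charset(content_type, user_default):
    """Pick a charset the way the practice above describes (hypothetical helper).

    content_type: raw Content-Type header value, or None if the header is absent.
    user_default: the browser-wide default chosen by the user (an assumption
                  of this sketch, not a real browser API).
    """
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        # get_content_charset() returns the lowercased charset parameter,
        # or None when the header carries no charset parameter.
        explicit = msg.get_content_charset()
        if explicit:
            return explicit
    # No explicit charset: the protocol's nominal iso-8859-1 default is
    # not applied; the user's general default wins instead.
    return user_default.lower()
```

With a user default of "Shift_JIS", a header of "text/html" resolves to
"shift_jis" rather than "iso-8859-1", while "text/html; charset=EUC-JP"
resolves to "euc-jp".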

By the way, it looks like you are assuming that the encoding of
FTP files is determined by the locale. That's true in a very narrow
sense: for text mode and the distinction between ASCII and EBCDIC.
For the rest, FTP just transmits bytes.


>We also don't know whether the Content-Transfer-Encoding was "8bit",
>"Base64", "QP", or "7bit"; because by the time the "Conforming Application"
>gets the data it's been turned into "8bit" by the MIME rules.

Yes, this is true. But this is one level lower than 'charset', and
less intertwined with the application.
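
To illustrate the layering, here is a rough Python sketch. The helper
undo_transfer_encoding is hypothetical and covers only the common
encodings. It shows that once the transfer encoding has been undone,
the application sees the same octets however they were shipped, yet it
still has to choose a charset to turn them into characters.

```python
import base64
import quopri


def undo_transfer_encoding(payload: bytes, cte: str) -> bytes:
    """Reverse a MIME Content-Transfer-Encoding (hypothetical helper)."""
    cte = cte.strip().lower()
    if cte == "base64":
        return base64.b64decode(payload)
    if cte == "quoted-printable":
        return quopri.decodestring(payload)
    if cte in ("7bit", "8bit", "binary"):
        return payload  # identity encodings: nothing to undo
    raise ValueError(f"unknown Content-Transfer-Encoding: {cte!r}")


# Five octets containing two bytes above 0x7F.
octets = b"Gr\xfc\xdfe"

# The same octets, shipped two different ways.
as_base64 = base64.b64encode(octets)
as_qp = quopri.encodestring(octets)

# After the MIME layer has done its work, the bytes are identical...
from_base64 = undo_transfer_encoding(as_base64, "base64")
from_qp = undo_transfer_encoding(as_qp, "quoted-printable")

# ...but the characters still depend on which charset is applied.
as_latin1 = octets.decode("iso-8859-1")
as_koi8 = octets.decode("koi8-r")
```

Here from_base64 and from_qp are equal to octets, while as_latin1 and
as_koi8 are different strings built from the same bytes.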

Regards,   Martin.
Received on Friday, 27 July 2001 22:55:45 GMT
