
Re: charset parameter

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 27 Jul 2001 00:05:54 +0200
To: Terje Bless <link@pobox.com>
Cc: W3C Validator <www-validator@w3.org>
Message-ID: <aj21mtk2ckibso67m5s784dvo9uej0vqr2@4ax.com>
* Terje Bless wrote:
>The HTML Recommendation has no authority to dictate syntax or semantics for
>an arbitrary transport protocol.

Well, it doesn't. It defines behaviour for applications that retrieve
specific content over a specific transport protocol.

>I'm guessing that the _intent_ was that something labelled "ISO-8859-1"
>should be parsed accordingly, until a meta element with, say,
>"windows-1250" was encountered, and then _restarted_ with the new encoding
>in effect (implicit in this is that it should be compatible with the
>transport encoding up to the meta element).

No, the intent was that _servers_ parse the HTML document and send the
correct Content-Type: header; HTML 4 even says so. In fact, this
definition of the http-equiv attribute is incompatible with section
5.2.2. The definition implies that all meta elements with an http-equiv
attribute must be ignored if the document was transferred via HTTP. If
it wasn't transferred via HTTP, the attribute is useless, since it makes
use of HTTP semantics that don't apply to that other protocol. What
about Content-Type and the character encoding? No use for that either.
The user agent must then have some encoding information to convert
arbitrary bytes to characters. If the protocol comes with that
information: great. If not, the document cannot be parsed, and if you
can't parse the document, you'll never see any meta element. [1]
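
A minimal sketch of the server-side reading described above, in Python:
the server scans the document for meta elements with an http-equiv
attribute and turns them into response header pairs before sending. The
names HttpEquivCollector and headers_from_document are made up for
illustration and do not come from any real server. The sketch also
assumes the server already knows the file's encoding (it takes decoded
text), which is exactly the information a client might be missing.

```python
from html.parser import HTMLParser


class HttpEquivCollector(HTMLParser):
    """Collect (header-name, value) pairs from <meta http-equiv=...> elements."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("http-equiv")
        value = attributes.get("content")
        # Skip meta elements that lack either attribute.
        if name and value is not None:
            self.headers.append((name, value))


def headers_from_document(html_text):
    """Return the header pairs a server could emit for this document.

    Hypothetical helper: takes already-decoded text, so the server is
    assumed to know the encoding of the file it is about to send.
    """
    collector = HttpEquivCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headers


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1250">'
        '<title>example</title></head><body></body></html>'
    )
    for name, value in headers_from_document(page):
        print(f"{name}: {value}")
```

Done this way, the client only ever has to look at the real
Content-Type: header and never needs to guess an encoding in order to
find the meta element.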

>This obviously does not consider HTTP defaulting behaviour, but even
>[RFC 2854] still says that ISO-8859-1 is the default.

It says what HTTP/1.1 says; it doesn't define any default value for the
charset parameter itself, but it does point at section 5.2 of HTML 4.
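
For illustration, a small Python sketch of reading the charset parameter
from a Content-Type header value without substituting any default when
it is absent. The function name charset_from_content_type is an
assumption made for this example; it relies only on the standard
library's email.message.Message, and what a caller does on None (for
instance, fall back to the rules in section 5.2 of HTML 4) is left open
here.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter (lower-cased), or None if absent.

    Hypothetical helper: no default such as ISO-8859-1 is filled in, so
    the caller can tell "no charset given" apart from an explicit label.
    """
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=ISO-8859-1"))
    print(charset_from_content_type("text/html"))
```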

[1] I think the http-equiv attribute is the worst thing ever
    incorporated into HTML. It hasn't been implemented, it is being
    abused, semantics aren't clearly defined, the definition is
    ambiguous, only a small number of people put syntactically valid
    information in the content attribute for some HTTP headers, etc.
    I'll find some evil hellcat to put even more evil spells on the HTML
    WG members if this attribute isn't kicked out of XHTML 2.0 (or
    replaced by something with value) };-)
-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
Received on Thursday, 26 July 2001 18:06:58 GMT
