Re: charset parameter

From: Martin Duerst (duerst@w3.org)
Date: Thu, Jul 26 2001

    Message-Id: <4.2.0.58.J.20010727114411.05b24a10@sh.w3.mag.keio.ac.jp>
    Date: Fri, 27 Jul 2001 11:49:52 +0900
    To: Bjoern Hoehrmann <derhoermi@gmx.net>, Terje Bless <link@pobox.com>
    From: Martin Duerst <duerst@w3.org>
    Cc: W3C Validator <www-validator@w3.org>
    Subject: Re: charset parameter
    
    At 00:05 01/07/27 +0200, Bjoern Hoehrmann wrote:
    >* Terje Bless wrote:
    > >The HTML Recommendation has no authority to dictate syntax or semantics for
    > >an arbitrary transport protocol.
    >
    >Well, it doesn't. It defines behaviour for applications that retrieve
    >specific content over a specific transport protocol.
    >
    > >I'm guessing that the _intent_ was that something labelled "ISO-8859-1"
    > >should be parsed accordingly, until a meta element with, say,
    > >"windows-1250" was encountered, and then _restarted_ with the new encoding
    > >in effect (implicit in this is that it should be compatible with the
    > >transport encoding up to the meta element).
    >
    >No, the intent was that _servers_ parse the HTML document and send the
    >correct Content-Type: header; HTML 4 even says so.
    
    Where? Is that a must? It was planned that way, but it turned out
    that doing this on the server was too complicated and too much of
    a performance hit.
    
    
    > >This obviously does not consider HTTP defaulting behaviour, but even
    > >[RFC 2854] still says that ISO-8859-1 is the default.
    >
    >It says what HTTP/1.1 says; it doesn't define any default value for the
    >charset parameter, but it does point at section 5.2 of HTML 4.
    
    Yes. Section 5.2 of HTML 4 is closest to current practice, and
    that's what the validator is following (or trying to follow).
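
As an illustration of the priority order that Section 5.2.2 of HTML 4 lays out (HTTP "charset" parameter first, then a META declaration with http-equiv="Content-Type", then a charset attribute on the element that linked to the resource), here is a minimal Python sketch. It is not the validator's actual code. The function names, the 4096-byte scan limit, and the regex-based META scan are assumptions made for the example, and the scan only works for ASCII-compatible encodings.

```python
import re

# Matches any <meta ...> tag; a real parser would be more careful.
META_TAG_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
# Matches charset=value, with or without a quote before the value.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, if any."""
    if not content_type:
        return None
    m = CHARSET_RE.search(content_type.encode("ascii", "replace"))
    return m.group(1).decode("ascii").lower() if m else None


def charset_from_meta(body, limit=4096):
    """Scan the first `limit` bytes for <meta http-equiv="Content-Type" ...>."""
    for tag in META_TAG_RE.findall(body[:limit]):
        if re.search(rb"http-equiv\s*=\s*[\"']?content-type", tag, re.IGNORECASE):
            m = CHARSET_RE.search(tag)
            if m:
                return m.group(1).decode("ascii").lower()
    return None


def determine_charset(content_type, body, link_charset=None):
    """Apply the HTML 4 section 5.2.2 priorities, highest first."""
    return (
        charset_from_header(content_type)
        or charset_from_meta(body)
        or (link_charset.lower() if link_charset else None)
    )
```

With these hypothetical helpers, a document served as "text/html; charset=ISO-8859-1" is treated as ISO-8859-1 even if its META element names windows-1250; the META value only applies when the HTTP header carries no charset parameter.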
    
    
    >[1] I think the http-equiv attribute is the worst thing ever
    >     incorporated into HTML. It hasn't been implemented, it is being
    >     abused, semantics aren't clearly defined, the definition is
    >     ambiguous, only a small number of people put syntactically valid
    >     information in the content attribute for some HTTP headers, etc.
    >     I'll find some evil hellcat to put even more evil spells on the HTML
    >     WG members if this attribute won't be kicked out of XHTML 2.0 (or
    >     replaced by something with value) };-)
    
    This is easy to guess. XHTML 2.0 will use the XML 'encoding' pseudo-attribute.
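
For reference, the 'encoding' pseudo-attribute lives in the XML declaration at the very start of the document, for example <?xml version="1.0" encoding="ISO-8859-1"?>. The Python sketch below shows one way to read it. The function name is hypothetical, and the regex approach assumes an ASCII-compatible encoding and a declaration at byte 0; a real XML processor also handles byte order marks and UTF-16.

```python
import re

# XML declaration with a mandatory version and an optional encoding.
# The encoding name pattern follows the XML EncName production.
XML_DECL_RE = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']+["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def xml_declared_encoding(data):
    """Return the declared encoding name, or None if none is declared."""
    m = XML_DECL_RE.match(data)
    if m and m.group(1):
        return m.group(1).decode("ascii")
    return None
```

When the function returns None, the document carries no explicit declaration, and the caller has to rely on other information such as the transport's charset parameter or XML's own defaults.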
    
    
    Regards,    Martin.