Re: charset parameter

From: Martin Duerst (duerst@w3.org)
Date: Thu, Jul 26 2001

    Message-Id: <4.2.0.58.J.20010727113750.05fe5440@sh.w3.mag.keio.ac.jp>
    Date: Fri, 27 Jul 2001 11:42:55 +0900
    To: Terje Bless <link@pobox.com>, Nick Kew <nick@webthing.com>
    From: Martin Duerst <duerst@w3.org>
    Cc: W3C Validator <www-validator@w3.org>
    Subject: Re: charset parameter
    
    At 11:36 01/07/26 +0200, Terje Bless wrote:
    >On 26.07.01 at 09:08, Nick Kew <nick@webthing.com> wrote:
    >
    > >Surely that at least is clear: [HTTP] takes precedence over [META]?
    >
    >Nope. HTTP 1.1 doesn't mention META, and HTML just sez it's supposed to be
    >read by _servers_ to initialize the HTTP header... :-(
    
    Sorry, this is wrong. Please everybody read
    http://www.w3.org/TR/REC-html40/charset.html#h-5.2 !
    
    
    
    > >>Or what this means for the case when the charset in the HTTP header is
    > >>there by inference (as a default, not explicitly)...
    > >
    > >But *ML rules don't apply to HTTP, so whence the conclusion that
    > >*anything* is implicit (as opposed to absent) in the headers?
    >
    >The lack of a "charset" parameter on the HTTP 1.1 "Content-Type" header
    >field means that you should assume it is there with a value of "ISO-8859-1"
    >according to the HTTP 1.1 RFC. HTML doesn't specify a default (it actually
    >discourages it). But if HTTP overrides META, and the HTTP charset is only
    >there by default, does HTTP's default still override an explicitly inserted
    >META?
    
    By widespread current practice, as well as by the HTML 4 Rec, NO.
    
    
    >That is, if the META sez EUC-JP and HTTP implicitly defines ISO-8859-1 (by
    >being absent), does that really mean that we should use ISO-8859-1 (which
    >the user obviously does _not_ want) over EUC-JP (which s/he _does_ want)?
    
    No. The validator currently goes for EUC-JP, and that's the right thing.
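
    To make the precedence concrete, here is a minimal sketch in Python,
    assuming the order given in section 5.2.2 of the HTML 4 Rec: an explicit
    charset parameter in the HTTP header first, then the META declaration.
    The function name and the return convention are hypothetical and are not
    taken from the validator's code. An HTTP default that is merely implied
    is treated as absent, so it never overrides an explicit META.

```python
from typing import Optional, Tuple

def resolve_charset(http_charset: Optional[str],
                    meta_charset: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (charset, source) for a document.

    http_charset: charset parameter explicitly sent in Content-Type, or None
                  if the header carried no charset parameter.
    meta_charset: charset from a META http-equiv declaration, or None.
    """
    # An explicit HTTP charset parameter has the highest priority.
    if http_charset:
        return http_charset.strip(), "http"
    # No explicit HTTP charset: an explicit META wins over any implied default.
    if meta_charset:
        return meta_charset.strip(), "meta"
    # Nothing explicit anywhere: report "unknown" rather than guess
    # (an assumption of this sketch; the caller decides what to do next).
    return None, "unknown"
```

    With this sketch, resolve_charset(None, "EUC-JP") yields
    ("EUC-JP", "meta"), which is the case discussed above.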
    
    
    
    > >>or "I'm sorry, but I was unable to determine the Character Encoding based
    > >>on available information. Please make your Character Encoding explicit in
    > >>the HTTP headers".
    > >
    > >Except if HTTP happens to be FTP or file upload, and there is no header...
    >
    >Or a fragment pasted into the form (not finished yet)... Or...
    >
    >It must be dealt with, but these are sufficiently fringe cases that we can
    >add exceptions for those. I think... :-)
    
    Yes, these must be treated differently. I think the right thing to do,
    for both fragment and file upload, is:
    
    - Take the explicit selection in the popup menu as if it were the transport
       information (e.g. HTTP).
    - For the rest, do as currently.
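
    The two points above could be sketched roughly as follows. This is only
    an illustration in Python; the function name, the input kinds ("url",
    "upload", "fragment") and the parameters are hypothetical, and None
    stands for "nothing explicitly given".

```python
from typing import Optional

def transport_charset(input_kind: str,
                      http_charset: Optional[str],
                      menu_choice: Optional[str]) -> Optional[str]:
    """Return the charset to treat as transport-level information."""
    if input_kind in ("upload", "fragment"):
        # No HTTP header describes the document itself, so an explicit
        # selection in the popup menu takes the place of the transport
        # information.
        return menu_choice or None
    # Ordinary fetch by URL: use the charset parameter from the HTTP
    # header, if one was explicitly sent.
    return http_charset or None
```

    The result then takes the place of the HTTP charset in the usual
    procedure; if it is None, the META declaration is consulted as before.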
    
    Regards,   Martin.