Re: 2 bugs in the W3 html validator service

From: Terje Bless (link@pobox.com)
Date: Fri, Sep 28 2001

    From: Terje Bless <link@pobox.com>
    To: Brent Boyer <brentboyer@hotmail.com>
    Cc: W3C Validator <www-validator@w3.org>
    Date: 28 Sep 2001 19:22:59 +0200
    Message-Id: <1001697780.2571.50.camel@tux>
    Subject: Re: 2 bugs in the W3 html validator service
    
    On Fri, 2001-09-28 at 18:38, Brent Boyer wrote:
    
    > First, thank you VERY much for your prompt reply.
    
    We aims to please... :-)
    
    
    >>This may be considered a bug in the old version that has been fixed in
    >>the new version.
    >
    >Now that is interesting!
    
    The issue is simply that previous versions would blithely assume a
    default character encoding if one was not given -- Warning! Danger, Will
    Robinson! -- whereas the new version will refuse to perform the operation
    if it doesn't have all the information it needs. You may set an explicit
    character encoding using the form, or override an existing encoding, but
    you will then get a warning about this and the Validator won't return a
    "Valid HTML" badge on the results page.
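A minimal Python sketch of the behaviour described above, assuming the encoding can come only from the HTTP Content-Type header or from a user-supplied override. The names `choose_charset`, `CharsetChoice` and `MissingCharsetError` are hypothetical and are not taken from the Validator's actual code.

```python
from dataclasses import dataclass, field
from email.message import Message
from typing import List, Optional


class MissingCharsetError(Exception):
    """Hypothetical error: no character encoding is known, so refuse to guess."""


@dataclass
class CharsetChoice:
    charset: str
    badge_allowed: bool  # False whenever the user forced an encoding
    warnings: List[str] = field(default_factory=list)


def declared_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def choose_charset(content_type: Optional[str],
                   override: Optional[str] = None) -> CharsetChoice:
    declared = declared_charset(content_type)
    if override is not None:
        forced = override.lower()
        if declared and declared != forced:
            note = f"declared charset {declared} overridden with {forced}"
        else:
            note = f"charset {forced} supplied by the user, not by the document"
        # A user-chosen encoding is accepted, but it earns a warning and no badge.
        return CharsetChoice(forced, badge_allowed=False, warnings=[note])
    if declared is None:
        # This is where the old behaviour silently assumed ISO-8859-1.
        raise MissingCharsetError("no character encoding given; refusing to guess")
    return CharsetChoice(declared, badge_allowed=True)
```

For example, `choose_charset("text/html; charset=UTF-8")` yields `utf-8` with the badge allowed, while `choose_charset("text/html")` raises instead of guessing.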
    
    It's been some time now since ISO-8859-1 was an acceptable default on
    the web, and that particular practice was actively harmful to the web
    community both outside the ISO-8859-1 "zone" and _inside_ it!
    
    
    >>The two meta tags form is not a valid way to express the character
    >>encoding. If you have any reference for this usage I would be *most*
    >>interested to see it!
    >
    >I cannot remember which website I read that recommended it, but if you
    >do a search in google with the following keywords
    >    meta http-equiv="charset" content=
    >
    >you get TONS of results.  For instance, on just the first page of
    >results that I have bothered to look at, you get the following advice
    >from the very popular about.com website:
    >
    >  Some common name/value pairs sent through the http-equiv meta tag:
    >
    >       http-equiv="charset" content="iso-8859-1"
    >       This defines the language used on the page
    >
    >(see http://html.about.com/library/weekly/aa111699.htm)
    >
    >Can you cite for me a reference which explicitly states that the 1
    >meta tag form is the only correct way?
    >
    >(I looked for http-equiv="charset" on the w3 site, and could find
    >nothing, and their examples, e.g.
    >    http://www.w3.org/International/O-charset.html
    >
    >do only seem to use the 1 tag form.  But do you know where they
    >explicitly state that the 1 tag form is the only valid one?  And
    >how did so many people out there get the idea of using the 2 tag
    >form?)
    
    I'm afraid I have no idea where the misconception stems from -- perhaps
    it is a proprietary feature of Internet Explorer or Netscape that has
    gotten loose? It wouldn't be the first such. :-( -- but the reason
    you'll find no reference to the issue on w3.org is that you are trying
    to find the answer to the wrong question. :-)
    
    See, META with an "http-equiv" attribute is defined to specify
    additional or overriding _HTTP_ headers and has nothing much to do with
    HTML as such. If you look at the HTTP specification, you'll find that
    there is no header field named "Charset". The character encoding is
    specified with the "charset" attribute on the Content-Type field.
    
    There is nothing wrong with your HTML; the META tag does define an HTTP
    header, but the resulting header just doesn't have any defined meaning
    in the context of HTTP.
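To illustrate, here is a rough Python sketch that treats each META http-equiv element as the HTTP header it stands for, then asks for the charset the way an HTTP client would: from the parameter on Content-Type. The exact "two tag" markup used in the check is an assumption based on the about.com quote; `HttpEquivCollector` and `effective_charset` are hypothetical names.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class HttpEquivCollector(HTMLParser):
    """Turn each <meta http-equiv=... content=...> into a pseudo HTTP header."""

    def __init__(self) -> None:
        super().__init__()
        self.headers = Message()  # header lookup is case-insensitive, as in HTTP

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        values = dict(attrs)
        name, content = values.get("http-equiv"), values.get("content")
        if name and content is not None:
            self.headers[name] = content


def effective_charset(html_text: str) -> Optional[str]:
    """Charset as HTTP sees it: only Content-Type's charset parameter counts."""
    collector = HttpEquivCollector()
    collector.feed(html_text)
    collector.close()
    # A header literally named "Charset" is stored but never consulted.
    return collector.headers.get_content_charset()
```

With the one-tag form the result is the declared encoding; with the assumed two-tag form the result is `None`, because the "Charset" pseudo-header carries no meaning in HTTP.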
    
    
    You should seriously consider specifying the character encoding in HTTP
    in any case. Most servers make this easy, but not all server
    administrators are aware of it or, in some cases, make it easy for their
    users or customers to actually use this facility. The use of a META tag
    for this is sometimes unavoidable, but an ugly hack nonetheless. If
    there is any way you can avoid it I would strongly suggest you do so!
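As a small illustration of sending the encoding in HTTP rather than in a META tag, here is a sketch using Python's standard `http.server`. The page text and the `CHARSET` value are made-up examples, and `html_response` is a hypothetical helper; a real site would normally configure this in its web server instead.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple

CHARSET = "iso-8859-1"  # example value: use the encoding the files are saved in
PAGE_TEXT = "<html><head><title>Test</title></head><body><p>Blåbærsyltetøy</p></body></html>"


def html_response(text: str, charset: str) -> Tuple[Dict[str, str], bytes]:
    """Encode the page and build headers that announce its encoding."""
    body = text.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body


class CharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        headers, body = html_response(PAGE_TEXT, CHARSET)
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

Running `http.server.HTTPServer(("127.0.0.1", 8000), CharsetHandler).serve_forever()` would serve the page locally, with the encoding in the Content-Type header and no META tag needed.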
    
    
    >>While I can't help with the clarity issue, you should be able to fix it
    >>with a simple search and replace operation in any decent text editor or
    >>a tool specifically for the purpose (I can whip up some Perl for it in
    >>a couple of minutes if that'd help you any).
    >
    >Thanks for the offer, but I can fix it if I have to...
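For the record, the search-and-replace step is small. Here is a minimal Python sketch (in place of the Perl offered above), assuming the two-tag form looks like `<meta http-equiv="Content-Type" content="text/html">` immediately followed by `<meta http-equiv="charset" content="...">`; pages written differently would need a looser pattern. `merge_charset_meta` and `fix_file` are hypothetical names.

```python
import re

# Assumed shape of the two-tag form; attribute order and quoting must match.
TWO_TAG_FORM = re.compile(
    r'<meta\s+http-equiv="Content-Type"\s+content="text/html"\s*/?>'
    r'\s*'
    r'<meta\s+http-equiv="charset"\s+content="([^"]+)"\s*/?>',
    re.IGNORECASE,
)


def merge_charset_meta(html: str) -> str:
    """Replace the two-tag pair with a single Content-Type META tag."""
    return TWO_TAG_FORM.sub(
        lambda m: '<meta http-equiv="Content-Type" '
                  f'content="text/html; charset={m.group(1)}">',
        html,
    )


def fix_file(path: str) -> bool:
    """Rewrite one file in place; return True if anything changed."""
    # ISO-8859-1 maps every byte to one code point, so non-ASCII bytes
    # survive the round trip unchanged whatever the file's real encoding.
    with open(path, encoding="iso-8859-1", newline="") as f:
        text = f.read()
    fixed = merge_charset_meta(text)
    if fixed != text:
        with open(path, "w", encoding="iso-8859-1", newline="") as f:
            f.write(fixed)
    return fixed != text
```

Pages that already use the one-tag form are left untouched, so running it twice over the same files is harmless.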
    
    As mentioned, you may wish to configure your server to send the
    appropriate character encoding information automatically. In Apache you
    can set a default character encoding globally or on a per-directory or
    per-file basis. I'm sure most servers have a similar facility. Then you
    could remove the META tags entirely and the results would be far more
    reliable (not to mention predictable).
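In Apache itself this is done with configuration directives such as `AddDefaultCharset` (server or directory wide) and `AddCharset` (by file extension), in the server config or an `.htaccess` file. The lookup order such settings imply, most specific first, can be sketched in Python. The paths and encodings in the tables are made-up examples, and `charset_for` is a hypothetical helper, not part of any server.

```python
from pathlib import PurePosixPath
from typing import Dict

# Example configuration (assumed values, for illustration only).
GLOBAL_DEFAULT = "utf-8"
DIRECTORY_DEFAULTS: Dict[str, str] = {
    "/docs/ru": "koi8-r",
    "/docs/jp": "shift_jis",
}
FILE_OVERRIDES: Dict[str, str] = {
    "/docs/jp/legacy.html": "euc-jp",
}


def charset_for(path: str) -> str:
    """Most specific setting wins: file, then nearest directory, then global."""
    if path in FILE_OVERRIDES:
        return FILE_OVERRIDES[path]
    for parent in PurePosixPath(path).parents:  # nearest directory first
        charset = DIRECTORY_DEFAULTS.get(str(parent))
        if charset is not None:
            return charset
    return GLOBAL_DEFAULT


def content_type_for(path: str) -> str:
    """Header value the server would send for an HTML file at this path."""
    return f"text/html; charset={charset_for(path)}"
```

Because the encoding is decided by the server from its configuration, individual pages no longer need to carry it at all.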
    
    
    >>It definitely shouldn't do that. Could you describe in more detail how
    >>you triggered this bug?
    [ Edited for brevity...
    > 1. Goto URL http://validator.w3.org/
    > 2. Click on the "upload files" link [...]
    > 3. [Upload a file]
    [...]
    > 6. [On the results page, Click on] the "Revalidate" button, it does
    >    NOT revalidate the previously uploaded file, as I thought it
    >    would.  Instead, it takes you all the way back to the initial
    >    page, namely, http://validator.w3.org/
    
    Aha! I've got it now. It's a combination of two issues actually. The
    "Revalidate" feature does not work with uploaded files. This is a known
    issue. The other issue -- and this is a straight bug -- is that we don't
    disable the "Revalidate" options in this particular case. Mea Culpa! I
    should have fixed that long ago. Sorry.
    
    Revalidate for uploaded files is a feature we'd like to add, but right
    now I can't tell you whether that is feasible or, if so, when we could
    get around to it.
    
    We will fix that button though; hopefully someone will whack me over the
    head if I forget it on the next update. :-)
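The button fix amounts to offering Revalidate only when there is a URI to fetch again. A minimal Python sketch of that check; `Submission`, `revalidate_link`, `revalidate_button` and the `check?uri=` query form are assumptions for illustration, not the Validator's actual code.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class Submission:
    uri: Optional[str] = None          # set when the document was checked by URI
    upload_name: Optional[str] = None  # set when the document was uploaded


def revalidate_link(sub: Submission,
                    check_url: str = "http://validator.w3.org/check") -> Optional[str]:
    """Return a URL that re-checks the same document, or None if there is none."""
    if sub.uri is None:
        # In this sketch uploaded content is not stored, so there is nothing to
        # re-check; the results page should omit or disable the control.
        return None
    return f"{check_url}?{urlencode({'uri': sub.uri})}"


def revalidate_button(sub: Submission) -> str:
    """Render the control disabled instead of silently linking to the front page."""
    link = revalidate_link(sub)
    if link is None:
        return '<button type="button" disabled>Revalidate</button>'
    return f'<a href="{link}">Revalidate</a>'
```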
    
    
    >Please forward it on for me -- thank you.
    
    Will do.