Re: 2 bugs in the W3 html validator service

On Fri, 2001-09-28 at 18:38, Brent Boyer wrote:

> First, thank you VERY much for your prompt reply.

We aim to please... :-)


>>This may be considered a bug in the old version that has been fixed in
>>the new version.
>
>Now that is interesting!

The issue is simply that previous versions would blithely assume a
default character encoding if one was not given -- Warning! Danger, Will
Robinson! -- where the new version will refuse to perform the operation
if it doesn't have all the information it needs. You may set an explicit
character encoding using the form, or override an existing encoding, but
you will then get a warning about this and the Validator won't return a
"Valid HTML" badge on the results page.

It's been some time now since ISO-8859-1 was an acceptable default on
the web, and that particular practice was actively harmful to the web
community both outside the ISO-8859-1 "zone" and _inside_ it!
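
To make the harm concrete, here is a small Python sketch (an illustration
only, not anything the Validator runs) showing that the same bytes become
different text depending on which encoding is assumed. The sample word and
the three encodings are arbitrary choices for the example.

```python
# The same bytes, decoded under three different assumed encodings.
raw = "caf\u00e9".encode("iso-8859-1")  # the bytes a Latin-1 page would contain

def decode_as(data, encoding):
    """Decode data with the given encoding; return None if the bytes are invalid in it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# Map each assumed encoding to the text a consumer would end up with.
results = {enc: decode_as(raw, enc) for enc in ("iso-8859-1", "koi8-r", "utf-8")}

for enc, text in results.items():
    print(enc, "->", repr(text))
```

Only the first guess recovers the intended text; the other two either
produce a different character or fail outright, which is why a silent
default is risky.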


>>The two meta tags form is not a valid way to express the character
>>encoding. If you have any reference for this usage I would be *most*
>>interested to see it!
>
>I cannot remember which website I read that recommended it, but if you
>do a search in google with the following keywords
>    meta http-equiv="charset" content=
>
>you get TONS of results.  For instance, on just the first page of
>results that I have bothered to look at, you get the following advice
>from the very popular about.com website:
>
>  Some common name/value pairs sent through the http-equiv meta tag:
>
>       http-equiv="charset" content="iso-8859-1"
>       This defines the language used on the page
>
>(see http://html.about.com/library/weekly/aa111699.htm)
>
>Can you cite for me a reference which explicitly states that the 1
>meta tag form is the only correct way?
>
>(I looked for http-equiv="charset" on the w3 site, and could find
>nothing, and their examples, e.g.
>    http://www.w3.org/International/O-charset.html
>
>do only seem to use the 1 tag form.  But do you know where they
>explicitly state that the 1 tag form is the only valid one?  And
>how did so many people out there get the idea of using the 2 tag
>form?)

I'm afraid I have no idea where the misconception stems from -- perhaps
it is a proprietary feature of Internet Explorer or Netscape that has
gotten loose? It wouldn't be the first such. :-( -- but the reason
you'll find no reference to the issue on w3.org is that you are trying
to find the answer to the wrong question. :-)

See, META with an "http-equiv" attribute is defined to specify
additional or overriding _HTTP_ headers and has nothing much to do with
HTML as such. If you look at the HTTP specification, you'll find that
there is no header field named "Charset". The character encoding is
specified with the "charset" attribute on the Content-Type field.

There is nothing wrong with your HTML; the META tag does define an HTTP
header, but the resulting header just doesn't have any defined meaning
in the context of HTTP.
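
A rough Python sketch of that reasoning, assuming a consumer that treats
each META http-equiv tag as a pseudo HTTP header and then looks for the
"charset" parameter on Content-Type. The class and function names are
made up for this example, the "two tag" markup is my reading of the form
you describe, and this is not how the Validator itself is implemented.

```python
from email.message import Message
from html.parser import HTMLParser

class MetaHeaders(HTMLParser):
    """Collect (name, value) pairs from <meta http-equiv=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("http-equiv")
        value = attributes.get("content")
        if name and value is not None:
            self.headers.append((name, value))

def declared_charset(html):
    """Return the charset parameter of the pseudo Content-Type header, or None."""
    parser = MetaHeaders()
    parser.feed(html)
    message = Message()
    for name, value in parser.headers:
        message[name] = value  # a "charset" header is stored but never consulted
    return message.get_content_charset()

one_tag = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
two_tag = ('<meta http-equiv="Content-Type" content="text/html">'
           '<meta http-equiv="charset" content="iso-8859-1">')

print(declared_charset(one_tag))
print(declared_charset(two_tag))
```

The second document does produce a "charset" header, but nothing in HTTP
ever looks at it, so no encoding is found.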


You should seriously consider specifying the character encoding in HTTP
in any case. Most servers make this easy, but not all server
administrators are aware of it or, in some cases, make it easy for their
users or customers to actually use this facility. The use of a META tag
for this is sometimes unavoidable, but an ugly hack nonetheless. If
there is any way you can avoid it I would strongly suggest you do so!


>>While I can't help with the clarity issue, you should be able to fix it
>>with a simple search and replace operation in any decent text editor or
>>a tool specifically for the purpose (I can whip up some Perl for it in
>>a couple of minutes if that'd help you any).
>
>Thanks for the offer, but I can fix it if I have to...

As mentioned, you may wish to configure your server to send the
appropriate character encoding information automatically. In Apache you
can set a default character encoding globally or on a per-directory or
per-file basis. I'm sure most servers have a similar facility. Then you
could remove the META tags entirely and the results would be far more
reliable (not to mention predictable).
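
In Apache the directive for the global default is AddDefaultCharset. If
your pages happen to be generated by a program rather than served as
static files, the same thing can be done by hand. Below is a minimal
Python WSGI sketch of a response that carries the encoding in the real
Content-Type header, so no META tag is needed. The application name and
page content are placeholders for the example.

```python
def app(environ, start_response):
    """Minimal WSGI application that declares its encoding in the HTTP header."""
    body = "<!DOCTYPE html><title>caf\u00e9</title>".encode("iso-8859-1")
    start_response("200 OK", [
        # The charset parameter must match how the body was actually encoded.
        ("Content-Type", "text/html; charset=iso-8859-1"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# To try it locally, one option is the standard library's reference server:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```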


>>It definitely shouldn't do that. Could you describe in more detail how
>>you triggered this bug?
[ Edited for brevity... ]
> 1. Goto URL http://validator.w3.org/
> 2. Click on the "upload files" link [...]
> 3. [Upload a file]
[...]
> 6. [On the results page, Click on] the "Revalidate" button, it does
>    NOT revalidate the previously uploaded file, as I thought it
>    would.  Instead, it takes you all the way back to the initial
>    page, namely, http://validator.w3.org/

Aha! I've got it now. It's a combination of two issues actually. The
"Revalidate" feature does not work with uploaded files. This is a known
issue. The other issue -- and this is a straight bug -- is that we don't
disable the "Revalidate" options in this particular case. Mea Culpa! I
should have fixed that long ago. Sorry.

Revalidate for uploaded files is a feature we'd like to add, but right
now I can't tell you whether that is feasible or, if so, when we could
get around to it.

We will fix that button though; hopefully someone will thwack me over the
head if I forget it on the next update. :-)


>Please forward it on for me -- thank you.

Will do.

Received on Friday, 28 September 2001 13:23:02 UTC