Re: note from Prof Knuth

Dear Mr. Knuth (or, by proxy, dear Ms. McLoughlin),

Thank you for your feedback on the Markup Validator.

As you noted in your message, this tool is managed by "system people  
who are supposedly committed to helping the world's users from all  
the various cultures". Indeed, the development and maintenance of  
this validator is done mostly by a group of volunteer developers,  
with the help of a user community here on the www-validator mailing
list. This community is working very hard to make this tool as good as
possible, and is dedicated to making the service useful and helpful
for people around the world.

The validator checks documents against the document type they claim
to be using. In an overwhelming number of cases, the document type is
a well-known standard, and the validator has a library (a cache, so to
speak) of all the formal public identifiers (e.g.
"-//W3C//DTD HTML 4.01//EN"), which allows for speedy validation
without needing to fetch the actual DTD.
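
To make the idea concrete, here is a minimal sketch of such a lookup. It is not the validator's actual code: the CATALOG table, the file names in it and the resolve_dtd function are all hypothetical, and only illustrate how a known public identifier avoids a network fetch.

```python
# Hypothetical catalog: formal public identifier -> locally cached DTD file.
# The file names are placeholders, not the validator's real layout.
CATALOG = {
    "-//W3C//DTD HTML 4.01//EN": "dtd/html401-strict.dtd",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "dtd/html401-loose.dtd",
}


def resolve_dtd(public_id, system_id=None):
    """Decide where the DTD for a document would come from.

    Returns a (source, location) pair:
      ("catalog", path) when the public identifier is in the local library,
      ("fetch", url)    when it is not, but a system identifier is given,
      ("unknown", None) when neither is available.
    """
    if public_id in CATALOG:
        return ("catalog", CATALOG[public_id])
    if system_id is not None:
        return ("fetch", system_id)
    return ("unknown", None)
```

A public identifier that is missing from the table, and that comes without a system identifier, ends up in the "unknown" case: the parser has nothing to validate against.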

That library had, however, become bloated and hard to maintain, which  
made it a liability. The community behind the validator decided to  
strike a balance by only keeping standardized document types in this  
library. We did not, at the time, communicate much about this  
decision, and as the person responsible for communication around this  
project, I genuinely apologize for this.

We also made sure the validator would have proper support for other  
doctype constructs, so that users of proprietary document types could  
declare:
<!DOCTYPE html PUBLIC "-//MyOrg//MyDoc//EN" "http://www.example.org/mydoc.dtd">
or
<!DOCTYPE html SYSTEM "http://www.example.org/mydoc.dtd">
which are the recommended ways of declaring the DOCTYPE when using a
nonstandard DTD. Using a nonstandard DTD with only:
<!DOCTYPE html PUBLIC "-//MyOrg//MyDoc//EN">
is technically acceptable but dangerous, since there is no guarantee
that parsing agents will know the public identifier.

This is the case your documents are falling into, as you are using:
<!DOCTYPE HTML PUBLIC "-//Netscape Comm. Corp.//DTD HTML//EN">
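
The difference between these forms can be checked mechanically. The following is only a rough sketch, assuming double-quoted identifiers and a declaration written on a single line; the regular expression and the describe_doctype function are mine, not part of the validator.

```python
import re

# Matches the PUBLIC form (with or without a system identifier)
# and the SYSTEM form. Only double-quoted identifiers are handled.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+'
    r'(?:PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system1>[^"]*)")?'
    r'|SYSTEM\s+"(?P<system2>[^"]*)")\s*>',
    re.IGNORECASE,
)


def describe_doctype(declaration):
    """Extract the identifiers from a DOCTYPE declaration.

    Returns None if the text does not match, otherwise a dict with the
    public identifier, the system identifier, and whether the DTD could
    be fetched (that is, whether a system identifier is present).
    """
    match = DOCTYPE_RE.search(declaration)
    if match is None:
        return None
    system = match.group("system1") or match.group("system2")
    return {
        "public": match.group("public"),
        "system": system,
        "fetchable": system is not None,
    }
```

Applied to the Netscape declaration above, this reports a public identifier and no system identifier, so "fetchable" is False: a parser that does not already know the identifier has nowhere to look for the DTD.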

Ideally, you would not be using this document type. It is proprietary  
and has never been standardized. As a matter of fact, as a long-time
user of the validator, even before it was maintained at W3C, you  
probably read the following:
[[
>    However, please be aware that this DTD contains many elements which
>    may never become standardized or widely supported.
]] -- from the documentation of the "kinder, gentler HTML validator"

How can we solve this problem?

Ideally, this would be an opportunity for you to switch your content
to an actual standard version of HTML. Unlike the never-published (at
least I cannot find any published version of the DTD), never-
standardized Netscape HTML, languages such as HTML 4.01 have been
through a standardization process. That means that they have been
designed with concern for the needs of all, and that they are here to
stay.

How hard would that be? In the case of your documents, that's a  
matter of three steps:

1- change the doctype declaration at the top of each document to e.g.:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">

2- get rid of the absmiddle attribute value. This value is not in
standard HTML. But HTML 4.01 Transitional has align="middle" for
images, which I gather has nearly the same effect. Better yet would be
using CSS[1]: using HTML as a markup-only language and using Cascading
Style Sheets for the style of your document is not only cleaner and
easier to maintain, it is also lighter and saves bandwidth.

[1] http://www.w3.org/TR/CSS2/

3- add alt attributes for images. There is a good reason why
standardized HTML requires alt attributes for images where the
proprietary Netscape HTML does not: accessibility. A person visiting
your Web site with a screen reader, for instance, would not be able
to know what the images are. Actually, your documents already use
such information, albeit not consistently. You could go through your
content and add descriptions for the images that carry meaning, then
run the tool tidy [2] with its alt-text option to set an empty alt
text for all the remaining, purely presentational images.

[2] http://tidy.sourceforge.net/

Except for the setting of alternate text for the images, which would  
be a favor to the quality of your content anyway, all these  
operations can be automated in a matter of a few lines of code in
whichever text-processing language you fancy. This should not take a  
week.
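
As an illustration only, here is what those few lines might look like in Python. This is a sketch under assumptions, not a tested migration script: I assume that absmiddle only appears as a value of the align attribute and that "middle" is an acceptable replacement, and the names convert and images_without_alt are made up for the example. The second function does not write any alt text; it only lists the images that still need a human decision.

```python
import re
from html.parser import HTMLParser

# Step 1: the HTML 4.01 Transitional declaration given above.
HTML401_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    '    "http://www.w3.org/TR/html4/loose.dtd">'
)

DOCTYPE_RE = re.compile(r'<!DOCTYPE[^>]*>', re.IGNORECASE)
# Step 2: align=absmiddle, quoted or not (assumption: it is only used this way).
ABSMIDDLE_RE = re.compile(r'''(\balign\s*=\s*)(["']?)absmiddle\2''', re.IGNORECASE)


def convert(html):
    """Replace the first DOCTYPE and every align=absmiddle in one document."""
    html = DOCTYPE_RE.sub(HTML401_DOCTYPE, html, count=1)
    return ABSMIDDLE_RE.sub(r'\1"middle"', html)


class MissingAlt(HTMLParser):
    """Collect the src of every img element that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src"))


def images_without_alt(html):
    """Step 3 helper: list images still lacking alternate text."""
    finder = MissingAlt()
    finder.feed(html)
    finder.close()
    return finder.missing
```

Running convert over each file and then reviewing the output of images_without_alt would leave only the writing of meaningful descriptions as manual work.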


Alternatively, if you really wish to keep using the nonstandard DTD,
the validator community, which cares a lot about the quality this
service offers, could consider re-adding it to the validator's catalog
for a future release of the tool.

I honestly do not believe that this would be a winning situation for  
anyone. You say you have been a long-time user of the validator, and
I am hopeful that this means you care about your content being  
correctly written in a properly defined language. If that is the  
case, then I trust you will see the interest in switching your Web  
site to a standard language such as HTML 4.01.

Regards,
olivier
-- 
olivier Thereaux - W3C - http://www.w3.org/People/olivier/
W3C Open Source Software: http://www.w3.org/Status

Received on Tuesday, 20 September 2005 01:55:03 UTC