- From: olivier Thereaux <ot@w3.org>
- Date: Tue, 20 Sep 2005 10:54:46 +0900
- To: mam@theory.Stanford.EDU
- Cc: www-validator@w3.org
Dear Mr. Knuth (or, by proxy, dear Ms. McLoughlin),
Thank you for your feedback on the Markup Validator.
As you noted in your message, this tool is managed by "system people
who are supposedly committed to helping the world's users from all
the various cultures". Indeed, the development and maintenance of
this validator is done mostly by a group of volunteer developers,
with the help of a user community here on the www-validator. This
community is working very hard to make this tool as good as possible,
and is dedicated to making the service useful and helpful for people
around the world.
The validator checks documents against the document type they claim
to be using. In an overwhelming number of cases, the document type is
a well-known standard, and the validator has a library (a cache, so
to speak) of the formal public identifiers of these standards (e.g.
"-//W3C//DTD HTML 4.01//EN"), which allows for speedy validation
without needing to fetch the actual DTD.
That library had, however, become bloated and hard to maintain, which
made it a liability. The community behind the validator decided to
strike a balance by only keeping standardized document types in this
library. We did not, at the time, communicate much about this
decision, and as the person responsible for communication around this
project, I genuinely apologize for this.
We also made sure the validator would have proper support for other
doctype constructs, so that users of proprietary document types could
declare:
<!DOCTYPE html PUBLIC "-//MyOrg//MyDoc//EN" "http://www.example.org/mydoc.dtd">
or
<!DOCTYPE html SYSTEM "http://www.example.org/mydoc.dtd">
which are the recommended ways of declaring the DOCTYPE when using a
non-standard DTD. Using a non-standard DTD with only:
<!DOCTYPE html PUBLIC "-//MyOrg//MyDoc//EN">
is technically acceptable but dangerous, since there is no guarantee
that parsing agents will know the public identifier.
This is the case your documents are falling into, as you are using:
<!DOCTYPE HTML PUBLIC "-//Netscape Comm. Corp.//DTD HTML//EN">
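To illustrate why a declaration with only a public identifier is fragile, here is a minimal sketch of the lookup described above. It is not the validator's actual code: the `CATALOG` contents, the local paths, and the function name `resolve_dtd` are all hypothetical, and the regular expression only handles simple, double-quoted declarations.

```python
import re

# Hypothetical catalog: formal public identifier -> locally cached DTD.
# The paths are placeholders, not the validator's real layout.
CATALOG = {
    "-//W3C//DTD HTML 4.01//EN": "dtds/html4/strict.dtd",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "dtds/html4/loose.dtd",
}

# Matches the three forms discussed above: PUBLIC with a system
# identifier, PUBLIC alone, and SYSTEM alone.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\w+\s+'
    r'(?:PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system1>[^"]*)")?'
    r'|SYSTEM\s+"(?P<system2>[^"]*)")\s*>',
    re.IGNORECASE,
)


def resolve_dtd(doctype):
    """Return (how, where) describing how the DTD would be located."""
    match = DOCTYPE_RE.search(doctype)
    if match is None:
        return ("error", "no usable DOCTYPE declaration")
    public = match.group("public")
    system = match.group("system1") or match.group("system2")
    if public in CATALOG:
        # Known standard: no network fetch needed.
        return ("catalog", CATALOG[public])
    if system:
        # Unknown public identifier, but the DTD location is given.
        return ("fetch", system)
    # Unknown public identifier and nowhere to fetch the DTD from.
    return ("error", "unknown public identifier and no system identifier")
```

With this sketch, the Netscape declaration quoted above ends in the last branch, while the two recommended forms fall back to fetching the DTD from its URL.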
Ideally, you would not be using this document type. It is proprietary
and has never been standardized. As a matter of fact, as a long-time
user of the validator, even before it was maintained at W3C, you
probably read the following:
[[
> However, please be aware that this DTD contains many elements which
> may never become standardized or widely supported.
]] -- from the documentation of the "kinder, gentler HTML validator"
How can we solve this problem?
Ideally, this would be an opportunity for you to switch your content
to an actually standard HTML version. Unlike the never-published (at
least I cannot find any published version of the DTD), never
standardized Netscape HTML, languages such as HTML 4.01 have been
through a standardizing process. That means that they have been
designed with concern for the needs of all, and that means that they
are here to stay.
How hard would that be? In the case of your documents, that's a
matter of three steps:
1- change the doctype declaration at the top of each document to, e.g.:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
2- get rid of the absmiddle value of the align attribute. This value
is not in standard HTML, but HTML 4.01 has align="middle", which I
gather has much the same effect. Better yet would be using CSS[1]:
using HTML as a markup-only language and using Cascading Style Sheets
for the style of your document is not only cleaner and easier to
maintain, it is also lighter and saves bandwidth.
[1] http://www.w3.org/TR/CSS2/
3- alt attributes for images. There is a good reason why the
standardized HTML requires alt attributes for images where the
proprietary Netscape HTML does not: accessibility. A person visiting
your Web site with a screen reader, for instance, would not be able
to know what the images are. Actually, your documents already use
such information, albeit not consistently. Going through your content
and adding descriptions for images with meaning, and then running the
tool tidy [2] with the alt-text option could take care of all the
images that are purely presentational and set an empty alt text for
them.
[2] tidy.sourceforge.net/
Except for the setting of alternate text for the images, which would
be a favor to the quality of your content anyway, all these
operations can be automated in a matter of a few lines of code in
whichever text-processing language you fancy. This should not take a
week.
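As a rough illustration of steps 1 and 2, here is a minimal sketch in Python. It assumes that absmiddle appears as a value of the align attribute and that the old declaration is written exactly as quoted earlier; the function name `migrate` and the default replacement are my own choices, so the replacement is left as a parameter. Step 3 (alt texts) is not covered and is better left to a dedicated tool and to a human eye.

```python
import re

# Replacement declaration, as suggested in step 1.
HTML401_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    '"http://www.w3.org/TR/html4/loose.dtd">'
)

# The proprietary declaration, tolerant of whitespace and letter case.
OLD_DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+HTML\s+PUBLIC\s+'
    r'"-//Netscape Comm\. Corp\.//DTD HTML//EN"\s*>',
    re.IGNORECASE,
)

# align=absmiddle, with optional matching single or double quotes.
ABSMIDDLE_RE = re.compile(
    r'\balign\s*=\s*(["\']?)absmiddle\1',
    re.IGNORECASE,
)


def migrate(html, replacement='align="middle"'):
    """Apply steps 1 and 2 to the text of one HTML document."""
    # Step 1: swap the first occurrence of the old doctype.
    html = OLD_DOCTYPE_RE.sub(lambda m: HTML401_DOCTYPE, html, count=1)
    # Step 2: replace every nonstandard alignment value.
    html = ABSMIDDLE_RE.sub(lambda m: replacement, html)
    return html
```

Calling `migrate` on the contents of each file and writing the result back would cover the mechanical part of the change; it is worth re-validating a few pages by hand before applying it to a whole site.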
Alternatively, if you really want to keep using the
nonstandard DTD, the validator community, which cares a lot about the
quality this service offers, could consider re-adding it in the
validator's catalog for a future release of the tool.
I honestly do not believe that this would be a winning situation for
anyone. You say you have been a long-time user of the validator, and
I am hopeful that this means you care about your content being
correctly written in a properly defined language. If that is the
case, then I trust you will see the interest in switching your Web
site to a standard language such as HTML 4.01.
Regards,
olivier
--
olivier Thereaux - W3C - http://www.w3.org/People/olivier/
W3C Open Source Software: http://www.w3.org/Status
Received on Tuesday, 20 September 2005 01:55:03 UTC