Re: What is "validation"?

On 08.10.99 at 13:33, Uriel Wittenberg <uw@urielw.com> wrote:

>what does the validator validate, beyond adherence to the DTD?

Nothing.


When called, it will check the CGI parameters you give it for basic sanity
and syntactic validity and do some minor URL rewrites to save time (if the
host name starts with "www" you can skip the preceding "http://", and a
trailing slash will be appended if the URL given contains none).
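
As a rough illustration of those rewrites, here is a minimal Python sketch.
It is not the validator's actual code: the function name normalize_uri is
made up, and reading "no slash" as "no slash after the scheme" is my
assumption.

```python
def normalize_uri(uri: str) -> str:
    """Hypothetical helper: apply the two convenience rewrites."""
    uri = uri.strip()
    # A host name starting with "www" may omit the leading "http://".
    if uri.lower().startswith("www"):
        uri = "http://" + uri
    # Assumption: "no slash" means no slash after the "scheme://" part.
    scheme, sep, rest = uri.partition("://")
    if sep and "/" not in rest:
        uri += "/"
    return uri
```

With this sketch, "www.example.com" would become
"http://www.example.com/", while a URL that already has a path is left
alone.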

It will then fetch the document using HTTP, proxying any authentication
requests, and report errors if any occurred. The Content-Type HTTP header
field is extracted from the response and any unknown types will trigger an
error message. Currently known content types are "text/html", "text/xml",
"image/svg", "application/smil", "application/xml". Those content types are
then coerced into one of two main types of documents: "html" and "xml".
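
The coercion step could be sketched like this in Python. The list of known
types is the one given above; which of "html" or "xml" each type maps to is
my assumption (only text/html is treated as "html" here), and the names
KNOWN_TYPES and classify are invented for the example.

```python
# Assumed mapping: everything except text/html is treated as "xml".
KNOWN_TYPES = {
    "text/html": "html",
    "text/xml": "xml",
    "image/svg": "xml",
    "application/smil": "xml",
    "application/xml": "xml",
}


def classify(content_type_header: str) -> str:
    """Hypothetical helper: map a Content-Type value to "html" or "xml"."""
    # Drop parameters such as "; charset=..." before the lookup.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type not in KNOWN_TYPES:
        # Unknown types trigger an error message.
        raise ValueError(f"unknown content type: {media_type!r}")
    return KNOWN_TYPES[media_type]
```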

If the document is "xml", it will be totally self-contained because in XML
a proper document type is mandatory. If the document type is "html", it
will be checked for the presence of a DOCTYPE. If one cannot be found, the
validator will guess based on the elements used in the file.
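
To make the DOCTYPE check and the guessing concrete, here is a simplified
Python sketch. The regular expression ignores comments and internal
subsets, and the guessing rule (a FRAMESET element means the Frameset DTD,
anything else means Transitional) is purely illustrative; it is not the
rule the validator really uses. Both function names are hypothetical.

```python
import re

# Simplified: matches a DOCTYPE declaration for HTML, nothing fancier.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\b[^>]*>", re.IGNORECASE)


def find_doctype(markup: str):
    """Return the DOCTYPE declaration if one is present, else None."""
    match = DOCTYPE_RE.search(markup)
    return match.group(0) if match else None


def guess_public_id(markup: str) -> str:
    """Illustrative guess only, based on the elements used."""
    if re.search(r"<frameset\b", markup, re.IGNORECASE):
        return "-//W3C//DTD HTML 4.0 Frameset//EN"
    return "-//W3C//DTD HTML 4.0 Transitional//EN"
```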


This is where you are getting confused. You can't validate an SGML file
unless it contains a DOCTYPE, so the validator attempts to guess. The
alternative is to refuse to even attempt to validate the file if it lacks a
DOCTYPE. While this would be arguably "correct" behaviour and perfectly
justified, it's not very user friendly.

While it will validate against the guessed DTD, it will warn you about it
and it won't label the document as valid unless it contains a DOCTYPE. This
is because the DOCTYPE isn't a requirement of the DTD, but rather a
requirement of SGML. While we usually talk about validity as a function of
the DTD, the document still needs to be valid SGML because HTML is an
application of SGML. Guessing the DOCTYPE is a feature to make the validator
more user friendly and in no way affects the actual validation process.
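
Put as a small Python sketch, the reporting rule looks roughly like this.
The Report class and its field names are invented for the example; only the
rule itself (a guessed DOCTYPE gives a warning and never a "valid" label)
comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical result of one validation run."""
    errors: list = field(default_factory=list)
    doctype_guessed: bool = False

    @property
    def warnings(self) -> list:
        # A guessed DOCTYPE is always reported to the user.
        if self.doctype_guessed:
            return ["No DOCTYPE found; validated against a guessed DTD."]
        return []

    @property
    def labelled_valid(self) -> bool:
        # "Valid" needs both a clean parse and a real DOCTYPE.
        return not self.errors and not self.doctype_guessed
```

So a document that parses cleanly against the guessed DTD still comes back
with a warning and without the "valid" label.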


There are two things to keep in mind here. The first is what I mentioned
above: all HTML files must also be valid SGML files because HTML is an
application of SGML. The second is that the HTML 4.0 Recommendation imposes
additional constraints on HTML that cannot be expressed in SGML. It is thus
possible to have a valid SGML file, which passes the validator, but which
is *not* valid HTML 4.0. The reverse is not true: any HTML file that is
valid according to the HTML 4.0 Recommendation will also pass validation.


The validator has no other criteria it uses for validation. It does have a
few limitations (character set conversion comes to mind) and it does offer
to run weblint on the document for you. Weblint is a fluff checker and has
nothing to do with actual validation (it uses arbitrary criteria for what
it considers "good" HTML). This is the closest thing to what you suggest.


Does that answer your question? :-)

Received on Friday, 8 October 1999 22:43:38 UTC