- From: Terje Bless <link@tss.no>
- Date: Sat, 9 Oct 1999 03:46:47 +0200
- To: W3C Validator <www-validator@w3.org>
- cc: Uriel Wittenberg <uw@urielw.com>
On 08.10.99 at 13:33, Uriel Wittenberg <uw@urielw.com> wrote:
>what does the validator validate, beyond adherence to the DTD?
Nothing.
When called, it checks the CGI parameters you give it for basic sanity and syntactic validity, and it does some minor URL rewrites to save time: if the host name starts with "www" you can skip the preceding "http://", and a trailing slash will be appended if the URL given doesn't contain one. It then fetches the document using HTTP, proxying any authentication requests, and reports errors if any occurred. The Content-Type HTTP header field is extracted from the response, and any unknown type triggers an error message. Currently known content types are "text/html", "text/xml", "image/svg", "application/smil" and "application/xml". Those content types are then coerced into one of two main document types: "html" and "xml".
If the document is "xml", it will be totally self-contained, because in XML a proper document type is mandatory. If the document is "html", it will be checked for the presence of a DOCTYPE. If one cannot be found, the validator will guess one based on the elements used in the file.
This is where you are getting confused. You can't validate an SGML file unless it contains a DOCTYPE, so the validator attempts to guess. The alternative is to refuse even to attempt validating a file that lacks a DOCTYPE. While that would arguably be "correct" behaviour and perfectly justified, it's not very user friendly.
While it will validate against the guessed DTD, it will warn you about it, and it won't label the document as valid unless it contains a DOCTYPE. This is because the DOCTYPE isn't a requirement of the DTD but a requirement of SGML. While we usually talk about validity as a function of the DTD, a document still needs to be valid SGML, because HTML is an application of SGML. Guessing the DOCTYPE is a feature to make the validator more user friendly and in no way affects the actual validation process.
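The pre-validation steps described above can be sketched in Python roughly as follows. This is a hypothetical illustration, not the validator's own code: the names `normalize_url`, `coerce_type` and `plan_validation` are invented, the "www" and trailing-slash rules are one reading of the description, the mapping of every type except text/html to "xml" is assumed, and the regex-based DOCTYPE check is a simplifying assumption.

```python
import re
from typing import Tuple

# Content types listed as "known". Mapping each one to "html" or "xml" is
# an assumption: only text/html is treated as HTML in this sketch.
KNOWN_TYPES = {
    "text/html": "html",
    "text/xml": "xml",
    "image/svg": "xml",
    "application/smil": "xml",
    "application/xml": "xml",
}

# Simplistic DOCTYPE test (assumption): SGML keywords are case-insensitive,
# but this regex will also match a DOCTYPE that sits inside a comment.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s", re.IGNORECASE)


def normalize_url(url: str) -> str:
    """Apply the two convenience rewrites (hypothetical reading)."""
    if url.startswith("www"):
        url = "http://" + url
    # Append a slash if nothing after "scheme://" contains one.
    rest = url.split("://", 1)[-1]
    if "/" not in rest:
        url += "/"
    return url


def coerce_type(content_type: str) -> str:
    """Map an HTTP Content-Type header value to "html" or "xml"."""
    # Ignore parameters such as "; charset=iso-8859-1".
    base = content_type.split(";", 1)[0].strip().lower()
    if base not in KNOWN_TYPES:
        raise ValueError(f"unknown content type: {base!r}")
    return KNOWN_TYPES[base]


def plan_validation(content_type: str, document: str) -> Tuple[str, bool]:
    """Return (document class, whether the DTD has to be guessed).

    A guessed DTD is still used for validation, but the document is then
    never labelled valid.
    """
    kind = coerce_type(content_type)
    guessed = kind == "html" and DOCTYPE_RE.search(document) is None
    return kind, guessed
```

Keeping the DOCTYPE check separate from the parse mirrors the point above: a missing DOCTYPE changes which DTD is used and whether the result can be labelled valid, not how the document is checked.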
There are two things to keep in mind here. The first is what I mentioned above: all HTML files must also be valid SGML files, because HTML is an application of SGML. The second is that the HTML 4.0 Recommendation imposes additional constraints on HTML that cannot be expressed in SGML. It is thus possible to have a valid SGML file that passes the validator but is *not* valid HTML 4.0. The reverse is not true: any HTML file that is valid according to the HTML 4.0 Recommendation will also pass validation.
The validator uses no other criteria for validation. It does have a few limitations (character set conversion comes to mind), and it does offer to run weblint on the document for you. Weblint is a fluff checker and has nothing to do with actual validation (it uses arbitrary criteria for what it considers "good" HTML). This is the closest thing to what you suggest.
Does that answer your question? :-)
Received on Friday, 8 October 1999 22:43:38 UTC