- From: olivier Thereaux <ot@w3.org>
- Date: Mon, 30 Apr 2007 11:53:03 -0400
- To: Karl Dubost <karl@w3.org>
- Cc: Shane McCarron <shane@aptest.com>, www-validator@w3.org
On Apr 29, 2007, at 19:05, Karl Dubost wrote:
> Or if olivier gives me the steps that the validator follows now, I
> could sketch up a diagram and we may have a better picture of how
> it could work and if it should be modified or not.

1) First the validator takes the internet media type (MIME type, content-type), given either by the server or, in upload mode, by the browser (in direct input mode, this step is skipped), and compares it to its table (in the validator's config, look for <MIME> in http://dev.w3.org/cvsweb/~checkout~/validator/htdocs/config/validator.conf ). This gives us the parse mode based on content type, which is either "XML" (for XML media types) or "TBD" (for text/html, because HTML and XHTML, two different parsing modes, can both be served with this media type).

2) Then the validator pre-parses the document to fetch its doctype (if any) and compares it to a second table (see http://dev.w3.org/cvsweb/~checkout~/validator/htdocs/config/types.conf ).

3) Now we have:
- nothing, if the document was sent by direct input and has an unknown doctype or no doctype
- one or two determined parse modes, if either the MIME type or the doctype is known to us.

4) We finally follow the algorithm:
- if the parse mode defined with the MIME type is unambiguous, we use that
- else, if the parse mode defined with the doctype is known, we use that
- else, we fall back to SGML/HTML mode
(plus some warnings issued if the determined parse modes clash)
(see routine set_parse_mode() in http://dev.w3.org/cvsweb/~checkout~/validator/httpd/cgi-bin/check )

** If I understand it correctly, Shane's suggestion would be to add to step 2) the following: if the document type is not in our table of known document types, but the public identifier matches ^-//W3C//DTD XHTML then the parse mode determined by the document type should be XML, because we then know that the document type is in the XHTML family, even if we don't know everything about it.

This sounds reasonable to me.
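For illustration only, the steps above, together with the proposed XHTML-family rule, could be sketched roughly like this in Python. This is not the validator's code (the real routine is set_parse_mode() in check). The table contents, the "SGML" label for the fallback mode, and the names choose_parse_mode and mode_from_doctype are all assumptions made up for the sketch.

```python
import re

# Hypothetical stand-ins for the <MIME> table in validator.conf and the
# doctype table in types.conf; the real tables are larger and differ.
MIME_MODES = {
    "text/html": "TBD",               # ambiguous: HTML or XHTML
    "application/xhtml+xml": "XML",
    "application/xml": "XML",
}
DOCTYPE_MODES = {
    "-//W3C//DTD HTML 4.01//EN": "SGML",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XML",
}

# The public identifier prefix from the proposed addition to step 2.
XHTML_FAMILY = re.compile(r"^-//W3C//DTD XHTML")


def mode_from_doctype(public_id):
    """Step 2: parse mode from the doctype, or None if undetermined."""
    if public_id is None:
        return None
    if public_id in DOCTYPE_MODES:
        return DOCTYPE_MODES[public_id]
    # Proposed rule: an unknown doctype in the XHTML family implies XML.
    if XHTML_FAMILY.match(public_id):
        return "XML"
    return None


def choose_parse_mode(mime_type, public_id):
    """Steps 1 to 4: return (parse_mode, warnings)."""
    warnings = []
    # Step 1: skipped in direct input mode, where mime_type is None.
    mime_mode = MIME_MODES.get(mime_type) if mime_type else None
    doctype_mode = mode_from_doctype(public_id)
    mime_unambiguous = mime_mode not in (None, "TBD")

    # Warn if both sources determined a mode and the two clash.
    if mime_unambiguous and doctype_mode and mime_mode != doctype_mode:
        warnings.append(
            "parse mode clash: %s from media type, %s from doctype"
            % (mime_mode, doctype_mode)
        )

    # Step 4: media type first, then doctype, then the SGML/HTML fallback.
    if mime_unambiguous:
        return mime_mode, warnings
    if doctype_mode:
        return doctype_mode, warnings
    return "SGML", warnings
```

With these made-up tables, a text/html document whose public identifier is "-//W3C//DTD XHTML+RDFa 1.0//EN" (absent from the sketch's doctype table) would be parsed as XML under the proposed rule, instead of falling back to SGML/HTML mode.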
Any objection?

--
olivier
Received on Monday, 30 April 2007 15:53:03 UTC