Re: Validator complains because it cannot determine validation mode from document type from olivier Thereaux on 2007-04-06 (www-validator@w3.org from April 2007)

From: olivier Thereaux <ot@w3.org>
Date: Fri, 6 Apr 2007 09:06:41 +0900
To: Shane McCarron <shane@aptest.com>
Cc: www-validator@w3.org
Message-Id: <A3ED5120-0441-45C9-B38E-C658368E36EC@w3.org>

Hello,

Lots of good comments in the thread, I hope it's OK if I add a few.

On Apr 6, 2007, at 00:27 , Shane McCarron wrote:

>
> I am developing some new content for the XHTML 2 working group at  
> http://www.w3.org/MarkUp/Drafts/Overview.html
>
> When attempting to validate this, the validator complains that the  
> referenced DOCTYPE, admittedly a private, contrived markup language  
> that the validator does not know about explicitly, is being served  
> as text/html and it cannot decide how to parse it.

Indeed, it can't.

[
Between the time I tested and the time I write this mail the server  
setup has changed, and the validator now gets a 300 Multiple Choices  
back... I think the changes you made to the server setup do not  
take into consideration the fact that Accept headers are optional.

This case is similar to the following test case in our test suite:
http://qa-dev.w3.org/wmvs/HEAD/dev/tests/rddl_fpi.html
]

As a content author, I'm very happy with XHTML 1.0. As an  
implementor, however, it's been a bane. It used to be that the  
text/html mime type was a clear indication of what to do with the  
document. It would be an sgml-ish application, and, in the real  
world, there would probably be some fairly pathological markup out  
there. XHTML1.0 changed that drastically, because it added a very  
different type of markup, a very different parsing model, within a  
single internet media type. AppC tried to limit the breakage, but  
IMHO it was too late.

As far as the validator is concerned, it's been a cause for  
complicated code and logic. Most mime types trigger an XML mode,  
whereas text/html triggers a "TBD" mode that is only disambiguated  
after a few, *sigh*, sniffing steps. Doctype sniffing, mostly, but as  
your example shows, it's no panacea.

In the case of:
* a document served as text/html
* with a doctype not known to the validator
... the validator will (at least in the version I've been working on  
lately, not sure about older versions) treat it as SGML. A choice has  
to be made, and this one rests on three things: it assumes that most  
"strange" or "custom" doctypes will be SGML-based; it reflects the  
fact that the only XML document type that may be acceptable as  
text/html is XHTML 1.0; and, more generally, it hopes to push creators  
of custom XML doctypes to serve their documents with an XML mime type.
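
To make the decision above concrete, here is a rough sketch in Python. It is not the validator's actual code: the function name, the MIME type set and the two-entry doctype table are all made up for illustration, and a real table of known doctypes would be much longer.

```python
# Hypothetical sketch of the mode selection described above.
# None of these names come from the validator's source code.

XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

# Illustrative table: public identifier -> parse mode.
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "sgml",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xml",
}


def parse_mode(mime_type, public_id=None):
    """Return "xml" or "sgml", or None for types this sketch ignores."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = mime_type.split(";")[0].strip().lower()
    if mime in XML_MIME_TYPES or mime.endswith("+xml"):
        return "xml"
    if mime == "text/html":
        # "TBD" mode: disambiguate by doctype sniffing, and fall back
        # to SGML when the doctype is unknown or missing.
        return KNOWN_DOCTYPES.get(public_id, "sgml")
    return None
```

With this sketch, a text/html document carrying a custom public identifier ends up in SGML mode, while the same document served as application/xhtml+xml is parsed as XML without any sniffing.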

> Now... I know that I could change the rules so that a requestor who  
> accepts application/xhtml+xml would get the document with that  
> media type, but...  I think if the document starts off with an  
> "xml" declaration, e.g., <?xml ... ?>, then the validator should  
> assume that the document is to be validated in XML mode.

Not that simple I'm afraid.
If Ian Hickson were around to read this thread, he would certainly  
point out:

* that <?xml .... ?> is a perfectly legit SGML PI

* ... and would then tell you about the (pathological) case of
http://www.damowmow.com/playground/html-not-xml-2.html

I am not saying that this insane case is the rule, but it is worth  
pointing out that because of the situation with text/html and XHTML  
1.0 Appendix C, there is NO unambiguous way of switching between SGML  
and XML mode. It's all sniffing, and heuristic at that.
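
A small illustration of the first point, as a sketch only: it assumes the default SGML syntax, in which a processing instruction opens with "<?" and closes at the first ">". The regular expression and the function name are invented for this example and are not what any real SGML parser uses.

```python
import re

# Assumption: default SGML delimiters, PI open "<?" and PI close ">".
SGML_PI = re.compile(r"<\?([^>]*)>")


def sgml_pi_content(text):
    """Return the PI content if text starts with an SGML-style PI."""
    match = SGML_PI.match(text)
    return match.group(1) if match else None


# The XML declaration is accepted as an ordinary SGML PI; the trailing
# "?" is simply part of the PI content.
print(sgml_pi_content('<?xml version="1.0"?>'))
```

So a leading <?xml ... ?> can be read by an SGML parser without complaint, which is why its presence is a hint about the author's intent rather than proof of XML.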

Pointer to some related (and really interesting) discussion between  
our validator developers:
http://lists.w3.org/Archives/Public/public-qa-dev/2004Sep/0025.html
( and following messages in that thread )
Plus: http://www.w3.org/Bugs/Public/show_bug.cgi?id=14

-- 
olivier

Received on Friday, 6 April 2007 00:06:48 UTC