- From: olivier Thereaux <ot@w3.org>
- Date: Fri, 6 Apr 2007 09:06:41 +0900
- To: Shane McCarron <shane@aptest.com>
- Cc: www-validator@w3.org
Hello,

Lots of good comments in the thread, I hope it's OK if I add a few.

On Apr 6, 2007, at 00:27 , Shane McCarron wrote:
>
> I am developing some new content for the XHTML 2 working group at
> http://www.w3.org/MarkUp/Drafts/Overview.html
>
> When attempting to validate this, the validator complains that the
> referenced DOCTYPE, admittedly a private, contrived markup language
> that the validator does not know about explicitly, is being served
> as text/html and it cannot decide how to parse it.

Indeed, it can't.

[ Between the time I tested and the time I write this mail the server setup has changed, and the validator now gets a 300 Multiple Choices back... I think the changes you made to the server setup do not take into consideration the fact that accept headers are optional. This case is similar to the following test case in our test suite:
http://qa-dev.w3.org/wmvs/HEAD/dev/tests/rddl_fpi.html ]

As a content author, I'm very happy with XHTML 1.0. As an implementor, however, it's been a bane. It used to be that the text/html mime type was a clear indication of what to do with the document. It would be an sgml-ish application, and, in the real world, there would probably be some fairly pathological markup out there. XHTML 1.0 changed that drastically, because it added a very different type of markup, and a very different parsing model, within a single internet media type. AppC tried to limit the breakage, but IMHO it was too late.

As far as the validator is concerned, it's been a cause for complicated code and logic. Most mime types trigger an XML mode, whereas text/html triggers a "TBD" mode that is only disambiguated after a few, *sigh*, sniffing steps. Doctype sniffing, mostly, but as your example shows, it's no panacea.

In the case of:
* a document served as text/html
* with a doctype not known to the validator
... the validator will (at least in the version I've been working on lately, not sure about older versions) treat it as SGML.
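A rough sketch of that decision, in Python, may make it easier to follow. This is not the validator's actual code: the function name, the list of XML media types, the two-entry doctype table, and the choice to reject other media types are all assumptions made for illustration.

```python
# Hypothetical sketch of the mode selection described above.
# All names and tables here are illustrative assumptions.

# Media types assumed to trigger XML mode directly.
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

# Public identifiers this sketch "knows", mapped to a parse mode.
# A real table would be far longer.
KNOWN_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xml",
    "-//W3C//DTD HTML 4.01//EN": "sgml",
}


def choose_parse_mode(media_type, doctype_fpi=None):
    """Return "xml" or "sgml" for a document, using only the media type
    and the doctype's public identifier (None if there is no doctype)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base_type = media_type.split(";")[0].strip().lower()

    if base_type in XML_MEDIA_TYPES or base_type.endswith("+xml"):
        return "xml"

    if base_type == "text/html":
        # "TBD" mode: sniff the doctype. An unknown or missing doctype
        # falls back to SGML, as described above.
        return KNOWN_DOCTYPES.get(doctype_fpi, "sgml")

    # Assumption of this sketch: anything else is out of scope.
    raise ValueError("unsupported media type: " + base_type)
```

With this sketch, a text/html document carrying a private doctype such as "-//EXAMPLE//DTD Private//EN" (a made-up identifier) ends up in SGML mode, while the same document served as application/xhtml+xml is parsed as XML without any sniffing.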
The choice has to be made, and this choice assumes that most "strange" or "custom" doctypes will be sgml-based. It is based on the fact that the only XML document type that may be acceptable as text/html is XHTML 1.0, and more generally it hopes to push creators of custom XML doctypes to serve their documents with an XML mime type.

> Now... I know that I could change the rules so that a requestor who
> accepts application/xhtml+xml would get the document with that
> media type, but... I think of the document starts off with an
> "xml" declaration, e.g., <?xml ... ?>, then the validator should
> assume that the document is to be validated in XML mode.

Not that simple I'm afraid. If Ian Hickson were around to read this thread, he would certainly point out:
* that <?xml .... ?> is a perfectly legit SGML PI
* ... and would then tell you about the (pathological) case of
  http://www.damowmow.com/playground/html-not-xml-2.html

I am not saying that this insane case is the rule, but it is worth pointing out that because of the situation with text/html and XHTML 1.0 Appendix C, there is NO unambiguous way of switching between SGML and XML mode. It's all sniffing, and heuristic at that.

Pointer to some related (and really interesting) discussion between our validator developers:
http://lists.w3.org/Archives/Public/public-qa-dev/2004Sep/0025.html
( and following messages in that thread )

Plus: http://www.w3.org/Bugs/Public/show_bug.cgi?id=14

-- 
olivier
Received on Friday, 6 April 2007 00:06:48 UTC