- From: Nick Kew <nick@webthing.com>
- Date: Fri, 6 Aug 2004 20:28:03 +0100 (BST)
- To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
- Cc: clong@itlnet.net, www-validator@w3.org
On Fri, 6 Aug 2004, Jukka K. Korpela wrote:

> So the validator is unable to process even the DTD correctly.
> I guess the same happens on the W3C validator, with much worse
> error recovery.

Or rather, errors in the DTD are suppressed in the report. Bear in mind
that failing to do so can have unfortunate side-effects, such as
confusing users by reporting four warnings in the HTML 4.0 DTD
(corrected in 4.01).

> And in fact
> <http://www.htmlhelp.com/cgi-bin/validate.cgi?
> url=http%3A%2F%2Fwww.billnchimene.com%2Findex.html&warnings=yes&xml=yes>
> tells that the document passes validation.

That's the key observation. The XML flag causes the parser to deal
with XML syntax (subject to some known limitations).

> Apparently the problem is that a validator needs to be told, or it needs
> to guess, whether it is performing the job of an SGML validator or the job
> of an XML validator.

Indeed. The HTTP headers tell it that.

> With predefined, catalogued DTDs, they presumably use
> the FPI or the URL to resolve this.

Nope. Well, yes, there's Appendix C which b*****s up believing the
headers, but that's a specific exception that can be detected by
matching specific strings.

> But my analysis might be partly wrong. This is all very confusing, since
> validators, believed to perform a well-defined rigorous check, actually
> play fast and loose and "heuristically".

Appendix C is the spec playing fast and loose, not the validator.

In this case, the validator simply took on trust that the document was
HTML. Based on that it got a DTD that doesn't parse. The error recovery
adopted here appears to be a fallback to a default. How best to report
that is indeed an issue: since you regularly complain of confusing
messages, perhaps you'd like to suggest a fix?

Page Valet does the same, but gives an additional system message
alerting the user to the mismatch between the HTML claim and the XML
document. That's my best stab at the problem, but it could doubtless
benefit from further improvement.
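For illustration only, the mode selection described above (believe the
HTTP headers, with a string-matched exception for Appendix C) might be
sketched roughly as below. This is not the code of any actual validator:
the function names, the set of media types, and the FPI prefix used for
the Appendix C check are all assumptions made for the example.

```python
import re

# Assumption: Appendix C documents are recognised by the XHTML 1.0 FPI prefix.
XHTML_FPI_PREFIX = "-//W3C//DTD XHTML 1.0"

# Assumption: media types treated as XML (plus anything ending in "+xml").
XML_MEDIA_TYPES = {"text/xml", "application/xml", "application/xhtml+xml"}

# Captures a double-quoted public identifier from a DOCTYPE declaration.
# (Single-quoted identifiers are ignored in this sketch.)
DOCTYPE_FPI = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def media_type(content_type):
    """Strip parameters such as charset from a Content-Type header value."""
    return content_type.split(";", 1)[0].strip().lower()


def choose_parse_mode(content_type, document_head):
    """Return "xml", "sgml" or "unknown" for a served document.

    The HTTP header is trusted first. The one exception is text/html
    carrying an XHTML 1.0 FPI (the Appendix C case), which is detected
    by matching a specific string and parsed as XML.
    """
    mtype = media_type(content_type)
    if mtype in XML_MEDIA_TYPES or mtype.endswith("+xml"):
        return "xml"
    if mtype == "text/html":
        match = DOCTYPE_FPI.search(document_head)
        if match and match.group(1).startswith(XHTML_FPI_PREFIX):
            return "xml"
        # Any other text/html document, including one with an
        # uncatalogued custom DTD, is taken on trust as SGML.
        return "sgml"
    return "unknown"
```

Under these assumptions, a document served as text/html with an
unrecognised DTD ends up in SGML mode even if it is written in XML
syntax, which is the kind of mismatch discussed in this thread.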
-- Nick Kew
Received on Friday, 6 August 2004 15:28:59 UTC