- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 6 Feb 2008 00:58:00 +0200
- To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- Cc: <public-html-comments@w3.org>
Disclaimer: Still not a WG response.

On Feb 5, 2008, at 23:28, Frank Ellermann wrote:

> Henri Sivonen wrote:
>
>> I suggest sending feedback to the ISO/IEC FDIS 19757 committee.
>
> If necessary that's up to somebody knowing what "FDIS 19757" is,
> I'm happy with getting the drift and limitations of the few DTDs
> I care about.

Validator.nu doesn't and won't do DTD-based validation. I prefer to
say what Validator.nu is rather than what it isn't, but if I were to
define it in terms of what it isn't, it most deliberately isn't a
DTD-based validator.

>> I'm not interested in delivering a tool to authors who try to
>> make a point by authoring new HTML 2.0 or 3.2 content today.
>
> Right, and I'm not very interested in HTML at all where XHTML 1
> does what I want. But you are obviously interested in reviving
> a kind of HTML, and SGML comments actually work in browsers I've
> tested. http://ln.hixie.ch/?start=1137799947&count=1
>
> Truth in advertising - what you do is not really "HTML"
> or XHTML, it's a new class of its own.

I thought the About page was truthful. It says Validator.nu doesn't
do DTD-based validation and has no SGML functionality whatsoever.

[...]

>> [DTD subset]
>>> that's apparently not (yet) supported by http://validator.nu
>>> and FWIW also in no browser I know.
>
>> If it isn't supported in any browser, it would be less useful
>> if the validator didn't point out the problem, wouldn't it?
>
> The WDG validator showed a warning, good. The W3C validator
> accepted <br /> as valid HTML for some time, that was ugly.
> But validator.nu drops the ball for valid DTD subsets, bad.

HTML5 parsing has no such thing as a valid DTD subset. The XML spec
makes most DTD processing optional. For XML, Validator.nu can be
configured to skip external entities (the prudent and more compatible
default) or to process external entities (usually rendering the
results irrelevant to the Web). By design, Validator.nu cannot be
configured to perform DTD-based validation.
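As an aside for readers unfamiliar with the distinction: "skipping
external entities" is a standard switch in XML parsers, not anything
specific to Validator.nu. A minimal sketch with Python's stdlib SAX
interface (the document and the nonexistent DTD URL are illustrative,
and this is not Validator.nu's configuration API):

```python
import io
import xml.sax
from xml.sax.handler import feature_external_ges, feature_external_pes

class Collector(xml.sax.ContentHandler):
    """Records element names so we can see what was parsed."""
    def __init__(self):
        self.names = []
    def startElement(self, name, attrs):
        self.names.append(name)

# A document whose DOCTYPE points at an external DTD. With external
# general and parameter entity processing switched off, the parser
# never tries to fetch it; the document still parses.
doc = b"""<?xml version="1.0"?>
<!DOCTYPE html SYSTEM "http://example.org/nonexistent.dtd">
<html><body>hello</body></html>"""

parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, False)  # skip external general entities
parser.setFeature(feature_external_pes, False)  # skip external parameter entities
handler = Collector()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(doc))
print(handler.names)  # ['html', 'body']
```

Skipping is also the safer default: fetching external entities makes
parsing results depend on a remote server, which is exactly the
failure mode discussed later in this thread.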
> It's not "pointing out the problem", as the WDG warning does,
> it's lost with some syntactically valid pre-HTML5 constructs.
>
> A clear error message would be "DTD subset not supported", or
> if that's simply not allowed in XHTML 1 (dunno) say "invalid".

The error conditions follow the HTML5 parsing spec without ascribing
SGML meaning to syntax errors.

[...]

> | External encoding information specified a non-UTF-8/non-UTF-16
> | encoding (ISO-8859-1), but there was no matching internal
> | encoding declaration.
>
> PURL redirects to www.xyzzy.claranet, unfortunately that now
> redirects again to home.claranet.de/xyzzy, and there I get an
> (erroneous) Content-Type: text/html; charset=ISO-8859-1
>
> Everybody and his dog knows that many authors cannot fix weird
> ideas of HTTP servers.

The impression that I get is that the TAG and the HTTP WG aren't
part of "everybody and his dog".

> IMO (X)HTML validators should completely ignore the HTTP layer
> and focus on the job at hand, report only issues *within* a
> document hoping to be valid (X)HTML. Offering HTTP consistency
> checks as an *option* (opt-in) is of course fine.

Validator.nu is a quality assurance tool. It would be silly for it
not to point out potential actual problems and instead focus on the
historic fiction that text/html were parsed as SGML. (If the HTTP
layer and the internal encoding declaration disagree, chances are
the HTTP layer, which is authoritative, is wrong, because the
declaration closer to the content is more likely to be right. That's
a potential actual problem. Hence, a warning.)

> That's of course no special validator.nu issue, other validators
> also mix different layers (transport and content) into a rather
> confusing (for ordinary users) mess.

I might be persuaded to ignore Content-Type if you can get the TAG
to repeal mime-respect and the IETF HTTP WG to endorse content
sniffing and to deprecate Content-Type.
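The warning quoted above boils down to a simple consistency check
between the transport layer and the document. A rough sketch of that
logic (not Validator.nu's actual implementation; the function names
are made up, and a real checker would use a proper HTML parser and
the full charset-detection algorithm rather than regexes):

```python
import re

def charset_from_content_type(content_type):
    """Extract the charset parameter from an HTTP Content-Type value."""
    m = re.search(r'charset\s*=\s*"?([A-Za-z0-9._-]+)"?', content_type, re.I)
    return m.group(1).lower() if m else None

def charset_from_meta(html):
    """Find an internal encoding declaration in a <meta> element."""
    m = re.search(
        r'<meta[^>]+content\s*=\s*["\'][^"\']*charset=([A-Za-z0-9._-]+)',
        html, re.I)
    return m.group(1).lower() if m else None

def encoding_warning(http_content_type, html):
    """Warn when the HTTP layer declares a non-UTF encoding that the
    document itself does not confirm."""
    external = charset_from_content_type(http_content_type)
    internal = charset_from_meta(html)
    if external and external not in ("utf-8", "utf-16") and internal != external:
        return ("External encoding information specified a non-UTF-8/"
                "non-UTF-16 encoding (%s), but there was no matching "
                "internal encoding declaration." % external)
    return None

# No internal declaration, so the ISO-8859-1 claim is unconfirmed:
print(encoding_warning(
    "text/html; charset=ISO-8859-1",
    "<html><head><title>t</title></head><body></body></html>"))
```

The design point argued in the email is visible in the sketch: the
check cannot be performed at all if the tool refuses to look at the
HTTP layer, because the inputs come from two different layers.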
>> http://hixie.ch/advocacy/xhtml
>
> It won't surprise anybody that I disregard this text, [...]

I don't, and Validator.nu is programmed accordingly. More to the
point, the parser selection in Validator.nu follows browser reality.

[...]

>> HEAD /sgml-lib/REC-xhtml1-20020801/xhtml-lat1.ent HTTP/1.1
> [...]
>> Host: validator.w3.org
> [...]
>> Content-Type: chemical/x-pdb
>
> Brilliant, three involved servers (claranet, Google, W3C), and
> none of them gets the relevant content types right. It never
> occurred to me that validator.w3.org could be a problem, I only
> checked claranet and googlepages... :-(

I'm going to make the error message identify the URL of the HTTP
resource.

> As noted above, just ignore what HTTP servers say, all you get
> are mad lies, resulting in hopelessly confusing error messages
> about issues not under the control of the tester.

That's like people on www-validator complaining that their invalid
ad-serving boilerplate is not under their control. Making references
to a misconfigured server is under your control. The role of
Validator.nu is to point out stuff like this without regard to whose
configuration mistake it is.

[...]

> Or I confused it with the xml:lang errors, all my pages have
> xml:lang alongside 'lang' everywhere, as required. Reporting
> this as an error is wrong for XHTML 1.

If you use the XHTML 1.0 schema with the XML parser, this isn't
reported as an error. If you use the XHTML 1.0 schema with the
HTML5 parser, it is. If the latter combination is chosen
automatically, it even tells you about this situation.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 5 February 2008 22:58:17 UTC