- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Sun, 17 Feb 2008 22:08:54 +0100
- To: public-html-comments@w3.org
Henri Sivonen wrote:

> Validator.nu checks the combination of the protocol entity body and
> the Content-Type header. Pretending that Content-Type didn't matter
> wouldn't make sense when it does make a difference in terms of
> processing in a browser.

I checked whether the W3C validator servers still claim that
application/xml-external-parsed-entity files are chemical/x-pdb. This was
either fixed, or it is an intermittent problem, therefore I can continue
my I18N tests today.

XHTML 1, like HTML 4, wants URIs in links. For experiments with IRIs I
created a homebrew XHTML 1 i18n document type: actually the same syntax,
renaming URI to IRI everywhere and updating RFC 2396 + 3066 to 3987 +
4646 in the DTD comments, with absolute links to some entity files hosted
by the W3C validator - that caused the chemical/x-pdb trouble.

To get some results related to the *content* of my test files I have to
set three options explicitly:

* Be "lax" about HTTP content - whatever that is; XHTML 1 does not
  really say "anything goes", but validator.nu apparently considers
  obscure "advocacy" pages instead of the official XHTML 1 specification
  as "normative".

* Parser "XML; load external entities" - whatever that is; validator.nu
  cannot handle the <?xml etc. intro for XHTML 1 otherwise. But that is
  required depending on the charset, and certainly always allowed for
  XHTML 1.

* Preset "XHTML 1.0 Transitional" - actually the test is not really
  XHTML 1 transitional, it uses a homebrew XHTML 1 i18n DTD, but maybe
  that is beside the point for a validator not supporting DTDs to start
  with.

With those three explicitly set options it could finally report that my
test page is "valid" XHTML 1 transitional. But it is *not*: it uses real
IRIs in places where only URIs are allowed, a major security flaw in
DTD-based validators:
<http://omniplex.blogspot.com/2007/11/broken-validators.html>

I know why DTD validators have issues checking URI syntax; it is beyond
me why schema validators don't get this right. IMO "get something better
than CDATA for attribute types" is the point of not using DTDs, and "can
do STD 66 syntax for URIs", a full Internet Standard, is the very minimum
I'd expect from something claiming to be better than DTDs.

The broken URIs starting with "calc" (on XP with IE7 installed) from
various applications were a hot topic for some months in 2007, until
Adobe, Mozilla, MS, etc. finally arrived at the conclusion that the
question whose fault that was isn't relevant: if all parties simply
follow STD 66, it is okay.

Four more related XHTML 1 I18N tests likely can't fly with validator.nu
not supporting the (very) basic idea of DTDs; out of curiosity I tried
it anyway:

| Warning: XML processors are required to support the UTF-8
| and UTF-16 character encodings. The encoding was KOI8-R
| instead, which is an incompatibility risk.

Untested, but I hope US-ASCII wouldn't trigger this warning, as a
mobile-ok prototype did some months ago (and maybe still does).

Validator.nu accepts U-labels (UTF-8) in system identifiers, the W3C
validator doesn't, and I also think they aren't allowed in XML 1.0 (all
editions). Martin suggested they are okay, see
<http://www.w3.org/Bugs/Public/show_bug.cgi?id=5279>.

Validator.nu rejects percent-encoded UTF-8 labels in system identifiers,
like the W3C validator. I think that is okay, *unless* you believe in a
non-DNS STD 66 <reg-name>, where it might be syntactically okay. Hard to
decide, potentially a bug:
<http://www.w3.org/Bugs/Public/show_bug.cgi?id=5280>.
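For illustration, the two system-identifier cases look roughly like
this - a made-up IDN host and file name, not one of the actual test
files:

  <!-- U-label directly in the system identifier: validator.nu accepts
       this, the W3C validator doesn't (hypothetical host name): -->
  <!DOCTYPE html SYSTEM "http://bücher.example/xhtml1-i18n.dtd">

  <!-- the same host with its UTF-8 octets percent-encoded: both
       validators currently reject this form: -->
  <!DOCTYPE html SYSTEM "http://b%C3%BCcher.example/xhtml1-i18n.dtd">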
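Going back to the homebrew i18n DTD mentioned above, the kind of
renaming I mean is essentially this - a minimal sketch, not the full
DTD:

  <!-- XHTML 1.0 defines   <!ENTITY % URI "CDATA">   with a comment
       pointing to RFC 2396; the i18n variant just renames it: -->
  <!ENTITY % IRI "CDATA">
      <!-- an Internationalized Resource Identifier, see [RFC3987] -->

  <!-- attributes such as href on <a> then use %IRI; instead of %URI;
       (sketch only; the real attribute list has many more entries): -->
  <!ATTLIST a
    href        %IRI;          #IMPLIED
    >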
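And the core point about links in one line each - a made-up example.org
link, not my actual test markup: a raw IRI, which is not a URI and
therefore invalid in XHTML 1, next to its equivalent URI obtained by
percent-encoding the UTF-8 octets as RFC 3987 describes:

  <!-- raw IRI: not a URI, invalid wherever XHTML 1 demands a URI -->
  <a href="http://example.org/wiki/Zürich">Zürich</a>

  <!-- the equivalent URI (UTF-8 octets of "ü" percent-encoded) -->
  <a href="http://example.org/wiki/Z%C3%BCrich">Zürich</a>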
[back to the general "HTML5 considered hostile to users"]

> What are you trying to achieve?

As mentioned about ten times in this thread, I typically try to validate
content as author of the relevant document, or in a position to edit
(in)valid documents. The complete number of HTTP servers under my control
at this second (counting servers where I can edit dot-files used as
configuration files by a popular server) is *zero*. That is a perfectly
normal scenario for many authors and editors.

Of course I'm not happy if files are served as chemical/x-pdb or similar
crap, but it is outside my sphere of influence, and not what I'm
interested in when I want to know what *I* did to make the overall
picture worse *within* documents edited by me.

Of course mediawiki *could* translate IRIs to equivalent URIs when it
claims to produce XHTML 1 transitional, etc., just to mention another
example. They are IMO in an ideal position to do this on the fly, for
compatibility with almost all browsers, and IRIs are designed to have
equivalent URIs.

Admittedly "outside my sphere of influence" is negotiable, e.g. I'd have
reported chemical/x-pdb as a bug today, but it was already fixed. My
"plan B" was to use the "official" absolute URIs on a W3C server instead
of the validator's SGML library; "plan C" would be to copy these files
and put them on the same server as the homebrew DTD. While googlepages
won't try chemical/x-pdb, I fear they'll never support the correct type
for *.ent files, that is rather obscure.

> Are you trying to check that your Web content doesn't have
> obvious technical problems?

Normally, yes. Of course we are discussing mainly my validator torture
test pages, intentionally *abnormal* pages. I don't use HTML 2 strict or
HTML i18n elsewhere, I don't use "raw" IRIs on "normal" XHTML 1
transitional pages because I know it's invalid, I use obscure colour
names in legacy markup working more or less with any browser only on a
single test page, and when you find *hundreds* of "&" instead of "&amp;"
on my blogger page this is no test, but a blogger bug, and I reported it
months ago. Maybe they don't care, or are busy with other stuff like
"open-id", or the most likely case: for products with thousands of users
such bug reports NEVER reach developers, because they are filtered by
folks drilled to suppress^H^H^H^Hort technically clueless users.

> Or are you just trying to game a tool to say that your page is
> valid

Rarely. I use image-links hidden by span within pre on one page; at some
point in time validators will tell me that this is a hack, no matter if
it works with all browsers I've ever tested. Sanity check with
validator.nu: your tool says that this is an error. Maybe HTML5 could
permit it; I'm not hot about it unless somebody produces a browser where
this horribly fails.

> Why are you validating pages?

To find bugs. And for some years I used the W3C validator and its mailing
list also as a way to learn XHTML 1 above the level offered by an
O'Reilly book, until I could read DTDs, had read the XML spec often
enough for a vague impression, and had figured out the relevant parts of
the HTML history. Using a legacy "3.2" browser also helped.

Frank
Received on Sunday, 17 February 2008 21:15:15 UTC