- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Mon, 18 Feb 2008 22:52:03 +0100
- To: www-validator@w3.org
- Cc: www-archive@w3.org
Henri Sivonen wrote:

> Disclaimer: Still not a WG response.

This disclaimer business is odd. Is the WG supposed to agree on answers for all public comments? Should I add disclaimers stating "speaking for myself as of today, and not necessarily still insisting on what I said a year ago"?

> Changed W3C list to www-archive, because this reply isn't
> feedback about HTML 5.

That is apparently a list to get mails on public record, not exactly for discussions. I add the W3C validator list, maybe remove www-archive. Thanks for reporting and fixing the *.ent issues on the W3C validator servers.

> HTML 4.01 already defined IRI-compatible processing for the
> path and query parts, so now that there are actual IRIs,
> making Validator.nu complain about them doesn't seem
> particularly productive.

Getting IRIs outside of <reg-host> right is not too complex, for UTF-8 pages it is trivial. But HTML 4 based on RFC 2396 couldn't foresee how simple RFC 3987 is for these parts.

However HTML 4 and RFC 2396 (1998) are seven years older than RFC 3987 (2005), and older implementations supporting HTML 4 won't get it right. In other words raw IRIs do *not* work on almost all browsers. The accessibility issue should be obvious.

A few months ago FF2 failed to support <ipath> on pages with legacy charsets. It got the non-trivial <ihost> right, so much for one popular browser.

Accessibility is not defined by "upgrade your browser", it has "syntactically valid" as premise. New "raw" IRIs in HTML 4 or XHTML 1 documents are not valid, they are no URLs as specified in RFC 3986, let alone 2396.

Using "raw" IRIs on pages supposed to be "accessible" by some applicable laws is illegal for HTML 4 (or older) documents, it is also illegal for all XHTML 1 versions mirroring what HTML 4 does.

Schema based validation not supporting URLs is really strange. If it is too complex use DTDs, maybe try to create an HTML5 DTD starting with the existing XHTML 1.1 modules.

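To illustrate the claim that the parts of an IRI outside the host are easy to map for UTF-8, here is a minimal Python sketch. It is an illustration under stated assumptions, not what Validator.nu or the W3C validator do: the function name `escape_non_host_parts` and the example.org URL are made up, only the path, query and fragment are handled, and the host is deliberately left untouched because converting it needs IDNA processing.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters that stay as they are: the RFC 3986 unreserved and
# reserved characters, plus "%" so that existing percent-escapes are not
# escaped a second time.
SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def escape_non_host_parts(iri: str) -> str:
    """Percent-encode path, query and fragment as UTF-8 octets.

    Hypothetical helper for illustration; the authority part (host) is
    copied unchanged, so a non-ASCII host still needs separate handling.
    """
    scheme, netloc, path, query, fragment = urlsplit(iri)
    return urlunsplit((
        scheme,
        netloc,
        quote(path, safe=SAFE),      # quote() encodes as UTF-8 by default
        quote(query, safe=SAFE),
        quote(fragment, safe=SAFE),
    ))


if __name__ == "__main__":
    print(escape_non_host_parts("http://example.org/caf\u00e9?q=\u00fc"))
```

Because "%" is treated as safe, running the function on its own output changes nothing, which matters when a producer cannot tell whether a reference was already escaped.
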
"Won't do URLs because they are not productive" is also odd, thousands of browsers and other tools can't handle "raw" IRIs. By design all "raw" IRIs can be transformed into proper URIs, it is unnecessary to break backwards compatibility if all URI producers do what the name says. After that you have a simple split in the spirit of MIME, old clients (URI consuments) get something they can handle, URLs, the deployment of IRIs can begin where it should start, at the stronger side (URI producers and servers). Forcing the weaker side to upgrade is just wrong, the stronger side has to begin with this job. Leave IE6 (etc.) alone until they vanished voluntarily in about ten years. If the IRI design would force us to do all or nothing it could be different. But that is not the case, IRIs are designed for a smooth transition by defining an equivalent URI for any IRI. >> Actually the same syntax renaming URI to IRI everywhere, >> updating RFC 2396 + 3066 to 3987 + 4646 in DTD comments, > That's a pointless exercise, because neither browsers nor > validators ascribe meaning to DTD comments or production > identifiers. Human users including judges deciding what "accessible" means and implementors deciding what URL means can see a difference. For implementors building schema validators I'd say that they MUST support URLs in a professional tool for various reasons related to security, accessibility, ethics, professionalism. The W3C validator has an excuse, it uses DTDs. But even here the issue is bug 4916 reported by Olivier 2007-08-07, at the time when various URL security flaws (XP + IE7) hit the fan. Related to ethics and professionalism, how should say Martin or Richard create (X)HTML test pages for IRIs, or submit an RFC 3987 implementation and interoperability report without an XHTML document type permitting to use "raw" IRIs ? 
HTML5 is still a draft and a moving target at the moment, and using atom or xmpp (designed with IRIs in mind, no backwards compatibility issues) might not be what they need.

[Back to validator.nu]

>> * Be "lax" about HTTP content - whatever that is, XHTML 1
>> does not really say "anything goes", but validator.nu
>> apparently considers obscure "advocacy" pages instead
>> of the official XHTML 1 specification as "normative".

> Validator.nu treats HTML 5 as normative and media type-based
> dispatching in browsers as congruent de facto guidance.

If you want congruent de facto guidance I trust that <embed> and friends will pass as "valid" without warning (untested). However when I want congruent de facto guidance I would use another browser, ideally Lynx, not a validator.

Some weeks ago you quoted an ISO standard I hadn't heard of before for your definition of "valid". If that ISO standard has "congruent de facto guidance" in its definition trash it, or maybe put it where you have DIS 29500.

> I've fixed the schema preset labeling to say "+ IRI".

Good, documented bugs can be features. Somewhat suspicious, I hope your validator can parse and check IRIs based on the RFC 3986 syntax. There are some ugly holes in RFC 3987 wrt spaces and a few other characters. Spaces can cause havoc in space-separated URI lists and similar constructs.

[legacy charsets]

> US-ASCII and ISO-8859-1 (their preferred IANA names only)
> don't trigger that warning, because I don't have evidence
> of XML processors that didn't support those two in addition
> to the required encodings.

That's an odd decision. US-ASCII clearly is a "proper subset" of UTF-8 for any sound definition of "proper subset" in this field. But Latin-1 is no proper subset of the UTFs required by XML. (Arguably Latin-1 is a subset of "UTF-4", but that is mainly a theoretical construct and no registered charset.)

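The subset argument can be checked mechanically. The Python sketch below reads "subset" as "every character has the same byte sequence as in UTF-8", which is one plausible reading and an assumption of this example; the helper names `same_bytes_as_utf8` and `is_byte_subset_of_utf8` are hypothetical.

```python
def same_bytes_as_utf8(text: str, charset: str) -> bool:
    """True if `text` encodes to identical bytes in `charset` and UTF-8."""
    return text.encode(charset) == text.encode("utf-8")


def is_byte_subset_of_utf8(charset: str, code_points: range) -> bool:
    """True if every code point in the range has the same bytes as in UTF-8."""
    return all(same_bytes_as_utf8(chr(cp), charset) for cp in code_points)


if __name__ == "__main__":
    # US-ASCII covers U+0000..U+007F, ISO-8859-1 covers U+0000..U+00FF.
    print("US-ASCII:", is_byte_subset_of_utf8("ascii", range(0x00, 0x80)))
    print("ISO-8859-1:", is_byte_subset_of_utf8("latin-1", range(0x00, 0x100)))
```

The Latin-1 check fails for the upper half only: U+0080 to U+00FF are single bytes in ISO-8859-1 but two-byte sequences in UTF-8, so an XML processor that supports only the required encodings cannot read them unchanged.
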
My intuitive interpretation of the HTML5 draft is that they are done with Latin-1 and propose windows-1252 as its heir in the spirit of congruent de facto guidance. If you see no compelling use cases for NEL / SS2 / SS3 in ISO-8859-1 today you could ditch the ISO 6429 C1 controls, in the same way as UTF-8 replaced UTF-1 fourteen years ago. u+001b u+0045

>> Validator.nu accepts U-labels (UTF-8) in system identifiers,
>> W3C validator doesn't, and I also think they aren't allowed
>> in XML 1.0 (all editions). Martin suggested they are okay,
>> see <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5279>.

> Validator.nu URIfies system ids using the Jena IRI library
> set to the XML system id mode.

"Need some external library doing some obscure stuff" is IMO precisely the problem in XML 1.0 (3rd and 4th edition).

> Considering that the XML spec clearly sought to allow IRIs
> ahead of the IRI spec, would it be actually helpful to
> change this even if a pedantic reading of specs suggested
> that the host part should be in Punycode?

Your interpretation differs from what I get: The XML spec apparently does *not* suggest punycode, it is not based on RFC 3987, it invents its very own pseudo-IRIs for the system identifiers, requiring that clients use additional libraries to get external entities using these constructs.

In other words a simple XML application using curl or wget or whatever else supports URLs could *break*. IMNSHO this is the attitude towards backwards compatibility which makes me think that "the W3C is hostile to users". Sometimes producing good stuff, but often with an ugly "upgrade your browser" catch-22.

Forcing XML applications to support punycode based on Unicode 3.2 (for IDNA2003, the IDNA200x stuff is not yet ready) only to retrieve system identifiers is just wrong. I'd understand it for XML 1.1, but why on earth in XML 1.0? Who needs or wants this, ignoring any commercial interests?

> I don't believe in non-DNS host names.

Nor me in this discussion.
In a draft about say the file: URI scheme I'd try to dance around the issue... ;-)

I dare not propose to reject bug 5280, maybe it deserves a WONTFIX. For bug 5279 somebody feeling responsible for XML 1.0 should figure out what is wrong: your validator, the W3C validator, or the XML 1.0 spec. IMO XML 1.0 got it wrong, and the W3C validator got it right.

> you seem to be hostile to the idea of fixing how your
> documents are served

I'd certainly prefer it if say "googlepages" served KML files as application/vnd.google-earth.kml+xml, and where I have href links to such beasts I add an explicit type attribute. But note that I also have a "Kedit Macro Library" KML file on another server, it's not as simple as it sounds.

Fixing servers is not under my control, and clearly servers have no chance to know what say KML is. That Google should support their own inventions is already very far-fetched.

Quick test, trying to get the sitemap.xml from googlepages: They say text/xml; charset=UTF-8, not too shabby. Now I recall where I saw space-separated URIs: schemaLocation.

> if you want to be in the business of creating test suites,
> getting hosting where you can tweak the Content-Type is
> generally a good way to start.

Yeah, so far I got away without it for the purpose of W3C validator torture tests. This failed with your validator because of chemical/x-pdb, which turned out to be a bug on the W3C validator server, so that's still "in the family".

Maybe I should have insisted on IANA's server getting it right for the HTML i18n DTD. But after waiting a year for its registration, when it finally worked "as is" with the W3C validator, I wrote "thanks, and don't worry about it". After all it's a historic document type, only I missed it.

> A validator can't know what parts you can edit and what
> parts you can't.

The most likely case is "can fix issues *within* the text", the more or less dubious meta data sent by HTTP servers is a different topic. Offer to check it optionally.
Forcing users to click three options before the task at hand, find any bugs *within* the document, can start, is a nightmare. And "upgrade your Web hoster" is also a different topic, an ordinary user would not know that a W3C validator bugzilla exists, or how to use it. IMO bugzilla is also a nightmare.

> However, if you care about practical stuff, you shouldn't
> even enable external entity loading, since browsers don't
> load external entities from the network.

For xml2rfc this works to some degree, bad things can happen with invalid xml2rfc sources. When I'm using a validator I try to find bugs in documents, and when I wish to know what browsers do I use a browser.

[image links within <pre> hidden by <span>]

> Could you provide a URL to a demo page?

<http://purl.net/xyzzy/xedit.htm>

After setting all options to get around a "congruent de facto guidance" in an advocacy page overruling IETF and W3C standards from your POV it finds 29 invalid uses of <img> within <pre>. One of the advantages of schema validation, a <span> cannot confuse your validator.

Frank
Received on Monday, 18 February 2008 21:50:48 UTC