- From: Terje Bless <link@pobox.com>
- Date: Thu, 5 Aug 2004 18:13:58 +0200
- To: Dominique Hazaël-Massieux <dom@w3.org>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, Dan Connolly <connolly@w3.org>, W3C Validator <www-validator@w3.org>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dominique Hazaël-Massieux <dom@w3.org> wrote:

>>* Dan Connolly wrote:
>>>I'm interested to know if others find the arguments in
>>>http://www.w3.org/TR/webarch/#uri-benefits
>>>persuasive or not; i.e. whether they agree with me that the markup
>>>validation service should prefer URIs to FPIs.
>>
>>Two rather unrelated questions. The section you cite discusses good
>>practise for "web agents" when providing resources, not whether XML
>>processors should prefer system identifiers over public identifiers
>>when resolving external entities.
>
>I think DanC's point was that since URIs are preferred to FPIs in the
>Web Architecture,

I think that's reading a bit much into the WebArch document -- what little I read of it does not seem to support that statement -- but I'll take your word for it.

> the Markup Validator should always include them in the
>validation step, the point being not that it should always dereference
>them, but rather that, since (HTTP) URIs can be dereferenced, processed
>by various tools, etc, "validating" their usage in the Doctype sounds
>like a useful feedback to the document creators.
>
>Taking a practical example:
>- if the FPI and the System ID differs, it's
>probably a good idea to tell the user

Yes, without question; which is why it's been on our feature «wishlist» ever since Karl brought it up a couple of years ago (IIRC). Unfortunately this is not easily implemented, so it's stayed -- and will likely stay for the foreseeable future -- on the «wishlist»[0]. Patches gratefully accepted!

But this is _not_ what I read Dan's suggestion to be. Even after several readings of what he wrote, my understanding is that he wants the SysID to be preferred over a PubID, regardless of whether both are present or whether they differ, and with no particular note of a desire for a user warning when they differ.

As best I can tell this is just another effort to impose arbitrary preferences expressed in WebArch on the Markup Validator, since, at least as far as I was able to interpret Dan's message, his suggestion was not attempting to solve any actual technical problem. You'll note that the document cited as an example has been revised no fewer than three times since first referenced here, and the SysID has still not been corrected. It hardly seems as if this was a particularly pressing problem confounded by the current Markup Validator behaviour.

>if the FPI and the System ID differs, it's probably a good idea to use the
>System ID to check the document instead of the FPI, since that's what an
>agent that wouldn't know the FPI would do
>
>What would be the drawbacks in terms of user experiences/implementations
>against this approach?

First of all, let me note again that I disagree vehemently that a URI is a superior identifier in general, and yet more so in the specific case of an entity reference.

But leaving aside «WebArch» for a moment[1], let's go look at web history and implementation (instead of «Architecture»).

The earlier specifications for HTML have used only an FPI in their examples, and some of the newer specifications have used both but with a bogus SysID. Also, in the SGML world the SysID is just a random blob of data -- it could be "Blarghl!" and still be perfectly sensible, and Valid, SGML -- while a (Formal) Public Identifier has actual structure, hierarchy, and registration procedures (which the W3C has ignored for a decade, but that's a different beef).

The majority of pages out there will have an FPI, but only a subset will actually have a SysID; and the provenance of that SysID, iff included, is questionable. IOW, our failure scenario with that change is an increase in the number of pages for which the Validator produces bogus results.

One of the reasons for this is one of pure human interaction; the FPI is actually legible to human beings, while a «URI» -- the retrofitted assumption imposed on the opaque SysID -- is only parseable with effort, and frequently misparsed even after a good faith effort.

It is a pity that ISO 8879 doesn't clearly specify the precedence between these[2] when both are present -- as differing PUBLIC and SYSTEM references are clearly nonsense -- but my distinct impression of their intent -- with which I concur, obviously -- is that the FPI is the preferred method in document instances intended for general distribution.

It would be insane to assign preference to an opaque string, with semantics only defined within what the IETF would term a «cooperating subset» (i.e. implementation dependent) of systems, over a globally unique, system-agnostic identifier with well-established namespace management and registration procedures. XML's retrofitting of URIs on the SysID merely alleviates that; it doesn't contradict it.

In any case, the Markup Validator currently prefers the FPI to a SysID -- iff both are present -- because this better matches actual published documents and causes zero problems. It has the secondary effects of keeping our catalog files somewhat slimmer without sacrificing caching (positive), and of not detecting the case when an author has provided an «incorrect» SysID for the FPI used when both are present (negative).

If this part of WebArch, in its current state, gets more widely adopted in specifications, and deployed on the Web, then it's likely the Markup Validator would switch too, to better reflect what is actually out there (despite my opinions on the advisability of that approach).

But as it stands, I see no benefit to this change other than that of helping (self-)fulfill the WebArch prophecy -- which might be considered a laudable goal in itself, of course -- and that is simply not a persuasive argument for me.

[0] -- Actually, there is a small chance that some otherwise unrelated changes may enable us to bolt on a minimally useful and acceptable warning facility for this, but it'd be clunky and I'm not sure it's implementable yet. Did I mention that patches would be more than welcome? :-)

[1] -- I find the language in XML 1.0.3 that supports this position much more persuasive than WebArch in any case.

[2] -- But note that Goldfarb writes: «For obvious reasons, the use of the formal public identifier feature is highly recommended.» And in the footnote attached to that: «Formal public identifiers would probably have been mandatory, except they were introduced relatively late in the development cycle of ISO 8879.»

- -- 
"Fly it until the last piece stops moving..."

-----BEGIN PGP SIGNATURE-----
Version: PGP SDK 3.0.3

iQA/AwUBQRJcxaPyPrIkdfXsEQJQnwCghSy65WXJ7gg7UYCj6JCW+1m0GJQAn32s
VapmSyEBzcVeOQFUCPYAR1fi
=F0zZ
-----END PGP SIGNATURE-----
Received on Thursday, 5 August 2004 12:35:14 UTC