- From: Dan Connolly <connolly@w3.org>
- Date: Thu, 12 Feb 2009 10:21:45 -0600
- To: Larry Masinter <masinter@adobe.com>
- Cc: "www-tag@w3.org WG" <www-tag@w3.org>
On Wed, 2009-02-11 at 16:00 -0800, Larry Masinter wrote:
> With respect to ACTION-222:
>
> Here is my proposed note to W3C Staff as an operational policy for the
> W3C web site, and, in particular, for maintenance of W3C publications.
>
> =========================
>
> Subject: Dealing with broken links in W3C publications
>
> The W3C recommends a practice where “cool URIs don’t change”:
> http://www.w3.org/Provider/Style/URI
>
> However, in some cases, unfortunately, links *do* change. For
> example, the TAG Note:
>
> http://www.w3.org/2001/tag/2002/01-uriMediaType-9
>
> contains two links which no longer point to the documents intended:
>
> http://www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt and
> http://www.ietf.org/internet-drafts/draft-mealling-iana-urn-02.txt
>
> In fact, this disappearance of documents at those URIs was not due to
> a clerical error on IETF’s webmaster’s part: it is IETF policy
> currently to remove documents which have expired from the official
> “Internet-drafts” repository.

Do you have any idea why they use 404 "not found" rather than
410 "gone" when they intentionally take documents "out of print"?

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.11

> I think the response should be two-fold:
>
> a) When publishing a document as a Note, Working Draft or any
> other permanent W3C publication, the criteria for publication should
> examine any hyperlinks in the document and attempt to assure (from
> author or editor assertion or some other means) that there is a
> reasonable commitment that the referenced document will be available
> indefinitely. This policy might have prevented the current situation.

FWIW, the most relevant current policy I can think of is:

  "The document MUST NOT have any broken internal links or broken links
  to other resources at w3.org. The document SHOULD NOT have any other
  broken links."

  -- section 7. Document Body of
  Technical Report Publication Policy (Pubrules)
  http://www.w3.org/2005/07/pubrules?uimode=filter&uri=#document-body

That policy applies to Notes and Working Drafts, where the W3C Webmaster
enforces rules about what gets published by way of automated tools.

Note that the case in point, the uriMediaType-9 finding, isn't a
W3C Technical Report but just a TAG finding. The TAG chooses what
sorts of norms and constraints to establish for findings.

For links from Technical Reports to sites outside of w3.org, we
leave it to the review community to judge whether the target
of the link is sufficiently persistent. That seems to be
working in this case: a reviewer reported the broken link,
so now we can deal with it. I don't see any need for new policies.

> b) In cases where current W3C permanent publications contain links
> that are broken (discovered either automatically or noted and reported
> by an individual), I suggest the W3C create a permanent “reference”
> page for the now-broken hyperlink, add to the “reference” page some
> possible alternative sources of the same document, and change the
> hyperlink in the W3C document to point to the “reference” page.

We do have a policy for revising tech reports after-the-fact
for broken links:

  "The only modifications allowed in place are:

   1. Fixes to broken markup (e.g., invalid markup)
   2. Fixes to broken links (i.e., URIs)
   3. Fixes to broken style sheets"

  In-place modification of W3C Technical Reports
  http://www.w3.org/2003/01/republishing/

I have never used it; I haven't seen a case where it was
cost-effective; a 404 message is pretty self-explanatory, no?

It's just like a citation to a book that is no longer in print.
You contact the publisher and they say "no, we don't have any
more copies for sale." The citation still makes sense (provided
it has the customary redundant info: title, date, author, etc.),
though it's considerably less helpful to somebody who doesn't
already have a copy of the cited work.

I suppose sometimes the publisher goes away altogether and gets
replaced by something really misleading, and for those cases,
it would make sense.

> For example, one might create a web page:
>
> http://www.w3.org/2009/broken-links/www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt.html
>
> which could contain:
>
> A W3C document originally contained a pointer to
>
> http://www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt
>
> That document is no longer available, but an alternate
> source for that document can be found at
>
> http://tools.ietf.org/html/draft-eastlake-cturi-03

i.e. the IETF doesn't know how to use 410 gone so W3C would
do it for them? I can imagine extreme cases where that's
worthwhile, but this current case isn't one of them.

> The goal is to establish a general way of dealing with “broken links”
> by replacing them with “cool” URIs maintained under W3C control.
>
> Larry

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
gpg D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
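
[Illustration, not part of the original message.] The two technical points in this thread, the 404 versus 410 distinction and the proposed "reference page" URI pattern, can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the function names and the `REFERENCE_BASE` constant are hypothetical, the URI pattern is inferred from the single example in the quoted proposal, and none of this reflects actual W3C or IETF tooling.

```python
from urllib.parse import urlsplit

# Assumption: prefix taken from the one example URI in the quoted proposal.
REFERENCE_BASE = "http://www.w3.org/2009/broken-links/"


def reference_page_uri(broken_uri: str) -> str:
    """Map a broken link to a reference-page URI (hypothetical helper).

    Follows the pattern of the proposal's example: the original host and
    path are appended to the base, followed by ".html".
    """
    parts = urlsplit(broken_uri)
    return f"{REFERENCE_BASE}{parts.netloc}{parts.path}.html"


def describe_status(status: int) -> str:
    """Summarize what a 404 or 410 response tells the reader (per RFC 2616)."""
    if status == 410:
        # 410 Gone: the resource was removed and this is expected to be permanent.
        return "gone: removed, expected to be permanent"
    if status == 404:
        # 404 Not Found: nothing is said about temporary vs. permanent.
        return "not found: no indication whether temporary or permanent"
    return "other"


if __name__ == "__main__":
    broken = "http://www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt"
    print(reference_page_uri(broken))
    print(describe_status(404))
    print(describe_status(410))
```

Applied to the expired-draft link discussed above, `reference_page_uri` reproduces the example URI from the proposal; `describe_status` only restates the distinction the two status codes are defined to carry.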
Received on Thursday, 12 February 2009 16:21:58 UTC