Re: broken links in W3C documents and recommendations

On Wed, 2009-02-11 at 16:00 -0800, Larry Masinter wrote:
> With respect to ACTION-222:  
> 
> Here is my proposed note to W3C Staff as an operational policy for the
> W3C web site, and, in particular, for maintenance of W3C publications.
>
> =========================
> 
> Subject: Dealing with broken links in W3C publications 
> 
> The W3C recommends a practice where “cool URIs don’t change”:
> http://www.w3.org/Provider/Style/URI
>
> However, in some cases, unfortunately, links *do* change. For
> example, the TAG Note:
> 
> http://www.w3.org/2001/tag/2002/01-uriMediaType-9
> 
> contains two links which no longer point to the documents intended:
> 
> http://www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt and
> http://www.ietf.org/internet-drafts/draft-mealling-iana-urn-02.txt
> 
> In fact, the disappearance of documents at those URIs was not due to
> a clerical error on the part of the IETF webmaster: it is current IETF
> policy to remove expired documents from the official
> “Internet-Drafts” repository.

Do you have any idea why they use 404 "not found" rather than 410 "gone"
when they intentionally take documents "out of print"?
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.11
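To make the 404/410 distinction concrete, here is a minimal sketch of how a server that keeps track of deliberately withdrawn documents could choose between the two codes. The `AVAILABLE` and `WITHDRAWN` sets and the `status_for` function are hypothetical names for illustration; nothing here describes how the IETF site is actually implemented.

```python
from http import HTTPStatus

# Hypothetical inventory: documents currently served, and documents that
# were deliberately taken "out of print" (e.g. expired Internet-Drafts).
AVAILABLE = {"/index.html"}
WITHDRAWN = {"/internet-drafts/draft-eastlake-cturi-03.txt"}


def status_for(path: str) -> HTTPStatus:
    """Pick a response status for a request path.

    410 Gone (RFC 2616, section 10.4.11) says the server knows the resource
    was intentionally removed and no forwarding address is known.
    404 Not Found makes no claim about whether the absence is permanent.
    """
    if path in AVAILABLE:
        return HTTPStatus.OK
    if path in WITHDRAWN:
        return HTTPStatus.GONE
    return HTTPStatus.NOT_FOUND


for p in ["/internet-drafts/draft-eastlake-cturi-03.txt", "/no/such/doc"]:
    s = status_for(p)
    print(p, s.value, s.phrase)
```

The only extra bookkeeping 410 needs is the record of what was withdrawn on purpose; RFC 2616 says 404 SHOULD be used when the server cannot tell whether the condition is permanent.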


> I think the response should be two-fold:
> 
> a) When publishing a document as a Note, Working Draft or any
> other permanent W3C publication, the publication criteria should
> include examining any hyperlinks in the document and attempting to
> ensure (from author or editor assertion or some other means) that
> there is a reasonable commitment that the referenced document will
> remain available indefinitely. This policy might have prevented the
> current situation.

FWIW, the most relevant current policy I can think of is:

"The document MUST NOT have any broken internal links or broken links to
other resources at w3.org. The document SHOULD NOT have any other broken
links."
 -- section 7. Document Body of Technical Report Publication Policy
(Pubrules)
http://www.w3.org/2005/07/pubrules?uimode=filter&uri=#document-body

That policy applies to Notes and Working Drafts, where the W3C Webmaster
enforces rules about what gets published by way of automated tools.
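The quoted rule grades broken links by where they point. A minimal sketch of that grading step follows, assuming some other tool has already fetched each link and recorded its final HTTP status. The function names, the "status >= 400 means broken" threshold, and the sample data are assumptions for illustration; this is not the Webmaster's actual tooling, and it does not check internal fragment links.

```python
from urllib.parse import urljoin, urlsplit


def is_w3_host(url: str) -> bool:
    """True if the URL points at w3.org or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == "w3.org" or host.endswith(".w3.org")


def pubrules_link_findings(doc_url, link_results):
    """Grade already-fetched links against the quoted Pubrules text.

    link_results: iterable of (href, final_status) pairs, where final_status
    is the HTTP status after any redirects were followed (assumption).
    Relative hrefs are resolved against doc_url. Broken links to w3.org are
    MUST-level findings; other broken links are SHOULD-level findings.
    """
    findings = []
    for href, status in link_results:
        if status < 400:  # assumption: anything below 400 counts as working
            continue
        absolute = urljoin(doc_url, href)
        level = "MUST" if is_w3_host(absolute) else "SHOULD"
        findings.append((level, absolute, status))
    return findings


# Hypothetical document and hypothetical checker output.
doc = "http://www.w3.org/TR/hypothetical-note/"
sample = [
    ("http://www.w3.org/Provider/Style/URI", 200),
    ("/hypothetical-missing-page", 404),
    ("http://www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt", 404),
]
for finding in pubrules_link_findings(doc, sample):
    print(finding)
```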

Note that the case in point, the uriMediaType-9 finding, isn't
a W3C Technical Report but just a TAG finding. The TAG chooses
what sorts of norms and constraints to establish for findings.

For links from Technical Reports to sites outside of w3.org,
we leave it to the review community to judge whether the
target of the link is sufficiently persistent.

That seems to be working in this case: a reviewer reported
the broken link, so now we can deal with it. I don't see
any need for new policies.


> b) In cases where current W3C permanent publications contain links
> that are broken (discovered either automatically or noted and reported
> by an individual), I suggest the W3C create a permanent “reference”
> page for the now-broken hyperlink, add to the “reference” page some
> possible alternative sources of the same document, and change the
> hyperlink in the W3C document to point to the “reference” page.

We do have a policy for revising tech reports after-the-fact
for broken links:

"The only modifications allowed in place are:

     1. Fixes to broken markup (e.g., invalid markup)
     2. Fixes to broken links (i.e., URIs)
     3. Fixes to broken style sheets"
 -- In-place modification of W3C Technical Reports
 http://www.w3.org/2003/01/republishing/

I have never used it; I haven't seen a case where it
was cost-effective; a 404 message is pretty
self-explanatory, no? It's just like a citation to a book
that is no longer in print. You contact the publisher and
they say "no, we don't have any more copies for sale."
The citation still makes sense
(provided it has the customary redundant info: title, date,
author, etc), though it's considerably less helpful to
somebody that doesn't already have a copy of the cited work.

I suppose sometimes the publisher goes away altogether
and gets replaced by something really misleading, and
for those cases, it would make sense.

>
> For example, one might create a web page:
>
> http://www.w3.org/2009/broken-links/www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt.html
>
> which could contain:
>
>     A W3C document originally contained a pointer to
>
>     http://www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt
>
>     That document is no longer available, but an alternate
>     source for that document can be found at
>
>     http://tools.ietf.org/html/draft-eastlake-cturi-03

i.e. the IETF doesn't know how to use 410 Gone, so W3C would do it for them?

I can imagine extreme cases where that's worthwhile, but this
current case isn't one of them.
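The URL scheme in Larry's example appears to be mechanical: drop the `http://` prefix from the broken link, prepend a W3C base path, and append `.html`. A minimal sketch of that mapping, plus the text of the proposed reference page, follows. The helper names, the refusal to handle query strings, and keeping the `2009` year component fixed are all assumptions; the proposal gives only one example URL.

```python
from urllib.parse import urlsplit

# Base path taken from the single example in the proposal; how the year
# component would generalize is an assumption of this sketch.
BROKEN_LINKS_BASE = "http://www.w3.org/2009/broken-links/"


def reference_page_url(broken_url: str) -> str:
    """Map a broken link to its hypothetical W3C reference page URL."""
    parts = urlsplit(broken_url)
    if parts.query or parts.fragment:
        raise ValueError("this sketch does not handle query strings or fragments")
    return BROKEN_LINKS_BASE + parts.netloc + parts.path + ".html"


def reference_page_text(broken_url: str, alternates: list[str]) -> str:
    """Build the body text of the reference page, following the example."""
    lines = ["A W3C document originally contained a pointer to", "", broken_url, ""]
    if alternates:
        lines.append("That document is no longer available, but an alternate")
        lines.append("source for that document can be found at")
        lines.append("")
        lines.extend(alternates)
    else:
        lines.append("That document is no longer available.")
    return "\n".join(lines)


broken = "http://www.ietf.org/internet-drafts/draft-eastlake-cturi-03.txt"
print(reference_page_url(broken))
print(reference_page_text(broken, ["http://tools.ietf.org/html/draft-eastlake-cturi-03"]))
```

Deriving the reference address from the broken URL gives each broken link exactly one predictable W3C-controlled address, which is what the stated goal of replacing broken links with "cool" URIs would seem to require.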

> The goal is to establish a general way of dealing with “broken links”
> by replacing them with “cool” URIs maintained under W3C control.
> 
> Larry

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Thursday, 12 February 2009 16:21:58 UTC