
[whatwg] Link rot is not dangerous

From: Toby Inkster <mail@tobyinkster.co.uk>
Date: Sat, 16 May 2009 14:55:12 +0100
Message-ID: <1242482113.18078.49.camel@ophelia2.g5n.co.uk>
Philip Taylor wrote:

> The source data is the list of common RDF namespace URIs at
> http://ebiquity.umbc.edu/resource/html/id/196/Most-common-RDF-namespaces
> from three years ago. Out of those 284:
>  * 56 are 404s. (Of those, 37 end with '#', so that URI itself really
> ought to exist. In the other cases, it'd be possible that only the
> prefix+suffix URIs are meant to exist. Some of the cases are just
> typos, but I'm not sure how many.)
>  * 2 are Forbidden. (Of those, 1 looks like a typo.)
>  * 2 are Bad Gateway.
>  * 22 could not connect to the server. (Of those, 2 weren't http://
> URIs, and 1 was a typo. The others represent 13 different domains.)

While this analysis is interesting, the 56 URIs that return 404 don't
seem like a massive loss to me. Some of them are clearly typos (e.g.
DOAP and RSS syndication, both of which appear in their correct form on
the HTTP 200 and HTTP 3xx lists). In many cases I think you'll find that
it's not that the link has "rotted" with time, but that there was
*never* a file at the other end.
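
For scale, the counts quoted above can be tallied in a few lines of
Python. This is only a rough sketch: it covers just the four failure
categories quoted in this message, and the category labels and the
`share` helper are names chosen for illustration.

```python
# Counts as quoted above from the survey of 284 namespace URIs.
TOTAL = 284
failures = {
    "404 Not Found": 56,
    "Forbidden": 2,
    "Bad Gateway": 2,
    "could not connect": 22,
}


def share(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


failed = sum(failures.values())
for label, count in failures.items():
    print(f"{label}: {count} of {TOTAL} ({share(count)}%)")
print(f"all quoted failures: {failed} of {TOTAL} ({share(failed)}%)")
```

The 404s come to a little under a fifth of the list, and that is before
discounting the typos and the URIs that never resolved in the first
place.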

Even the ones which are genuinely lost are probably only used by a
handful of people. The *really* commonly used URIs - RDF, RDFS, OWL,
FOAF, Dublin Core (1.1 and Terms), RSS (1.0, plus commonly used
modules), SKOS, SIOC, dbpedia, geo, Geonames, vCard and iCalendar - all
seem to have been pretty stable so far.

Judging the stability of RDF URIs by looking at the 284 most common
namespace URIs is akin to judging the provision of light rail in British
cities by looking at the UK's 284 most populated areas - the results
would actually be more helpful if you restricted yourself to a smaller
sample.

Lastly, the RDF model tends to be very resilient against loss of
information anyway. Generally, data tends to be structured such that if
a collection of triples is true, any subset is also true. So if the
meaning of certain triples within a document is lost because of link
rot, the document as a whole will probably still be useful.
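
The subset property can be illustrated with a minimal sketch in plain
Python, with triples modelled as tuples rather than through an RDF
library. The FOAF namespace is real; the example.org resources, the
"lost" namespace and the `drop_unresolvable` helper are all
hypothetical, made up for the illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical vocabulary whose namespace document has disappeared.
LOST = "http://vocab.example.org/lost#"

# A small graph as a set of (subject, predicate, object) triples.
triples = {
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/alice", LOST + "widgetRank", "7"),
}


def drop_unresolvable(graph, dead_namespaces):
    """Keep only triples whose predicate is outside the dead namespaces."""
    return {
        t for t in graph
        if not any(t[1].startswith(ns) for ns in dead_namespaces)
    }


usable = drop_unresolvable(triples, {LOST})

# What remains is a subset of the original graph, so every statement in
# it was already asserted by the original document.
assert usable <= triples
for s, p, o in sorted(usable):
    print(s, p, o)
```

Here only the one triple using the lost vocabulary becomes opaque; the
two FOAF statements are untouched and remain just as usable as before.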

-- 
Toby Inkster <mail at tobyinkster.co.uk>
Received on Saturday, 16 May 2009 06:55:12 UTC
