Re: URIs / URLs from Aaron Swartz on 2001-04-11 (www-rdf-interest@w3.org from April 2001)

From: Aaron Swartz <aswartz@swartzfam.com>
Date: Wed, 11 Apr 2001 18:59:46 -0500
To: Lee Jonas <ljonas@acm.org>, RDF Interest <www-rdf-interest@w3.org>
Message-ID: <B6FA5A21.8F81%aswartz@swartzfam.com>

Lee Jonas <ljonas@acm.org> wrote:

> The notion of URLs identifying representations seems a little trite to me.
> It indicates the nature of the true problem, without fully addressing it: a
> resource at the end of a location is not consistent over time.
> 
> This is for at least two good reasons: resources evolve, and resources move
> / disappear / or worse, a second resource ousts the first at a particular
> location.

Resources do not evolve -- their representations (or network entities) do.

> The first issue could have been addressed more formally (and hence
> consistently) with a simple versioning scheme.

What about ETags?
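
To make the ETag idea concrete, here is a minimal sketch of how an entity tag can act as a lightweight version marker for a representation. The function names (make_etag, respond) and the choice of hashing the body are assumptions for illustration only; a real server may pick any opaque value as its tag.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumption: derive the tag from the representation's bytes.
    # HTTP only requires the tag to be an opaque quoted string.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, str, bytes]:
    """Return (status, etag, payload) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's cached copy still matches: nothing changed.
        return 304, etag, b""
    # First request, or the representation changed since the client saw it.
    return 200, etag, body
```

A client that stored the tag from an earlier response can send it back in If-None-Match; a 200 instead of a 304 tells it that the representation behind the same URL has changed.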

> This would have alleviated
> the problem of instantly breaking third party links (or invalidating
> metadata semantics) when you change a resource.  Yes your links must change
> to reflect new versions of things you reference, but these changes could be
> a graceful migration, not an abrupt crash.

How do versions fix changes in resources? It seems they just break things
for the 94% (as previously cited) of links that actually work correctly.

> The second is the main bugbear of using a resource's location to identify
> it.  This phenomenon is well known in distributed object technology.
> Superior solutions leave the actual resolution of an object's location to
> some distributed service when a client wants to interact with it.

Again, URLs don't have to be used this way, but that is how we use them. You
can try to redo URLs (which are widely used, implemented, understood, etc.) or
you can fix the other parts of the system (some of which can probably be
upgraded with little headache). In fact, there are projects trying to do that:
World Free Web, PURL, Alexa/Internet Archive, Google caches, etc.

http://wfw.sourceforge.net/
http://purl.org/
http://archive.org/
http://www.alexa.com/company/technology.html
http://www.google.com/help/features.html#cached

I'll keep track of others at: http://logicerror.com/alternateURLResolution
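
The common idea behind services like these is a level of indirection between the name people link to and the place the bytes currently live. A rough sketch, assuming a hypothetical in-memory Resolver class (none of the projects above necessarily work this way, and the URLs in the usage note are made up):

```python
from typing import Dict, Optional, Tuple


class Resolver:
    """Hypothetical table mapping persistent names to current locations."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, name: str, location: str) -> None:
        # Re-registering a name simply repoints it; links to the name survive.
        self._table[name] = location

    def resolve(self, name: str) -> Tuple[int, Optional[str]]:
        # Answer like an HTTP redirector: 302 plus the current location,
        # or 404 when the name is unknown.
        location = self._table.get(name)
        if location is None:
            return 404, None
        return 302, location
```

For example, registering "/net/example/doc" as "http://old.example.org/doc.html" and later re-registering it as "http://new.example.org/doc.html" changes where the name resolves without touching any link that uses the name.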

> These are compounded with the fact that the resource can be one of many
> formats and there is no clear way to distinguish them from the URL itself.  A
> resource such as http://mydomain/mypic.png may safely be assumed to be a png
> graphic, but what about the resource at the end of http://mydomain/mydir/ ?

Resources don't usually have formats. That's why there's content
negotiation.
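
As an illustration of content negotiation, here is a simplified sketch of a server choosing among several representations of one resource based on the client's Accept header. The helper names are assumptions, and the matching is deliberately cruder than what HTTP specifies: it takes the highest q-value among all matching ranges and ignores the rule that more specific ranges take precedence.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media_range = fields[0].strip()
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_range:
            prefs.append((media_range, q))
    return prefs


def matches(media_range: str, media_type: str) -> bool:
    """True if a range such as image/* or */* covers the media type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and (r_sub == "*" or r_sub == m_sub)


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the available media type the client prefers most, if any."""
    prefs = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        q = max((pq for rng, pq in prefs if matches(rng, media_type)), default=0.0)
        if q > best_q:
            best, best_q = media_type, q
    return best
```

The same URL can thus yield a PNG for one client and an SVG for another, which is why asking for "the format of the resource" is usually the wrong question.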

> Mime types have become pervasive for identifying a resource's type, yet URLs
> predate MIME by years.  If you want to know its type you have to make a
> request to some server process.

Is that true? Hmm, looks like it...
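
The quoted point about types can be seen with Python's standard mimetypes module, which only guesses from the URL's extension. This is a small sketch using the two example URLs from the quoted text; the wrapper name guess_from_url is an assumption.

```python
import mimetypes
from typing import Optional


def guess_from_url(url: str) -> Optional[str]:
    # guess_type looks only at the extension in the URL; it makes no request.
    media_type, _encoding = mimetypes.guess_type(url)
    return media_type


png_guess = guess_from_url("http://mydomain/mypic.png")
dir_guess = guess_from_url("http://mydomain/mydir/")
```

With the default type tables, the first guess should come back as image/png, while the second has no extension to go on and yields None. Even the first is only a guess: the authoritative answer is the Content-Type header the server sends back.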

> 1) It may become more common to reason about abstract resources whose
> identifiers may not be readily representable as a location.  It would be
> better to identify these with a URN.  Hence URNs may be more widely used
> than at present.

Why should things without a "location" use a URN? They can still be
described, can't they? Folks! Just because it's a URN doesn't mean it's
anything special. It still represents a resource, even if it's in the
Fooawackyak scheme.

> 3) Data quality will be poorer if it is hard for software to detect a
> resource change.  Transience is bad news if you are going to store facts
> about something that subsequently changes.

Yes, RDF does not deal with time very well, but this is, IMO, an RDF problem
not a URI one.

> What the solution to all this is I don't know.  I just can't help feeling
> that as the semantic web progresses things are about to get a lot more
> complicated unless these issues are addressed.

The Semantic Web is going to be complicated no matter what we do. ;-)

-- 
[ Aaron Swartz | me@aaronsw.com | http://www.aaronsw.com ]
Received on Wednesday, 11 April 2001 19:59:59 UTC