RE: URIs / URLs

Aaron Swartz [mailto:aswartz@swartzfam.com] wrote:

>Lee Jonas <ljonas@acm.org> wrote:
>
>> The notion of URLs identifying representations seems a little trite to me.
>> It indicates the nature of the true problem, without fully addressing it: a
>> resource at the end of a location is not consistent over time.
>> 
>> This is for at least two good reasons: resources evolve, and resources move
>> / disappear / or worse, a second resource ousts the first at a particular
>> location.
>
>Resources do not evolve -- their representations (or network entities) do.

My interpretation is: resources are things that can evolve, representations
are distinct "snapshots" of a particular resource state, conceptually taken
at the point of access (this then includes representations of resources
provided by CGI scripts, etc).  A W3C Working Draft evolves; the HTML document
retrieved from its "latest version" URL is a representation of the latest
version of the Working Draft.

>
>> The first issue could have been addressed more formally (and hence
>> consistently) with a simple versioning scheme.
>
>What about ETags?

I am not familiar with these.  Can you give me some pointers?

>
>> This would have alleviated
>> the problem of instantly breaking third party links (or invalidating
>> metadata semantics) when you change a resource.  Yes your links must change
>> to reflect new versions of things you reference, but these changes could be
>> a graceful migration, not an abrupt crash.
>
>How do versions fix changes in resources? It seems they just break things
>for the 94% (as previously cited) of links that actually work correctly.
>

They don't fix changes in resources (and hence changes to their
representations), they make it less destructive for others to have links to
fragments in your documents, which you may subsequently change / delete.

Why would this break things for links that work correctly?
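
To illustrate the idea, here is a minimal sketch in Python of a store that
keeps every published version rather than overwriting the previous one.  The
class name, method names and sample content are all hypothetical; nothing like
this is defined by the URL specifications.  It only shows that a link pinned
to an old version keeps working while unpinned links follow the latest one.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Hypothetical store that keeps every published version of a document."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def publish(self, name: str, content: str) -> int:
        """Append a new version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(content)
        return len(versions)

    def get(self, name: str, version: Optional[int] = None) -> str:
        """Return a pinned version, or the latest one if none is given."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


store = VersionedStore()
v1 = store.publish("draft", "Section 2: Syntax")
v2 = store.publish("draft", "Section 2: Semantics")

# A link pinned to version 1 still resolves after the document changed.
pinned = store.get("draft", v1)
latest = store.get("draft")
```

Existing unversioned links behave exactly as before (they get the latest
version); only links that choose to pin a version gain the extra stability.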

>> The second is the main bugbear of using a resource's location to identify
>> it.  This phenomenon is well known in distributed object technology.
>> Superior solutions leave the actual resolution of an object's location to
>> some distributed service when a client wants to interact with it.
>
>Again, URLs don't have to be used this way, but we do. You can try and redo
>URLs (which are widely-used, implemented, understood, etc.) or you can fix
>the other parts of the system (some of which can probably be upgraded with
>little headache). In fact there are projects which are trying to do that:
>World Free Web, PURL, Alexa/Internet Archive, Google caches, etc.
>
>http://wfw.sourceforge.net/
>http://purl.org/
>http://archive.org/
>http://www.alexa.com/company/technology.html
>http://www.google.com/help/features.html#cached
>
>I'll keep track of others at: http://logicerror.com/alternateURLResolution
>


I am not proposing any changes to URL.  This is more of an argument for
using URNs to identify resources (in a more abstract fashion), where
appropriate.  Then the mapping to a URL locating a specific representation
can be performed dynamically.
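
A minimal sketch of such a dynamic mapping, in Python.  The resolver class,
the URN "urn:example:working-draft" and the URLs under http://mydomain/ are
made up for illustration; a real resolution service would be distributed and
considerably more involved.

```python
from typing import Dict, List


class UrnResolver:
    """Hypothetical in-memory table mapping URNs to current URLs."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, urn: str, url: str) -> None:
        """Record one location where a representation can be fetched."""
        self._locations.setdefault(urn, []).append(url)

    def move(self, urn: str, old_url: str, new_url: str) -> None:
        """Update a location in place; the URN itself never changes."""
        urls = self._locations[urn]
        urls[urls.index(old_url)] = new_url

    def resolve(self, urn: str) -> str:
        """Return the first known location for the URN."""
        urls = self._locations.get(urn)
        if not urls:
            raise LookupError(f"no known location for {urn}")
        return urls[0]


resolver = UrnResolver()
resolver.register("urn:example:working-draft", "http://mydomain/drafts/wd.html")
before = resolver.resolve("urn:example:working-draft")

# The resource moves; anything that stored the URN is unaffected.
resolver.move(
    "urn:example:working-draft",
    "http://mydomain/drafts/wd.html",
    "http://mydomain/archive/wd.html",
)
after = resolver.resolve("urn:example:working-draft")
```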


>> These are compounded with the fact that the resource can be one of many
>> formats and there is no clear way to distinguish them from the URL itself. A
>> resource such as http://mydomain/mypic.png may safely be assumed to be a png
>> graphic, but what about the resource at the end of http://mydomain/mydir/ ?
>
>Resources don't usually have formats. That's why there's content
>negotiation.
>


Although it would sometimes be unavoidable, wouldn't it be nice to find out
the type of a representation without having to negotiate every time?
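
The limits of guessing from the URL alone can be seen with Python's standard
mimetypes module, which looks only at the extension of the path.  This is a
small sketch using the two example URLs from the quoted text; the helper name
guess_from_url is my own.

```python
import mimetypes
from typing import Optional


def guess_from_url(url: str) -> Optional[str]:
    """Guess a media type from the URL's file extension, without any request."""
    media_type, _encoding = mimetypes.guess_type(url)
    return media_type


for url in ("http://mydomain/mypic.png", "http://mydomain/mydir/"):
    print(url, "->", guess_from_url(url))
```

The .png URL yields a guess, but it is only a guess since a server is free to
return anything at that location.  The directory URL has no extension, so
nothing can be inferred without asking the server.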


>> Mime types have become pervasive for identifying a resource's type, yet URLs
>> predate MIME by years.  If you want to know its type you have to make a
>> request to some server process.
>
>Is that true? Hmm, looks like it...
>
>> 1) It may become more common to reason about abstract resources whose
>> identifiers may not be readily representable as a location.  It would be
>> better to identify these with a URN.  Hence URNs may be more widely used
>> than at present.
>
>Why should things without a "location" use a URN? They can still be
>described can't they. Folks! Just because it's a URN doesn't mean it's
>anything special. It still represents a resource, even if it's in the
>Fooawackyak scheme.
>

Reserving URLs to identify things that you can access representations of has
certain advantages.  Not least is keeping at least 94% of them
dereferenceable.  It seems like a simple distinction to me.  In an ideal
world, URLs are always dereferenceable, URNs may be so, but not necessarily.


>> 3) Data quality will be poorer if it is hard for software to detect a
>> resource change.  Transience is bad news if you are going to store facts
>> about something that subsequently changes.
>
>Yes, RDF does not deal with time very well, but this is, IMO, an RDF problem
>not a URI one.
>

It is a fundamental aspect of the way URLs are defined to be used.  They
*locate* (note I did not say *identify*) representations (snapshots of
state) of underlying resources, not the resources themselves.  When
resources change, new representations may appear at the same and/or
different locations.  The only way RDF could satisfactorily deal with this
is if it described the resources directly by using URN identifiers, which
could be subsequently mapped to a URL locating an appropriate
representation.
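
As an aside on detecting change, here is a rough sketch in Python of one way
software could notice that the representation at a location differs from the
one a stored fact was based on: keep a fingerprint of the bytes seen at the
time and compare it later.  The function names and sample bytes are
hypothetical, and this is not something proposed earlier in the thread.

```python
import hashlib


def fingerprint(representation: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact snapshot."""
    return hashlib.sha256(representation).hexdigest()


def has_changed(stored_fingerprint: str, current: bytes) -> bool:
    """True if the bytes now at the location differ from the snapshot."""
    return fingerprint(current) != stored_fingerprint


# Snapshot taken when the metadata was recorded.
snapshot = b"<html><body>Working Draft, revision A</body></html>"
stored = fingerprint(snapshot)

# Later retrievals from the same location.
unchanged = has_changed(stored, snapshot)
changed = has_changed(stored, b"<html><body>Working Draft, revision B</body></html>")
```

This only says that the snapshot differs, not whether the underlying resource
is still the same one, which is the part a URN would have to carry.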

>> What the solution to all this is I don't know.  I just can't help feeling
>> that as the semantic web progresses things are about to get a lot more
>> complicated unless these issues are addressed.
>
>The Semantic Web is going to be complicated no matter what we do. ;-)
>
>-- 
>[ Aaron Swartz | me@aaronsw.com | http://www.aaronsw.com ]

regards

Lee

Received on Thursday, 12 April 2001 09:07:59 UTC