Handles and PURLs

Hi team,

There has been a bit of discussion going on internally in HP about whether
to use Handles in the history system. I'm hoping this discussion is going to
be forwarded to this list, because I'm sure this is something the rest of the
team will have an interest in. Also, Eric Miller has stated that he would
prefer discussions to be sent to an archived email list, and I think this is
good advice.

People may be familiar with it already, but I found "A competitive
evaluation of Handles and PURLs" by Larry Stone useful:
http://web.mit.edu/handle/www/purl-eval.html

In essence the argument has been about whether Handles, because they do not
support HTTP GET, are compliant with the semantic web architecture. 

My position on this is that we really shouldn't worry about "compliance" in
this way. I have a nice quote about this in my cube: "Dogmatic attachment to
the supposed merits of a particular structure hinders the search for an
appropriate structure". 

Also one thing I've been interested in for a while is whether a fundamental
rethink about the way we use URIs can enhance the web architecture. A lot of
the current discussions about web architecture, and the semantic web for
that matter, are constrained by backward compatibility issues. However, with
any IT system we often have to make decisions about when it is worth
preserving the architecture we already have and when we need to sacrifice
backward compatibility in order to move to a completely new architecture
because it has compelling advantages. I have a name for this - "Web Version
2.0" - and a mission statement: "We've got a bunch of technologies that
form the current web and we've learnt a lot creating those technologies. If
we could start again from a blank sheet of paper, unconcerned about backward
compatibility, what would we do differently, what could we simplify, and
where would it take us?" I think this is quite an interesting thought
experiment, and I note that conducting thought experiments like this is a
cornerstone of the extreme programming methodology. 

It also seems to me that Handles are attempting to do something like this,
but we can easily postulate other approaches. I think there are a number of
issues here, and there has been quite a bit of discussion about this,
particularly within the W3C TAG, but I haven't seen a document that gives an
adequate summary of all the issues, so here is my attempt:

1. URLs are a form of URIs.

2. URLs are used by people to locate things. Therefore they should be
optimized to be user friendly, e.g.

   http://www.hp.com/ is good

   http://www.somenewssite.com/news/lots/of/directory/structure/?somequery=fred&anotherquery=flintstone is bad

3. URIs are used to identify resources. Due to the "cool URIs don't change"
principle, once resources are created they are immutable.

4. There is a tension between 2 and 3. For example, the contents of a site
may change, but I still want a user-friendly shortcut to a site as well as
a permalink. It feels like we need some level of dereferencing or
indirection here, i.e. typing in http://www.hp.com/ takes us to a particular
version of the HP website, and the browser then informs the user of a
permalink which we can use to retrieve that particular version in the future
if we need to. 
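To make point 4 concrete, here is a minimal Python sketch of the indirection I have in mind; the version store and the dated permalinks are invented for illustration, not real HP URLs:

```python
# Hypothetical sketch: a friendly URL always serves the latest version,
# but each version also has a dated permalink the browser could surface.
# The URLs and the version history below are invented for illustration.

VERSIONS = {
    # friendly URL -> list of (date, permalink) pairs, oldest first
    "http://www.hp.com/": [
        ("2003-05-01", "http://www.hp.com/2003-05-01/"),
        ("2003-05-22", "http://www.hp.com/2003-05-22/"),
    ],
}

def latest_permalink(friendly_url):
    """Map a friendly, mutable URL to the permalink of its current version."""
    date, permalink = VERSIONS[friendly_url][-1]
    return permalink
```

The user keeps typing the short, friendly URL; the permalink is what they bookmark if they ever need that exact version again.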

5. Due to 3, URIs tend to mix identity and version (i.e. date, time). There
are some disadvantages to mixing these two different axes, particularly as
different URIs mix them in different ways so they are not algorithmically
separable. Perhaps it might be useful to separate these axes, as then it
would be possible to determine from the URIs alone that two resources are
versions of the same thing. Now this is controversial, as we've already
discussed an opposing view, e.g. that identifiers must be random. But from
the CC/PP work, I'm conscious that this would make things much easier for
processor developers than keeping track of a bunch of metadata saying that
all these identifiers refer to versions of the same resource. For more
details see
http://www.hpl.hp.com/techreports/2003/HPL-2003-31.html
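As a sketch of what point 5 could look like, suppose URIs carried the two axes in a fixed syntax (the ";version=" convention below is invented, not an existing standard), so that identity comparison becomes purely algorithmic:

```python
def split_axes(uri):
    """Split a URI of the invented form <identity>;version=<date>
    into its identity axis and its version axis."""
    if ";version=" in uri:
        identity, version = uri.split(";version=", 1)
        return identity, version
    return uri, None  # unversioned URI: identity only

def same_resource(uri_a, uri_b):
    """True if the two URIs identify versions of the same resource,
    determined from the URIs alone, with no extra metadata."""
    return split_axes(uri_a)[0] == split_axes(uri_b)[0]
```

Under this convention, http://example.org/report;version=2003-05-01 and http://example.org/report;version=2003-05-22 can be recognised as versions of the same resource without any metadata lookup.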

6. The concept behind PURLs and Handles is good, i.e. when a resource moves
you don't need to worry about it. DNS already has a level of indirection
built in, so why not do this for retrievable resources? This is discussed in
the Stone paper cited above.
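A toy version of that indirection, with a resolver table and URLs invented for illustration (a real PURL server does this with an HTTP 302 redirect):

```python
# The persistent URI never changes; only the resolver entry does.
# The purl.example.org URI below is invented for illustration.
RESOLVER = {
    "http://purl.example.org/handles-eval":
        "http://web.mit.edu/handle/www/purl-eval.html",
}

def resolve(persistent_uri):
    """Return the current location, as a PURL server would via HTTP 302."""
    return RESOLVER[persistent_uri]

def move(persistent_uri, new_location):
    """When the resource moves, only this one entry needs updating;
    every published link to the persistent URI keeps working."""
    RESOLVER[persistent_uri] = new_location
```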

7. Although the "cool URIs don't change" advice seems good, as Cory
Doctorow's Metacrap paper points out, web techniques have to exist in a
world where people are subject to social, political and economic pressures.
Companies in particular want to be able to control what information they
disseminate at a particular time, and they reserve the right to try to
remove or obscure information from the public domain, so it is very rare to
see companies follow the "cool URIs don't change" advice. Therefore my
position is that I would like URIs to give some indication of whether they
refer to a retrievable resource and whether they are likely to be permanent
or not. This is similar to my position on the relationship between
namespaces and schemas or RDDL documents - I would like them to indicate the
same information. This information would allow processors or search engines
to deal with these links in a more intelligent way.
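To illustrate what I mean, here is a hypothetical convention (the "/perm/" and "/temp/" path prefixes are entirely invented) under which a processor could read permanence straight off the URI:

```python
from urllib.parse import urlparse

def classify(uri):
    """Classify a URI under the invented convention:
    /perm/... is intended to be permanent, /temp/... is transient."""
    path = urlparse(uri).path
    if path.startswith("/perm/"):
        return "permanent"
    if path.startswith("/temp/"):
        return "transient"
    return "unknown"

def crawl_policy(uri):
    """A search engine might archive permanent links once,
    but recrawl transient ones frequently."""
    return {"permanent": "archive",
            "transient": "recrawl",
            "unknown": "default"}[classify(uri)]
```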

Comments?

br,

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Thursday, 22 May 2003 06:39:05 UTC