Re: URL +1, LSID -1 from Eric Jain on 2007-07-15 (public-semweb-lifesci@w3.org from July 2007)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Sun, 15 Jul 2007 19:53:13 +0200
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: wangxiao@musc.edu, Michel_Dumontier <Michel_Dumontier@carleton.ca>, public-semweb-lifesci <public-semweb-lifesci@w3.org>, Mark Wilkinson <markw@illuminae.com>, Benjamin Good <goodb@interchange.ubc.ca>, Natalia Villanueva Rosales <naty.vr@gmail.com>
Message-ID: <469A5F09.1020004@isb-sib.ch>

Alan Ruttenberg wrote:
> The point of having the PURLs is to ensure that there is a mechanism 
> for handling three cases that LSIDs were intended to address (but which 
> can be addressed without the trouble of introducing a separate resolving 
> mechanism):
> 1) To be immune from the "actual URL of the representation" changing. 
> (e.g. beta.uniprot.org goes out of beta)
> 2) To enable switching to a backup if the server is turned off, or 
> certain pages go 404
> 3) To facilitate local caching of content from servers such as uniprot 
> in such a way as to not adjust what URLs clients need to use to access 
> this content.
>
> Many of us who have worked in the field have seen (and been burned by) 
> variants of these cases over the years.

1) We'll do a 301 "permanent" redirection, promise.
2) We have two servers, one in the U.S., one in Europe.
3) Does anyone cache temporary redirects?
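To make point 3 concrete: under HTTP caching rules a 301 is cacheable by default, while a 302 or 307 redirect is only stored when the server sends explicit freshness information. The sketch below models just that one rule. The `Response` type is a hypothetical stand-in, only `Cache-Control: max-age` is modelled (not `Expires`), and `no-store`, `private` and heuristic lifetime limits are ignored.

```python
from dataclasses import dataclass
from typing import Optional

# Status codes HTTP treats as cacheable by default ("heuristically
# cacheable", RFC 9110 section 15.1). 301 is in the set; 302 and 307 are not.
HEURISTICALLY_CACHEABLE = {200, 203, 204, 206, 300, 301, 308, 404, 405, 410, 414, 501}


@dataclass
class Response:
    """Hypothetical, minimal view of an HTTP response."""
    status: int
    location: Optional[str] = None
    max_age: Optional[int] = None  # value of Cache-Control: max-age, if sent


def may_cache(resp: Response) -> bool:
    """True if a cache may store and reuse this response without re-asking the origin."""
    if resp.max_age is not None:
        # Explicit freshness lets a cache keep even a temporary redirect.
        return resp.max_age > 0
    # Otherwise only responses that are cacheable by default qualify.
    return resp.status in HEURISTICALLY_CACHEABLE
```

With this rule a client would remember a 301 from the PURL to its target, but would go back to the PURL server on every request answered with a plain 302 or 307.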

Now you might argue that even if we do all this, other databases might not. 
Perhaps it is an admirable goal to set up some infrastructure that makes 
access to life sciences resources more reliable, but given our inability to 
agree on a simple resolver, isn't this perhaps setting the goals a bit too 
high? The more behavior there is in the resolver, the bigger the scope for 
disagreements, and the smaller the chance that it will be used widely!

Also: Perhaps the switch from beta.uniprot.org to uniprot.org is not the 
best example, as in this case the pages won't change much (or at least not 
more than from any release to another). But imagine you want to make some 
statements about http://www.ebi.uniprot.org/entry/P12345, one of the 
current pages. If you note down the time you accessed this page, you may 
still be able to retrieve the representation you described in future, e.g. 
courtesy of the Wayback Machine. But if all you have is a PURL, bad luck!
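The "URL plus access time" approach can be turned into a lookup mechanically: the Wayback Machine accepts addresses of the form `/web/<YYYYMMDDhhmmss>/<url>` and, as far as I know, redirects to the capture closest to that timestamp (if any exists). A minimal sketch, with `wayback_url` as a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone


def wayback_url(url: str, accessed: datetime) -> str:
    """Build a Wayback Machine lookup URL for `url` as it was at `accessed`.

    Wayback timestamps are UTC, so the access time is converted first
    (a naive datetime is interpreted as local time by astimezone()).
    """
    stamp = accessed.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"http://web.archive.org/web/{stamp}/{url}"


# Example: a page noted down at 19:53:13 CEST (+02:00) on 15 July 2007.
accessed = datetime(2007, 7, 15, 19, 53, 13, tzinfo=timezone(timedelta(hours=2)))
print(wayback_url("http://www.ebi.uniprot.org/entry/P12345", accessed))
```

The same construction is impossible with only a PURL, since nothing records which representation it pointed to at the time.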

> Yes, but what sorts of statements can be made using 
> http://purl.uniprot.org/uniprot/P12345 as the subject? Because it can 
> mean any of the below, even the protein class itself, how can a 
> *semantic web* statement be made using it?

http://purl.uniprot.org/uniprot/P12345 is meant to be used for anything 
that isn't tied to a specific representation; I had hoped that would be clear.

So I guess it's similar to http://purl.org/commons/record/uniprotkb/P12345, 
except that it redirects to the default representation (i.e. HTML, or RDF 
if that is requested via the Accept header) instead of presenting a list of 
formats that might (or might not) be available (if the resource does in fact exist).
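The redirect behaviour described above (HTML by default, RDF when the client asks for it) amounts to server-side content negotiation. The sketch below shows one way a resolver could pick the redirect target from the Accept header. The target URL templates and the `resolve` helper are hypothetical, not the actual UniProt configuration, and the Accept parsing is deliberately simplified.

```python
from typing import Dict, List, Tuple

# Hypothetical redirect targets per media type; real targets may differ.
TARGETS: Dict[str, str] = {
    "text/html": "http://www.uniprot.org/uniprot/{id}.html",
    "application/rdf+xml": "http://www.uniprot.org/uniprot/{id}.rdf",
}
DEFAULT_TYPE = "text/html"


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable even with reverse=True, so ties keep header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def resolve(accession: str, accept: str = "") -> str:
    """Choose where the PURL should redirect, honouring the Accept header."""
    for media, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media in TARGETS:
            return TARGETS[media].format(id=accession)
        if media == "*/*":
            break  # anything goes: use the default representation
    return TARGETS[DEFAULT_TYPE].format(id=accession)
```

For example, `resolve("P12345", "application/rdf+xml")` yields the RDF target, while a plain browser request falls through to the HTML default.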

> IMO, the first goal of our design ought to be to ensure that automated 
> semantic web agents (idiots as they will be) will have a fighting chance 
> to avoid having to do the difficult (even impossible) sorts of 
> disambiguations that people are faced with all the time. That bar hasn't 
> yet been met. Once we've ensured that we can meet that goal, then we can 
> talk about optimization. (incidentally we do discuss various 
> optimization techniques, from predictability of the form of the name, to
> purl servers sending back the rewrite rules they use so that they can be 
> implemented on the client side).
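For illustration, here is roughly what applying a PURL server's rewrite rules on the client side could look like, so that a client can skip the redirect round trip. The rule format (a regular expression plus a replacement template) and the example target URL are assumptions; no such publishing mechanism is specified in this thread.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules a PURL server might publish: (pattern, replacement).
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"http://purl\.uniprot\.org/uniprot/([A-Z0-9]+)"),
     r"http://www.uniprot.org/uniprot/\1"),
]


def rewrite_locally(uri: str, rules: List[Tuple[re.Pattern, str]] = RULES) -> Optional[str]:
    """Apply the first rule whose pattern matches the whole URI.

    Returns None when no rule applies, meaning the client should fall back
    to asking the PURL server itself.
    """
    for pattern, replacement in rules:
        match = pattern.fullmatch(uri)
        if match:
            return match.expand(replacement)
    return None
```

Falling back to the server when no rule matches keeps the PURL authoritative: the local rules are only a shortcut, never a replacement for the resolver.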

There are people building Semantic Web crawlers now; for them, I gather, being 
able to get the RDF representation directly isn't a premature optimization!

> OK, I missed that. But I'd still use the same purls in the HTML. There 
> are other mechanisms for indicating the real destination, and it may 
> lead to confusion when people need to choose a name for the subject or 
> object of a statement. If you ever land up using RDFa, you will need to 
> use the same URIs as in the RDF.

I might do that; all you need to convince me is a useful application that 
outweighs the inconvenience the extra redirection causes for our users!

> So that you know, there is interest by the purl developers to extend 
> their redirect service to better accommodate semantic web usage, and 
> they have offered to do this if we can get together and tell them what 
> we need.

Proper templates, and content negotiation (though the latter seems to be 
controversial). See also http://zepheira.com/news/#x20070711a, I guess.

Received on Sunday, 15 July 2007 17:54:15 UTC