Re: Immunity of SW statements to changes in location. Was: Re: URL +1, LSID -1 from Balaji S. Srinivasan on 2007-07-16 (public-semweb-lifesci@w3.org from July 2007)

From: Balaji S. Srinivasan <balajis@stanford.edu>
Date: Sun, 15 Jul 2007 22:32:08 -0700
To: public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-Id: <36C575DF-2A55-4533-968A-C66A6D6FAB57@stanford.edu>

> statements with the subject being
> http://beta.uniprot.org/entry/P12345 and another set makes
> statements about http://uniprot.org/entry/P12345. They are really
> talking about the same subject, but our semantic web agent won't
> know that. If we had used the PURL, then we wouldn't have a problem.

One solution is to have a "freshen_rdf" script that periodically goes  
through an RDF file or triplestore, does an HTTP GET on each unique  
URI, and updates the URI if it's been 301 redirected to a new  
location. People are probably going to end up doing this periodically  
anyway in order to validate each URI as pointing to a resolvable  
resource before doing anything nontrivial with the triplestore.
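
As a rough illustration of what such a "freshen_rdf" pass could look
like, here is a minimal Python sketch. It assumes the triples are
already in memory as plain (subject, predicate, object) string tuples,
treats any term starting with http:// or https:// as a URI, and follows
only a single 301 hop. The names permanent_redirect and freshen are
made up for this example; a real script would sit on top of whatever
RDF library or triplestore you use.

```python
import http.client
from urllib.parse import urlsplit, urljoin


def permanent_redirect(uri, timeout=10):
    """Return the Location target if `uri` answers 301, else None."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        return None
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client never follows redirects itself, so a 301 is visible here.
        conn.request("GET", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status == 301 and location:
            # Location may be relative; resolve it against the original URI.
            return urljoin(uri, location)
        return None
    finally:
        conn.close()


def freshen(triples, resolve=permanent_redirect):
    """Rewrite every URI term that has been permanently redirected."""
    cache = {}  # one lookup per unique URI

    def fresh(term):
        if not term.startswith(("http://", "https://")):
            return term  # assumed to be a literal or blank node label
        if term not in cache:
            cache[term] = resolve(term) or term
        return cache[term]

    return [(fresh(s), fresh(p), fresh(o)) for s, p, o in triples]
```

The resolve argument is there so the lookup can be swapped out, for
example with a cached or batched resolver, or with a stub when testing
without network access.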

Now, a naive GET on every URI might take some time, but it could be
made more efficient by first resolving the namespace declarations at
the beginning of the RDF file. For each namespace, such as
beta.uniprot.org, you do one GET to see whether any 301 redirects
have been set up. Perhaps the cleanest way to do this is for the EBI
people to have metadata at
"http://beta.uniprot.org/uniprot/redirect.rdf" (or a similar URI)
which contains a set of triples with redirect information. This might
be as simple as a rewriting regex. If it's just a regex, then you can
apply it to quickly freshen all the URIs from this namespace without
having to do an HTTP GET on each of them. Alternatively, that
redirect.rdf file might contain a table of "sameAs" mappings which,
again, can be used to freshen the URIs in your triplestore.
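
To make the offline variant concrete, here is a small Python sketch.
It assumes the rewriting regex and the "sameAs" table have already
been extracted from the redirect metadata by some other step; the
layout of that file is not defined anywhere, so the pattern,
replacement and mapping below are purely illustrative, as are the
function names.

```python
import re


def rewrite_uri(term, pattern, replacement, same_as=None):
    """Freshen one term: an explicit sameAs entry wins, else apply the regex."""
    if same_as and term in same_as:
        return same_as[term]
    # With an anchored pattern, terms outside the namespace pass through.
    return re.sub(pattern, replacement, term, count=1)


def freshen_offline(triples, pattern, replacement, same_as=None):
    """Apply the namespace rule to every term without any HTTP traffic."""
    return [
        tuple(rewrite_uri(term, pattern, replacement, same_as) for term in triple)
        for triple in triples
    ]


# Hypothetical rule for the beta.uniprot.org namespace.
PATTERN = r"^http://beta\.uniprot\.org/"
REPLACEMENT = "http://uniprot.org/"

triples = [
    ("http://beta.uniprot.org/entry/P12345",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "P12345"),
]
fresh = freshen_offline(triples, PATTERN, REPLACEMENT)
```

Because the pattern is anchored at the start of the string, literals
and URIs from other namespaces are left alone, and the whole pass
costs one fetch of the metadata per namespace rather than one GET per
URI.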

--
Balaji S. Srinivasan, Ph.D.
Stanford University
Lecturer, Depts. of Statistics and Computer Science
318 Campus Drive, Clark Center S251
(650) 380-0695
balajis@stanford.edu
http://jinome.stanford.edu

On Jul 15, 2007, at 9:34 PM, Alan Ruttenberg wrote:

>
> On Jul 15, 2007, at 1:53 PM, Eric Jain wrote:
>> Alan Ruttenberg wrote:
>>> The point of having the PURLs  is to ensure that there is a  
>>> mechanism for handling three cases that LSIDs were intended to  
>>> address (but which can be addressed without the trouble of  
>>> introducing a separate resolving mechanism)
>>> 1) To be immune from the "actual URL of the representation"  
>>> changing. (e.g. beta.uniprot.org goes out of beta)
>> 1) We'll do a 301 "permanent" redirection, promise.
>
> Yes, but how will we handle the case where some set of people make  
> statements with the subject being
> http://beta.uniprot.org/entry/P12345 and another set makes  
> statements about http://uniprot.org/entry/P12345. They are really  
> talking about the same subject, but our semantic web agent won't  
> know that. If we had used the PURL, then we wouldn't have a problem.
>
> Comments to your other points in separate email.
>
> -Alan
>
>
>
>

Received on Monday, 16 July 2007 14:04:55 UTC