Re: UniProt RDF via HTTP from Eric Jain on 2007-05-14 (public-semweb-lifesci@w3.org from May 2007)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Mon, 14 May 2007 07:26:09 +0200
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-ID: <4647F2F1.4050308@isb-sib.ch>

Alan Ruttenberg wrote:
> This is a reasonable choice. However I think that there ought to be a 
> link based on the identifier. Ideally the identifier itself would be 
> resolvable.

The idea here was that the LSID would be a more abstract identifier for the 
"Thing", whereas the URL(s) identify specific representations (at specific 
locations). Of course you could also have both levels represented by URLs.

For practical reasons I like to have (non-HTML) URLs end with their file 
extension; this avoids some confusion when people try to save the data.


> Not sure what you mean by being able to link rather than retrieve.

If you want to link, all information must fit into a URL, i.e. I can't 
require you to set HTTP headers etc. If you are writing some code to 
retrieve resources, on the other hand, that shouldn't be a problem.
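
A minimal sketch of the difference, assuming an ".rdf" extension, an 
extension-less entry URL and the "application/rdf+xml" media type (all three 
are illustrative assumptions, not a description of what the server supports). 
The first function returns something you can paste into a link; the second 
needs code, because the format is chosen through an HTTP header:

```python
from urllib.request import Request

# Host taken from the URLs in this thread; the path layout is an assumption.
BASE = "http://beta.uniprot.org/uniprot/"


def linkable_url(accession, extension="rdf"):
    """Everything needed is in the URL itself, so it works as a plain link."""
    return f"{BASE}{accession}.{extension}"


def retrieval_request(accession, media_type="application/rdf+xml"):
    """The format is selected via an Accept header, which only code can set.

    The request is only built here, not sent.
    """
    return Request(f"{BASE}{accession}", headers={"Accept": media_type})
```

Passing the result of retrieval_request() to urllib.request.urlopen() would 
send it; whether the server honours the header is not something this sketch 
can tell you.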


> If the URL of the resource was based on the LSID then that would be a 
> reasonable solution. It's rather complicated  to have to fetch a page 
> using the LSID, figure out what it redirected to, and only then do the 
> rewrite. It also doesn't always work: e.g.
> 
> http://beta.uniprot.org/?query=urn:lsid:uniprot.org:core:Citation_Statement
> goes to
> http://dev.isb-sib.ch/projects/uniprot-rdf/owl/citation_statement.html
> but
> http://dev.isb-sib.ch/projects/uniprot-rdf/owl/citation_statement.rdf 
> doesn't yield rdf.

Good point, should set up something that extracts this data from the 
"core.owl" file.


> BTW, I just tried the following and it gives an error.
> 
> http://beta.uniprot.org/?query=urn:lsid:uniprot.org:annotation:PRO_0000123886 

This is one of the few kinds of "second-level" identifiers (i.e. the data 
is part of another resource, in this case P12345). Plan to add support for 
this, but I'm afraid this is not at the very top of the list of priorities.


> Here is the solution we've been prototyping for the HCLS demo: 
> http://sw.neurocommons.org/2007/uri-explanation.html
> 
> We've got the basic framework up and are working to fill in all the 
> redirections. We also need to do a little more work to have the 
> "abstract records" return rdf metadata explaining where each of the 
> concrete instances live.  I was asking the question so that we could add 
> Uniprot to the prototype. Already we have, e.g. 
> http://purl.org/commons/record/uniprotkb/P12345 which would be a 
> reasonable alternative to using urn:lsid:uniprot.org:uniprot:P12345

One small problem I see with this is that it doesn't tell you in advance 
what is available, e.g.

   <http://purl.org/commons/record/uniprotkb/foo>

...will also return 303 rather than (perhaps) 404 (and when you follow the 
link it does a full-text search for "foo"?!), and:

   <http://purl.org/commons/record/uniprotkb/P00001>

...lists various formats even though this (obsolete) entry now only has an 
RDF representation.
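
For anyone who wants to reproduce this, here is a rough sketch of how a 
client could look at the raw status code without following the redirect. The 
names probe() and describe() are made up for this example, and it handles 
plain "http" URLs only:

```python
import http.client
from urllib.parse import urlsplit


def probe(url, timeout=10):
    """Send one HEAD request and do not follow redirects.

    Returns (status, location); location is None if no Location header
    was sent. Only plain http:// URLs are handled in this sketch.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status):
    """Rough reading of a status code from the client's point of view."""
    if status == 303:
        return "redirect: says nothing yet about whether the target exists"
    if status == 404:
        return "not found"
    if 200 <= status < 300:
        return "representation available"
    return "other"
```

If every identifier answers 303, the client has to follow the link to learn 
whether anything is there, which is the problem described above.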

Here is a URL where you can check what formats are available for an entry:

   <http://beta.uniprot.org/uniprot/P05067.*>

Should perhaps set the response code to 303 for this page (need to check 
first that no browser misinterprets this :-), and could make this the 
default target for the built-in resolver. However, it's unlikely that our 
users would prefer to get such an "empty" page rather than the Web view 
(which contains links to the different formats)...


> Because purl.org is a redirect service, we can adjust the urls that are 
> redirected to if the underlying database changes location. A suggestion 
> by Mark Wilkinson was that we also make available the rewrite rules as 
> rdf so that agents that want to avoid the redirection know how to do the 
> rewrites in their application.

That's a great idea, especially if you need to resolve tons of identifiers 
(e.g. for validation, a common task).
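
To make the idea concrete, here is a minimal sketch of an agent applying such 
rules locally instead of going through the redirect. The two rules below are 
hypothetical: the patterns and the ".rdf" target are invented for this 
example, and the real rules would be whatever the redirect service publishes.

```python
import re

# Hypothetical rewrite rules as (pattern, replacement) pairs. These are
# illustrative only, not the rules actually used by purl.org or UniProt.
RULES = [
    (re.compile(r"^http://purl\.org/commons/record/uniprotkb/([A-Z0-9]+)$"),
     r"http://beta.uniprot.org/uniprot/\1.rdf"),
    (re.compile(r"^urn:lsid:uniprot\.org:uniprot:([A-Z0-9]+)$"),
     r"http://beta.uniprot.org/uniprot/\1.rdf"),
]


def rewrite(identifier):
    """Return the rewritten URL, or None if no rule matches."""
    for pattern, replacement in RULES:
        if pattern.match(identifier):
            return pattern.sub(replacement, identifier)
    return None
```

With rules like these, an identifier that matches no pattern (such as the 
"foo" example earlier) can be rejected locally without any HTTP request.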


> The administration of the redirections would be set up so as to be under 
> the control of the community. Science Commons  volunteers to do the 
> initial grunt work and ongoing administration, but the idea would be to 
> set up some organization so that access is available to responsible 
> members of the community so that we don't get into a situation where we 
> are dependent on any individual organization.

Sounds like a good idea -- if we can all agree on how this should work ;-)
Received on Monday, 14 May 2007 05:28:22 UTC