Re: UniProt RDF via HTTP from Alan Ruttenberg on 2007-05-12 (public-semweb-lifesci@w3.org from May 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Sat, 12 May 2007 08:07:40 -0600
To: Eric Jain <Eric.Jain@isb-sib.ch>
Cc: public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-Id: <E1793E55-FB2B-405D-8545-7B6E88D2B7F1@gmail.com>
On May 12, 2007, at 4:49 AM, Eric Jain wrote:

> Alan Ruttenberg wrote:
>> Would it be possible to add a service so that I can get from the  
>> lsid directly to rdf and xml versions at least? Would it be  
>> correct to assume that all lsids in uniprot have such versions?
>
> The only common format in UniProt is RDF (e.g. there is no XML  
> representation of the taxonomy data).
>
> <http://beta.uniprot.org/uniprot/P12345> could return different  
> formats based on the "Accept" header, however this would complicate  
> caching...

I don't like content negotiation either.

> Another option (which would also allow you to link to rather than  
> retrieve a specific representation) would be an optional "format"  
> parameter.

This is a reasonable choice. However I think that there ought to be a  
link based on the identifier. Ideally the identifier itself would be  
resolvable.
Not sure what you mean by being able to link rather than retrieve.

>> Are the LSIDs supposed to be able to be resolved by an lsid  
>> resolver? If so is there one that ebi runs that I could play with?
>
> None that I'm aware of, and I'm afraid setting up a "correct"  
> resolver that behaves as required by the specs would be difficult,  
> if not impossible :-(
>
> The question is, what is worse crime against humanity: Misusing an  
> existing scheme, or inventing your own :-)

My gut is that the former is worse. We want to present people
with predictable systems. When an existing scheme is misused, people
expect to be able to go to the specification and figure out what to
do, and then find that they can't. This reduces their confidence in
both the specification and the provider who misuses it.

>> I might suggest the following:
>> http://beta.uniprot.org/uniprot/what/urn:lsid:uniprot.org:uniprot:P12345
>> return some rdf that lists the specific formats that resource is  
>> available in, and urls where they can be fetched from?
>> Or if you have some simple rules for forming the URLs, could you  
>> share those?
>
> The simple rule is to append .ext to the URL of the resource, where  
> "ext" is rdf|xml|fasta|txt|...

If the URL of the resource were based on the LSID then that would be a
reasonable solution. It's rather complicated to have to fetch a page
using the LSID, figure out what it redirected to, and only then do
the rewrite. It also doesn't always work, e.g.

http://beta.uniprot.org/?query=urn:lsid:uniprot.org:core:Citation_Statement
goes to
http://dev.isb-sib.ch/projects/uniprot-rdf/owl/citation_statement.html
but
http://dev.isb-sib.ch/projects/uniprot-rdf/owl/citation_statement.rdf  
doesn't yield rdf.
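
For concreteness, here is a minimal sketch of the "append .ext" rule as I
understand it from Eric's description. The function name and the
extension list are my own; the "..." in his list is not filled in. It
assumes you already hold the resolved resource URL, and it does not try
to handle the case above where the redirect lands on a page that already
ends in .html.

```python
from urllib.parse import urlsplit, urlunsplit

# Extensions named in the thread; anything else is rejected rather than guessed.
KNOWN_EXTENSIONS = ("rdf", "xml", "fasta", "txt")


def format_url(resource_url, ext):
    """Append .ext to the path of an already-resolved resource URL."""
    if ext not in KNOWN_EXTENSIONS:
        raise ValueError("unknown format extension: %r" % ext)
    parts = urlsplit(resource_url)
    # Drop a trailing slash so we never produce ".../P12345/.rdf".
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme, parts.netloc, path + "." + ext, parts.query, parts.fragment)
    )


print(format_url("http://beta.uniprot.org/uniprot/P12345", "rdf"))
```

The point of the sketch is that the rule is only this simple once you
know the resource URL; getting from the LSID to that URL is the missing
step.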

BTW, I just tried the following and it gives an error.

http://beta.uniprot.org/?query=urn:lsid:uniprot.org:annotation:PRO_0000123886

>> Do you assign LSIDs to those resources too? If so is there a way  
>> to figure out which are "yours" and which are "theirs"?
>
> One of the main reasons for using LSIDs was that I need proper URIs  
> for all the resources we reference, and most resources have two  
> meter long, frequently changing cgi-bin URLs (OK, I'm exaggerating,  
> but not much).

This is a nasty problem which any solution should deal with. I think  
there is a better way, however, than having URIs that one can't  
reliably resolve.

Here is the solution we've been prototyping for the HCLS demo:
http://sw.neurocommons.org/2007/uri-explanation.html

We've got the basic framework up and are working to fill in all the
redirections. We also need to do a little more work to have the
"abstract records" return rdf metadata explaining where each of the
concrete instances lives. I was asking the question so that we could
add UniProt to the prototype. Already we have, e.g.
http://purl.org/commons/record/uniprotkb/P12345
which would be a reasonable alternative to using
urn:lsid:uniprot.org:uniprot:P12345
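
To show how mechanical the correspondence could be, here is a rough
sketch that maps an LSID to a record URL of the form above. The helper
names and the lookup table are hypothetical. The only entry in the
table is the one pair shown in this message (LSID namespace "uniprot"
to record name "uniprotkb"); I am not claiming any other mappings
exist.

```python
def parse_lsid(lsid):
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: %r" % lsid)
    return parts[2], parts[3], parts[4]


# Hypothetical table: (authority, namespace) -> record name under purl.org/commons.
# Only this single pair is taken from the example in the message.
RECORD_NAMES = {("uniprot.org", "uniprot"): "uniprotkb"}


def lsid_to_purl(lsid):
    """Return the purl.org record URL for an LSID we have a mapping for."""
    authority, namespace, object_id = parse_lsid(lsid)
    key = (authority.lower(), namespace.lower())
    if key not in RECORD_NAMES:
        raise KeyError("no record mapping for %s:%s" % key)
    return "http://purl.org/commons/record/%s/%s" % (RECORD_NAMES[key], object_id)


print(lsid_to_purl("urn:lsid:uniprot.org:uniprot:P12345"))
```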

With this scheme, we are able to have unambiguous URLs that all  
resolve to the resources they are intended to refer to (or via a 303  
explain why they can't).
Because purl.org is a redirect service, we can adjust the urls that  
are redirected to if the underlying database changes location. A  
suggestion by Mark Wilkinson was that we also make available the  
rewrite rules as rdf so that agents that want to avoid the  
redirection know how to do the rewrites in their application.
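
A sketch of what an agent might do with such rules once it has them.
Here the rules are written as plain (pattern, template) pairs rather
than rdf, and both the rule format and the target URL are assumptions
for illustration only; the real redirect targets are whatever the
purl.org entries say.

```python
import re

# Hypothetical rule list: (regular expression, replacement template).
# The target below reuses the beta.uniprot.org URL from earlier in the thread
# purely as an example; it is not the published redirect.
REWRITE_RULES = [
    (
        r"^http://purl\.org/commons/record/uniprotkb/([A-Za-z0-9]+)$",
        r"http://beta.uniprot.org/uniprot/\1",
    ),
]


def rewrite(url, rules=REWRITE_RULES):
    """Apply the first matching rule; return the URL unchanged if none match."""
    for pattern, template in rules:
        match = re.match(pattern, url)
        if match:
            return match.expand(template)
    return url


print(rewrite("http://purl.org/commons/record/uniprotkb/P12345"))
```

An agent that falls through to the unchanged URL simply follows the
redirect as usual, so the rules are an optimization and never a
requirement.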

The administration of the redirections would be under the control of
the community. Science Commons volunteers to do the initial grunt
work and ongoing administration, but the idea would be to set up an
organization that gives responsible members of the community access,
so that we don't become dependent on any individual organization.

If you have some time, perhaps we could talk off line about this...

Regards,
Alan


>
> Moreover, what is "ours" and what is "theirs" isn't always clear  
> (again, consider the taxonomy data, which is basically the NCBI  
> taxonomy), though in general if it resolves to one of our servers,  
> then it's probably ours.
>
Received on Saturday, 12 May 2007 14:05:48 UTC