Re: URLs/LSID/RDF etc. from Sean Martin on 2004-04-22 (public-semweb-lifesci@w3.org from April 2004)

From: Sean Martin <sjmm@us.ibm.com>
Date: Thu, 22 Apr 2004 09:03:05 -0400
To: public-semweb-lifesci@w3.org
Cc: Greg Tyrelle <greg@tyrelle.net>
Message-ID: <OFA44A5434.D432E12F-ON85256E7E.0041A428-85256E7E.0047B1F4@us.ibm.com>
GT>Again, a life sciences identifier scheme that is not part of the web
GT>is less useful to me. But maybe to industry perhaps ?

I don't see it like that at all. To be really useful, LSIDs need to become 
part of the web too. I have reached the point where I see no difference as 
a user. We quite happily place LSIDs in anchor links right in HTML (<A 
HREF="lsidres:urn:lsid:pdb.org:pdb:1AFT">Check out this protein!</A>) - 
once you install the LSID resolution protocol handler for IE you can "surf 
the data web" quite happily as a human - the main difference is that your 
software can too. See http://lsid.limnology.wisc.edu/ for an example 
where Dan Smith implemented this; 
http://www-124.ibm.com/developerworks/oss/lsid/images/launchpad.gif is a 
picture of an application running in IE that uses the stack. Using the 
client-side protocol stack is no different from using "GET" or some 
other such utility you would use to download something via HTTP. It is 
good to be able to click on a link and know that the underlying data stack 
can be safely left to find you a copy of the object you seek, potentially 
from multiple sources and over multiple protocols.
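
To make the link form concrete, here is a minimal sketch (not the IBM client 
stack) that splits an LSID into its parts and builds an anchor of the kind 
shown above. It assumes the usual urn:lsid:authority:namespace:object layout 
with an optional trailing revision; the names Lsid, parse_lsid and 
lsid_anchor are made up for illustration.

```python
from html import escape
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID (hypothetical helper type, not from any LSID library)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    # A sixth component, when present and non-empty, is the revision.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


def lsid_anchor(lsid_text: str, label: str) -> str:
    """Build an HTML anchor using the 'lsidres:' prefix from the example above."""
    parse_lsid(lsid_text)  # raises ValueError if the text is not a valid LSID
    href = "lsidres:" + escape(lsid_text, quote=True)
    return f'<A HREF="{href}">{escape(label)}</A>'


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:pdb.org:pdb:1AFT"))
    print(lsid_anchor("urn:lsid:pdb.org:pdb:1AFT", "Check out this protein!"))
```

Actually fetching the object is left to whatever resolver is installed; the 
sketch only covers the naming side.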

Additionally, having a formal method for accessing metadata allows for 
more sophisticated interaction with the provider of the data: for 
example, programmatic negotiation of format or semantic type, or perhaps 
the automated discovery of the latest version of an object, given an 
earlier version (or any other related LSID) as a starting point.
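
As a rough illustration of the version-discovery idea, the sketch below 
walks metadata triples from an earlier LSID to the newest one. Everything 
here is assumed for the example: the metadata is an in-memory list rather 
than the result of a real metadata call, and the supersededBy predicate 
under example.org is hypothetical, not a defined vocabulary term.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

# Hypothetical predicate, invented for this sketch only.
SUPERSEDED_BY = "urn:lsid:example.org:predicates:supersededBy"


def latest_version(lsid: str, metadata: Iterable[Triple]) -> str:
    """Follow supersededBy links from lsid until no newer version is listed."""
    successor = {s: o for s, p, o in metadata if p == SUPERSEDED_BY}
    seen = {lsid}
    current = lsid
    while current in successor:
        current = successor[current]
        if current in seen:
            # Guard against metadata that loops back on itself.
            raise ValueError(f"version cycle detected at {current}")
        seen.add(current)
    return current


if __name__ == "__main__":
    metadata = [
        ("urn:lsid:pdb.org:pdb:1AFT:1", SUPERSEDED_BY, "urn:lsid:pdb.org:pdb:1AFT:2"),
        ("urn:lsid:pdb.org:pdb:1AFT:2", SUPERSEDED_BY, "urn:lsid:pdb.org:pdb:1AFT:3"),
    ]
    print(latest_version("urn:lsid:pdb.org:pdb:1AFT:1", metadata))
```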

GT>If my "agent" is to add this hypothesis to it's KB, I might instruct
GT>it to find more information about the processes involved (assuming I
GT>don't already have this knowledge). If the GO terms and GIs were HTTP
GT>URIs I can dereference them to (hopefully) retrieve some useful
GT>information about those resources. However with LSID I must have the
GT>necessary infrastructure in place (resolvers, clients etc.).

We do exactly this in some of the systems we are currently building. 
Various authorities might well build up lists of definitions for 
formats, semantic types etc. and provide LSIDs for them (e.g. 
urn:lsid:i3c.org:predicates:storedAs or urn:lsid:i3c.org:formats:pdb) that 
are shared widely - this greatly enhances interoperability, btw. 
Dereferencing one of these results in a web page which might list (in 
human-readable form) exactly what is meant by a concept, a specification 
of the format, or a list of links to software that can be used for 
processing that information. Additionally, this information might be 
stored as metadata for that LSID, allowing automated programmatic access 
to the same or additional data.

GT>You will not be able to "technically" insure two URIs are not pointing
GT>at the same object using LSID or HTTP URIs IMO. Also if LSIDs are
GT>going to be use to identify "concepts", what is to say that two
GT>authorities will have LSIDs for the concept p53 ? This is especially
GT>important considering their use in RDF to identify "resources".

The object may be in multiple places, but the LSID can be the same. If the 
name is the same, it is the same object - it does not matter where the 
various copies are stored. Of course it is better if two authorities use 
the _same_ LSID for p53, but if they do not, someone else may do it for 
them by creating a triple that says something like 
    urn:lsid:pdb.org:pdb:p53  urn:lsid:i3c.org:predicates:sameAs 
    urn:lsid:anotherauth.com:proteins:p53 
As you can see, third parties may quite happily use another authority's 
LSID to say something of their own about the data stored by our original 
two authorities for the concept p53.
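
A small sketch of how software might use such sameAs triples: collect them 
from whoever publishes them and group the LSIDs they link. This is only an 
illustration under assumptions, not how any particular LSID or RDF toolkit 
does it; the function name same_as_groups and the plain tuple 
representation of triples are made up for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Predicate LSID taken from the example triple above.
SAME_AS = "urn:lsid:i3c.org:predicates:sameAs"


def same_as_groups(triples: Iterable[Triple]) -> List[Set[str]]:
    """Group LSIDs that are linked, directly or transitively, by sameAs."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            root_s, root_o = find(subject), find(obj)
            if root_s != root_o:
                parent[root_o] = root_s  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


if __name__ == "__main__":
    triples = [
        ("urn:lsid:pdb.org:pdb:p53", SAME_AS,
         "urn:lsid:anotherauth.com:proteins:p53"),
    ]
    for group in same_as_groups(triples):
        print(sorted(group))
```

Triples with other predicates are ignored, so a third party's sameAs 
statements can sit alongside whatever else it says about the same LSIDs.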


Kindest regards, Sean

--
Sean Martin
IBM Corp.
Received on Thursday, 22 April 2004 09:04:49 UTC