Re: URLs/LSID/RDF etc. from Phillip Lord on 2004-04-22 (public-semweb-lifesci@w3.org from April 2004)

From: Phillip Lord <p.lord@russet.org.uk>
Date: 22 Apr 2004 10:36:09 +0100
To: public-semweb-lifesci@w3.org
Message-ID: <vfn054fj7a.fsf@rpc71.cs.man.ac.uk>
>>>>> "Greg" == Greg Tyrelle <greg@tyrelle.net> writes:


  Greg>   | '/wormbase/das/elegans/features?segment=CHROMOSOME_I:1000,2000'
  Greg>   |
  Greg>   | This leads the programmer and biologist to certain conclusions
  Greg>   | about query semantics

  Greg> Both LSIDs and URLs are URIs, in which case they are intended
  Greg> to be opaque identifiers. You are not meant to infer anything
  Greg> about the resource from the URI ? I believe this is a case of
  Greg> using URIs incorrectly [1], not that HTTP URIs are broken.


It's applying extra semantics to the URL which it is not supposed to
have. The problem is not that URIs are being used incorrectly; it's
that, as specified, they do not do everything they need to.
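
To make the point concrete, here is a minimal Python sketch of a client
that picks the quoted DAS URL apart. The function name `guess_segment`
is hypothetical, and reading the `segment` parameter as
`name:start,stop` is an assumption drawn only from the example URL
above; nothing in the URI syntax itself promises that meaning.

```python
from urllib.parse import urlsplit, parse_qs


def guess_segment(url):
    """Infer (reference, start, stop) from a DAS-style URL.

    This is the kind of inference an opaque identifier does not
    license: the meaning of the query is a convention of the service,
    not something the URI itself guarantees.
    """
    query = parse_qs(urlsplit(url).query)
    # Assumed layout of the parameter: "<reference>:<start>,<stop>"
    ref, _, span = query["segment"][0].partition(":")
    start, stop = (int(n) for n in span.split(","))
    return ref, start, stop


url = "/wormbase/das/elegans/features?segment=CHROMOSOME_I:1000,2000"
print(guess_segment(url))
```

Any code like this works only for as long as the server keeps to the
same convention, which is exactly the extra semantics in question.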



  Greg> If my "agent" is to add this hypothesis to it's KB, I might
  Greg> instruct it to find more information about the processes
  Greg> involved (assuming I don't already have this knowledge). If
  Greg> the GO terms and GIs were HTTP URIs I can dereference them to
  Greg> (hopefully) retrieve some useful information about those
  Greg> resources. However with LSID I must have the necessary
  Greg> infrastructure in place (resolvers, clients etc.).


You need infrastructure to resolve an HTTP URI as well. The question
is whether the extra infrastructure you need for an LSID is worth the
effort.


  Greg> I have ignored the issue of retrieving the "object" vs. a
  Greg> description of the "object" in this case. 

I think LSIDs address this well.

  Greg>   | By encoding things with URI's we do not guard against the
  Greg>   | fact that the underlying data may change.

  Greg> Why do we need to guard against the underlying data changing ?

This is one of the main issues in the use of identifiers in
bioinformatics. They identify two things: a specific data set, such as
a sequence; and the biological entity. The former is less stable than
the latter. 

A lot of databases have both identifiers, which do not change (or not
very frequently), and accessions, which change when the data is
updated.

If you can't cope with versions then you have to keep every past
version of the database around, or URLs will go out of date.
Otherwise you have to apply some new semantics.
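
As a rough illustration of what coping with versions could look like,
here is a small Python sketch that treats the revision as part of the
name. It assumes the layout
`urn:lsid:authority:namespace:object[:revision]`, which is my reading
of the LSID proposal; the names `Lsid`, `parse_lsid` and `same_entry`,
and the example identifiers, are hypothetical.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no revision


def parse_lsid(text):
    """Split an LSID into its parts (assumed layout, see above)."""
    parts = text.split(":")
    is_lsid = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if not is_lsid or len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def same_entry(a, b):
    """True when two LSIDs name the same entry, ignoring the revision."""
    return a[:3] == b[:3]


old = parse_lsid("urn:lsid:example.org:proteins:P12345:1")
new = parse_lsid("urn:lsid:example.org:proteins:P12345:2")
print(same_entry(old, new), old == new)
```

The two names refer to the same entry but are different identifiers,
so a reference to revision 1 does not silently start meaning revision
2 when the data is updated.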

This has happened on the web. "http://news.bbc.co.uk" does not
identify a specific document, but the current version of a specific
document. It changes every hour or so. 


  Greg>   | This leads me to a question about "persistent" URI's and
  Greg>   | URL's (PURLS's): How do you ensure that two URI's are
  Greg>   | pointing at the same object (bytes)? If we can collectively
  Greg>   | answer this question we can encode an LSID any way we
  Greg>   | please as long as we keep in mind that this information
  Greg>   | must persist as long as a journal or other well vetted
  Greg>   | scientific medium.

  Greg> You will not be able to "technically" insure two URIs are not
  Greg> pointing at the same object using LSID or HTTP URIs IMO. Also
  Greg> if LSIDs are going to be use to identify "concepts", what is
  Greg> to say that two authorities will have LSIDs for the concept
  Greg> p53 ? This is especially important considering their use in
  Greg> RDF to identify "resources".

  Greg> Encoding LSID as a HTTP URIs seems to be a way forward. Maybe
  Greg> some kind of mapping:

  Greg> URN:LSID:example.com:12345:1

  Greg> http://example.com.lsid.org/12345/1

But surely, this URL wouldn't be expected to resolve? 
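
For what it's worth, the mapping itself is only string rewriting. The
sketch below generalises from Greg's single example, so the rule (put
the authority in front of a gateway domain and turn the remaining
fields into path segments) is an assumption, and the function name
`lsid_to_http` and the `gateway` parameter are hypothetical.

```python
def lsid_to_http(lsid, gateway="lsid.org"):
    """Rewrite an LSID as an HTTP URL, following the one quoted example.

    Assumed rule: urn:lsid:<authority>:<a>:<b>...  becomes
                  http://<authority>.<gateway>/<a>/<b>...
    """
    # Raises ValueError if there are fewer than three colon-separated fields.
    scheme, nid, authority, *rest = lsid.split(":")
    if (scheme.lower(), nid.lower()) != ("urn", "lsid") or not rest:
        raise ValueError(f"not an LSID: {lsid!r}")
    return f"http://{authority}.{gateway}/" + "/".join(rest)


print(lsid_to_http("URN:LSID:example.com:12345:1"))
```

Producing the URL is the easy part. Whether it resolves depends
entirely on someone running a server behind that host name, which the
rewriting does nothing to guarantee.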

Phil
Received on Thursday, 22 April 2004 07:15:31 UTC