Followup thoughts [Re: BioRDF [Telcon]]

Let me toss out a few ideas (possibly longish - sorry). These thoughts 
might appear somewhat like the famous Larson cartoon where Joe is 
speaking to his dog Bowser:
What Joe said: "Bowser, I'm going to toss a bone, then you go fetch the 
bone"
What Bowser heard: "Bowser!!!!bone!!!!!bone"

Mostly, this is what I heard, followed by some interpretations, comments 
and questions.

What I heard:
Important issues related to LSID
    Managing distributed objects (not just browsing)
    Decoupling identity from location (whatever "location" means)
    Versioning is important
       Not universally used
       Approaches that appear to be used:
          version numbers on same LSID string
          change LSID string
    Understand aspects of ARK that could apply to or augment LSID
       coupling to HTTP
       Comments about ARK:
          allows location independent identification
          web-accessible standard
          alternative to LSID
    Dereferencing (huge interest here)
       DDDS and other mechanisms (which I interpret to mean [1])
    Data/Metadata
       room for confusion here
       one comment: LSID doesn't link to metadata
       one comment: ARK puts metadata in URI
    Persistence
       Interesting comment (Sean?)
          Datastore organized in name graphs
          version numbers used to determine which name graph to view

I'll stop there from the scribe part: that's tough for me trying to keep 
accurate notes in a candy store; too many great ideas flying past and my 
brain is always busy trying to disambiguate, embrace and extend 
everything flying by.

I must ask this question regarding the nature of an LSID. Is it not an 
identifier? I've heard people use it as a "name". If our esteemed 
colleague, whom we identify with his email address, loses a finger, do 
we change his identity? Do we change our names when we change any 
attribute useful in identification of ourselves? Why would we ever 
change an LSID just because some aspect of the identity of the 
particular subject changed? Certainly we would change whatever 
metadata is involved in the records of our identity, but we don't change 
our identity, simply because the subject itself did not change. Version 
numbers make a lot of sense. So does a "name graph" that relates an 
identifier to the properties that constitute the object, with version 
numbers that tie identifiers to the most-recent properties (attributes, 
sorry). And, in the spirit of W3C documents, where the 
catalog points to the most-recent version but still gives links to prior 
versions, should that not be the standard way to deal with LSIDs, no 
matter how they come to be constituted?
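
A minimal sketch of that idea in Python, under loudly stated assumptions: the class name VersionedRecord, the example.org LSID and the properties are all invented for illustration, and appending the version number as a trailing component is just one of the two approaches listed in the notes above, not a statement of how any LSID authority actually behaves. The point is only that the identifier stays fixed while the property sets accumulate, with the latest version on top and the prior ones still reachable.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRecord:
    """One stable identifier with an append-only history of property sets."""
    identifier: str                               # hypothetical; never changes
    versions: list = field(default_factory=list)  # versions[0] is version 1

    def revise(self, **properties):
        """Record a new version of the metadata and return its number."""
        self.versions.append(dict(properties))
        return len(self.versions)

    def _check(self, version):
        if not 1 <= version <= len(self.versions):
            raise KeyError(version)

    def latest(self):
        """Most recent properties, as a catalog entry would show them."""
        self._check(len(self.versions))
        return self.versions[-1]

    def at(self, version):
        """Properties as they stood at an earlier version."""
        self._check(version)
        return self.versions[version - 1]

    def versioned_name(self, version):
        """Same identifier string with the version number appended (assumed form)."""
        self._check(version)
        return f"{self.identifier}:{version}"


record = VersionedRecord("urn:lsid:example.org:people:joe")
record.revise(name="Joe", fingers=10)
record.revise(name="Joe", fingers=9)   # the subject changed; the identifier did not
```

Here record.latest() gives the nine-fingered description, record.at(1) still gives the original one, and record.identifier is the same string throughout.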

I believe it was Carole, in one of her many informative opportunities to 
talk, who said words to this effect (one of her two use cases): "we 
use LSIDs in our database as we gather data from the web *if LSIDs are 
available*". Those are not her precise words, and I apologize if they 
are wrong to boot, and the emphasis is mine alone. I am very interested 
in that aspect: other researchers may or may not be assigning LSIDs to 
their data, but we must gain access to that data in any case. How do we 
do that?

Another aspect of versioning that came up was that of authority. If we 
step outside the context of this thread (outside the box, so to speak) 
and look around, this question comes up often just about everywhere. 
Software development comes to mind. The Apache foundation has this 
notion of "committers". A committer is one who has been given the 
authority to make changes in the version-controlled source code for a 
project. The Apache foundation runs as a meritocracy, where people are 
elected to committer status after showing appropriate skills, etc. Maybe 
that doesn't apply to life sciences research, but what could be imported 
from such ideas into LSID?

What I got:

I came away from the discussion with the larger picture that there is a 
need for means by which all forms of identifiers can be generated and 
used by all. As Carole said, they are used if they are available. 
Whether they start with something that makes sense in an HTTP 
environment may or may not be as important. As Dan suggested, if you can 
"rot13" (rotate the characters that make the string) and the identifier 
is still useful, then it shouldn't matter how the string is constructed; 
it does matter that the string is, at once, available, and findable. 
After all, an LSID, an ARK, a PSI (to the topic mappers) is a shortcut, 
a one string fits all (for those who know it) identifier of some object, 
concept, subject, whatever you want to call it. LSID happens to be the 
solution adopted by the life sciences community. However...

There is this subject that crept up in the previous couple of decades 
known as "psychoneuroimmunology." That's what you get when the 
psychologists start collaborating with neurologists who are also 
collaborating with immunologists. I feel comfortable in predicting that 
such collaborations will move in directions that include non-lifescience 
workers, who are not familiar with LSIDs and who will need to use them. 
That argues for separation of properties which identify objects from the 
shortcuts we invent to identify those objects. The properties 
(attributes, sorry) still prevail. If I lose a finger in an accident, 
I'm still me.

If we happened to agree on collecting object identity properties and 
associating those with identifier shortcuts in a way that renders them 
searchable, say, using RDF triples on the web, then we would be a step 
closer to allowing even Google to help us identify our objects. This, I 
believe, is important to the larger picture of federating research 
efforts among heterogeneous work groups everywhere. Doing so means that 
we can then include identifiers on the web in numerous ways, making them 
ubiquitous, no matter how they are constructed. Note: by saying that, I 
am not advocating ad hoc fabrications; I strongly believe that projects 
like LSID, ARK, even PSIs, are important and warrant the efforts of 
standardization.  If one were to imagine a public information commons, 
one supported by several kinds of entities, including NLM, NSF, and even 
aspects of the philanthropic universe, then one could imagine federating 
all the working ontologies we use together with the identifiers 
associated with the objects represented in those ontologies. I tend to 
think that a federation of subject maps would suit global collaboration 
and encourage greater and more standardized use of identifiers such as 
LSID. In my view, a subject map, among other things, is a "name graph."
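
As a rough sketch of what "identity properties tied to identifier shortcuts, and searchable" could look like, here is a toy in Python. It uses plain tuples in place of a real RDF store, and every identifier, property name and value below is made up for illustration (the LSID and ARK strings are placeholders, not real records). It shows two differently constructed identifiers carrying the same properties and being found by those properties rather than by how the string is built.

```python
from collections import defaultdict

# Each triple is (subject identifier, property, value). All of it is invented.
TRIPLES = [
    ("urn:lsid:example.org:proteins:p1", "label", "hypothetical protein P1"),
    ("urn:lsid:example.org:proteins:p1", "organism", "Homo sapiens"),
    ("ark:/99999/fk4p1", "label", "hypothetical protein P1"),
    ("ark:/99999/fk4p1", "organism", "Homo sapiens"),
    ("urn:lsid:example.org:proteins:p2", "label", "hypothetical protein P2"),
    ("urn:lsid:example.org:proteins:p2", "organism", "Mus musculus"),
]


def build_index(triples):
    """Map each (property, value) pair to the identifiers that carry it."""
    index = defaultdict(set)
    for subject, prop, value in triples:
        index[(prop, value)].add(subject)
    return index


def find(index, **properties):
    """Return the identifiers whose properties match every given pair."""
    matches = [index.get(pair, set()) for pair in properties.items()]
    return set.intersection(*matches) if matches else set()


index = build_index(TRIPLES)
human = find(index, organism="Homo sapiens")
```

The search for organism="Homo sapiens" returns both the LSID and the ARK for P1, which is the sense in which the properties prevail and the shortcut strings are interchangeable handles on them.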

That's a bit more than a half EURO for the day.
Cheers,
Jack
[1] http://www.ietf.org/rfc/rfc3401.txt

Received on Monday, 31 July 2006 17:52:18 UTC