- From: Jack Park <jack.park@sri.com>
- Date: Mon, 31 Jul 2006 10:52:04 -0700
- To: "'public-semweb-lifesci'" <public-semweb-lifesci@w3.org>
Let me toss out a few ideas (possibly longish - sorry). These thoughts
might appear somewhat like the famous Larson cartoon where Joe is
speaking to his dog Bowser:
What Joe said: "Bowser, I'm going to toss a bone, then you go fetch the
bone"
What Bowser heard: "Bowser!!!!bone!!!!!bone"
Mostly, this is what I heard, followed by some interpretations, comments
and questions.
What I heard:
Important issues related to LSID
  Managing distributed objects (not just browsing)
  Decoupling identity from location (whatever "location" means)
  Versioning is important
    Not universally used
    Approaches that appear to be used:
      version numbers on same LSID string
      change LSID string
Understand aspects of ARK that could apply to or augment LSID
  coupling to HTTP
  Comments about ARK:
    allows location independent identification
    web-accessible standard
    alternative to LSID
Dereferencing (huge interest here)
  DDDS and other mechanisms (which I interpret to mean [1])
Data/Metadata
  room for confusion here
  one comment: LSID doesn't link to metadata
  one comment: ARK puts metadata in URI
Persistence
Interesting comment (Sean?)
  Datastore organized in name graphs
  version numbers used to determine which name graph to view
I'll stop there from the scribe part: that's tough for me trying to keep
accurate notes in a candy store; too many great ideas flying past and my
brain is always busy trying to disambiguate, embrace and extend
everything flying by.
I must ask this question regarding the nature of an LSID. Is it not an
identifier? I've heard people use it as a "name". If our esteemed
colleague, whom we identify with his email address, loses a finger, do
we change his identity? Do we change our names when we change any
attribute useful in identification of ourselves? Why would we ever
change an LSID just because some aspect of the identity of the
particular subject ever changed? Certainly we would change whatever
metadata is involved in the records of our identity, but we don't change
our identity, because the subject itself did not change. Version
numbers make a lot of sense. So does a "name graph" that relates an
identifier to the properties that constitute the object, with version
numbers that tie the identifier to the most-recent properties
(attributes, sorry). And, in the spirit of W3C documents, where the
catalog points to the most-recent version but still gives links to prior
versions, should that not be the standard way to deal with LSIDs, no
matter how they come to be constituted?
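
To make the versioning idea concrete, here is a minimal sketch in Python
of a "name graph" that keeps one stable identifier and a history of
property snapshots. The names NameGraph and split_lsid, the example.org
authority, and the use of integer revisions are illustrative assumptions,
not anything proposed in the discussion; the only thing taken from LSID
itself is the urn:lsid:authority:namespace:object[:revision] layout.

```python
from dataclasses import dataclass, field


@dataclass
class NameGraph:
    """One stable identifier plus a history of property snapshots."""
    identifier: str
    versions: list = field(default_factory=list)  # oldest snapshot first

    def add_version(self, properties):
        """Store a new snapshot and return its 1-based version number."""
        self.versions.append(dict(properties))
        return len(self.versions)

    def latest(self):
        """Return the most recent snapshot."""
        return self.versions[-1]

    def at(self, version):
        """Return the snapshot recorded under a given version number."""
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"no version {version} for {self.identifier}")
        return self.versions[version - 1]


def split_lsid(lsid):
    """Split an LSID into (identity, revision); revision is None if absent."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    revision = parts[5] if len(parts) == 6 else None
    return ":".join(parts[:5]), revision


# Hypothetical subject: the identifier stays put while the properties change.
graph = NameGraph("urn:lsid:example.org:people:jack")
graph.add_version({"fingers": 10})
graph.add_version({"fingers": 9})

identity, revision = split_lsid("urn:lsid:example.org:people:jack:1")
assert identity == graph.identifier
print(graph.at(int(revision)))  # the earlier snapshot
print(graph.latest())           # the most recent snapshot
```

The design choice here mirrors the W3C catalog habit: the bare identifier
leads to the latest snapshot, and a revision suffix selects a prior one.
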
I believe it was Carole, in one of her many informative opportunities to
talk, who said words to this effect (one of her two use cases): "we
use LSIDs in our database as we gather data from the web *if LSIDs are
available*". Those are not her precise words, and I apologize if they
are wrong to boot, and the emphasis is mine alone. I am very interested
in that aspect: other researchers may or may not be assigning LSIDs to
their data, but we must gain access to that data in any case. How do we
do that?
Another aspect of versioning that came up was that of authority. If we
step outside the context of this thread (outside the box, so to speak)
and look around, this question comes up often just about everywhere.
Software development comes to mind. The Apache foundation has this
notion of "committers". A committer is one who has been given the
authority to make changes in the version controlled source code for a
project. Apache foundation runs on a meritocracy, where people are
elected to committer status after showing appropriate skills, etc. Maybe
that doesn't apply to life sciences research, but what could be imported
from such ideas into LSID?
What I got:
I came away from the discussion with the larger picture that there is a
need for means by which all forms of identifiers can be generated and
used by all. As Carole said, they are used if they are available.
Whether they start with something that makes sense in an HTTP
environment may or may not be as important. As Dan suggested, if you can
"rot13" (rotate the characters that make the string) and the identifier
is still useful, then it shouldn't matter how the string is constructed;
it does matter that the string is at once available and findable.
After all, an LSID, an ARK, a PSI (to the topic mappers) is a shortcut,
a one string fits all (for those who know it) identifier of some object,
concept, subject, whatever you want to call it. LSID happens to be the
solution adopted by the life sciences community. However...
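
Dan's rot13 test can be sketched in a few lines of Python. This is only
an illustration under assumptions of mine: the registry dictionary, the
register and resolve helpers, and the example.org identifier are all
hypothetical. The point it shows is that a lookup treats the identifier
as an opaque string, so a scrambled string works as well as a readable
one, provided it is published and findable.

```python
import codecs


def rot13(text):
    """Rotate each ASCII letter 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


# Hypothetical registry: identifier string -> description of the object.
registry = {}


def register(identifier, description):
    registry[identifier] = description


def resolve(identifier):
    """Look up an identifier purely as an opaque key."""
    return registry[identifier]


plain = "urn:lsid:example.org:proteins:p12345"
scrambled = rot13(plain)

# Register the same object under the readable and the scrambled string.
register(plain, "a hypothetical protein record")
register(scrambled, "a hypothetical protein record")

# Both strings resolve equally well; nothing depends on how they are built.
assert resolve(plain) == resolve(scrambled)
# Applying rot13 twice gives back the original string.
assert rot13(scrambled) == plain
```
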
There is this subject that crept up in the previous couple of decades
known as "psychoneuroimmunology." That's what you get when the
psychologists start collaborating with neurologists who are also
collaborating with immunologists. I feel comfortable in predicting that
such collaborations will move in directions that include non-lifescience
workers, who are not familiar with LSIDs and who will need to use them.
That argues for separation of properties which identify objects from the
shortcuts we invent to identify those objects. The properties
(attributes, sorry) still prevail. If I lose a finger in an accident,
I'm still me.
If we happened to agree on collecting object identity properties and
associating them with identifier shortcuts in a way that renders them
searchable, say, using RDF triples on the web, then we would be a step
closer to allowing even Google to help us identify our objects. This, I
believe, is important to the larger picture of federating research
efforts among heterogeneous work groups everywhere. Doing so means that
we can then include identifiers on the web in numerous ways, making them
ubiquitous, no matter how they are constructed. Note: by saying that, I
am not advocating ad hoc fabrications; I strongly believe that projects
like LSID, ARK, even PSIs, are important and warrant the efforts of
standardization. If one were to imagine a public information commons,
one supported by several kinds of entities, including NLM, NSF, and even
aspects of the philanthropic universe, then one could imagine federating
all the working ontologies we use together with the identifiers
associated with the objects represented in those ontologies. I tend to
think that a federation of subject maps would suit global collaboration
and encourage greater and more standardized use of identifiers such as
LSID. In my view, a subject map, among other things, is a "name graph."
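
As a rough sketch of what associating identity properties with
identifier shortcuts might look like, here are a few subject-predicate-
object triples held as plain Python tuples. Everything in it is an
assumption for illustration: the ex: predicate names, the example.org
LSID, the ark:/12345/ strings, and the helper functions. A real
deployment would use an RDF store rather than a list.

```python
# Hypothetical triples: (identifier, property, value).
TRIPLES = [
    ("urn:lsid:example.org:proteins:p1", "ex:label", "hypothetical protein one"),
    ("urn:lsid:example.org:proteins:p1", "ex:organism", "Homo sapiens"),
    ("ark:/12345/p1", "ex:label", "hypothetical protein one"),
    ("ark:/12345/p2", "ex:label", "hypothetical protein two"),
]


def describe(subject):
    """Collect the properties recorded for one identifier."""
    return {p: o for s, p, o in TRIPLES if s == subject}


def find_identifiers(predicate, value):
    """Find every identifier, of any scheme, carrying a given property value."""
    return sorted({s for s, p, o in TRIPLES if p == predicate and o == value})


# Someone who has never seen an LSID can still search by property and
# discover which shortcuts, LSID or ARK, point at the same subject.
print(find_identifiers("ex:label", "hypothetical protein one"))
print(describe("urn:lsid:example.org:proteins:p1"))
```

Searching by property rather than by identifier syntax is what lets the
two schemes sit side by side in the same index.
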
That's a bit more than a half EURO for the day.
Cheers,
Jack
[1] http://www.ietf.org/rfc/rfc3401.txt
Received on Monday, 31 July 2006 17:52:18 UTC