Re: FW: [BioPAX-discuss] LSID Best practices...

JZ>First, should LSIDs be used as the unique identifiers for data in such
JZ>a way that if the data are moved to another database or application,
JZ>people should not change the LSID that is associated with the data no
JZ>matter what?

One of the main uses of LSID is to allow people/software to link 
information across multiple "vertical" databases/applications and for 
third party applications to be able to unambiguously name/link into 
information that has items named with LSIDs. As far as I know not much 
discussion has centered around data moving to another database or store. 
The LSID protocol does support transparent caching (so that copies of data 
can be stored and accessed locally) but I don't think this is the same 
thing that you meant. 

Generally, if two records in two different databases are identical, it
would be preferable, where practical, for them to share the same LSID, as
this would promote connections between data items, which is one of the
benefits of the entire scheme. I wonder which of the two databases would
supply the data if the LSID is dereferenced? Probably the oldest, where
the data originated. Certainly one may give the same data more than one
LSID name (and if the records are in two different databases, there is
greater reason to), so one should not feel bound by my earlier stated
preference. A more likely scenario is one where metadata records the
equivalentTo, derivedFrom or perhaps isaformof relationship between two
different LSIDs, one in each database, or where the record in the later
database simply refers to the record in the first by its LSID.
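
As a rough illustration of that last scenario, here is a minimal Python
sketch. The authorities, namespaces and object ids are made up, and
storing the relationships as plain (subject, predicate, object) tuples is
my own assumption rather than anything the LSID protocol prescribes. Only
the urn:lsid:authority:namespace:object[:revision] layout comes from the
LSID syntax itself.

```python
from typing import List, NamedTuple, Optional, Tuple


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical LSIDs for the "same" record held in two databases.
FIRST = "urn:lsid:first-db.example.org:records:12345"
LATER = "urn:lsid:later-db.example.org:entries:A99"

# Metadata in the later database points back at the original record.
TRIPLES: List[Tuple[str, str, str]] = [
    (LATER, "derivedFrom", FIRST),
]


def related(lsid: str, predicate: str,
            triples: List[Tuple[str, str, str]]) -> List[str]:
    """Return every LSID that `lsid` points at via `predicate`."""
    return [o for s, p, o in triples if s == lsid and p == predicate]
```

A client that holds the later LSID can call related(LATER, "derivedFrom",
TRIPLES) to find the original record without either database having to
reuse the other's identifier.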

JZ>Second, LSID is designed to separate data and metadata in order to ....
JZ>So as the example in the best practice, if the person's name changed,
JZ>can we say only the metadata about the person is changed, but not the
JZ>data, which is person here?

This is a tough one... now you are getting into philosophy :-) Actually I
don't like using the example of a person in the Best Practices doc as I
think it just introduces unnecessary ambiguity. Perhaps the thing to
remember is that the whole scheme is meant as a convenience as well as an
aid to promoting interoperability and data linkages, and where it is
possible to help that along, help it along... but first and foremost what
you do in terms of providing data vs. metadata must meet the needs of the
users of the system.

As is suggested, in the researcher database the LSID might be constructed 
from some unchanging but unique attribute of the person - a serial number 
or a customerid etc. Other attributes in metadata might be the name, 
address, date of birth. These are most likely non-unique. 

I have no idea what "data" one should write for a person LSID but perhaps
Ray Kurzweil does, and if so his person LSIDs might link directly to that
bytestream. My tendency would be to have no data directly associated with
the researcher LSID and instead use that LSID as the anchor where I can
hook as metadata some literal information and pointers to LSIDs for all
sorts of tangibles like perhaps the person's photo or their sequence or
resume.
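
To make the "anchor" idea a bit more concrete, here is a small Python
sketch. Everything in it is hypothetical: the LSIDs, the property names
and the in-memory dictionaries simply stand in for whatever a real
researcher database would serve. The point is only that the person LSID
names no bytes of its own, while its metadata mixes literals with
pointers to other LSIDs that do carry data.

```python
from typing import Dict, List, Tuple

# Hypothetical LSID built from an unchanging serial number, not the name.
RESEARCHER = "urn:lsid:researchers.example.org:person:000123"
PHOTO = "urn:lsid:researchers.example.org:photo:000123"
RESUME = "urn:lsid:researchers.example.org:resume:000123"

# Bytes served for each LSID; the person anchor deliberately has none.
DATA: Dict[str, bytes] = {
    PHOTO: b"<binary image bytes would go here>",
    RESUME: b"<resume document bytes would go here>",
}

# Metadata entries are (property, kind, value); kind is "literal" for
# plain values and "lsid" for pointers to other LSIDs.
METADATA: Dict[str, List[Tuple[str, str, str]]] = {
    RESEARCHER: [
        ("name", "literal", "Jane Doe"),
        ("photo", "lsid", PHOTO),
        ("resume", "lsid", RESUME),
    ],
}


def get_data(lsid: str) -> bytes:
    """Return the bytes named by an LSID, or nothing if it is only an anchor."""
    return DATA.get(lsid, b"")


def linked_lsids(lsid: str) -> List[str]:
    """Return the LSIDs that this LSID's metadata points at."""
    return [value for _, kind, value in METADATA.get(lsid, []) if kind == "lsid"]


def rename(lsid: str, new_name: str) -> None:
    """A name change edits one metadata literal; the LSID itself is untouched."""
    entries = METADATA[lsid]
    METADATA[lsid] = [
        ("name", "literal", new_name) if prop == "name" else (prop, kind, value)
        for prop, kind, value in entries
    ]
```

With this layout, rename(RESEARCHER, "Jane Smith") changes only the
metadata, and anything that linked to the researcher LSID keeps working.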


JZ>Or if the person moved to another institution, what will happen to
JZ>his/her person LSID?

Database records don't float around following after people when they move
and I don't believe there is an intention that LSID named database records
will either :-) So what happens in the case above is entirely up to the
provider of the LSID. Perhaps they would just stop serving it - there is
no guarantee of persistence with an LSID, just uniqueness (which makes
persistence easier if it is deemed useful for that data). If perhaps the
new institution creates LSIDs for their people, it might have metadata
containing the samePersonAs relationship and point to the original LSID.
The old institution might likewise continue to serve their LSID & include
a reference in the metadata that indicates the new LSID for the moved
person. Getting everyone to create this kind of linkage is not too high on
my own priority list though as I think there are much more valuable and
lower hanging fruit ;-)
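
For what it is worth, if both institutions did publish such links, a
client could gather every LSID known for one person with something like
the Python sketch below. The LSIDs are invented, and "movedTo" is a
placeholder name of mine for the old institution's forward reference;
only samePersonAs is the relationship mentioned above. Links are followed
in both directions, on the assumption that either end may be the one a
client starts from.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical LSIDs issued by the old and the new institution.
OLD = "urn:lsid:old-institute.example.org:person:000123"
NEW = "urn:lsid:new-institute.example.org:staff:J-77"

# Predicates treated as "these two LSIDs name the same person".
# "movedTo" is an assumed name for the old institution's forward pointer.
LINK_PREDICATES = {"samePersonAs", "movedTo"}

TRIPLES: List[Tuple[str, str, str]] = [
    (NEW, "samePersonAs", OLD),  # published by the new institution
    (OLD, "movedTo", NEW),       # published by the old institution
]


def same_person_set(start: str,
                    triples: List[Tuple[str, str, str]]) -> Set[str]:
    """Collect every LSID reachable from `start` through person links."""
    neighbours: Dict[str, Set[str]] = {}
    for subject, predicate, obj in triples:
        if predicate in LINK_PREDICATES:
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)

    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk; `seen` guards against link cycles
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

If the old institution stops serving its LSID, the walk still works from
the new one as long as the samePersonAs metadata is available there.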

Kindest regards, Sean

--
Sean Martin
IBM Corp

Received on Thursday, 7 April 2005 23:25:44 UTC