Re: Fw: Use of LSIDs in RDF (fwd) from SLetovsky@aol.com on 2004-05-04 (public-semweb-lifesci@w3.org from May 2004)

From: <SLetovsky@aol.com>
Date: Mon, 3 May 2004 20:05:38 EDT
To: sjmm@us.ibm.com, greg@tyrelle.net
Cc: public-semweb-lifesci@w3.org
Message-ID: <31.4752c8cd.2dc83852@aol.com>

All,

    Dumb question for this LSID thread, since there seem to be people on it 
who understand the goals of LSID well. In my experience a critical problem in 
bioinformatics is a diversity of identifiers for the same thing (typically a 
gene or gene product sequence). These identifiers typically come from 
different namespaces or from biological nomenclatures. A frequent and 
time-consuming problem is unifying datasets from different sources which 
refer to these objects using different symbols, necessitating a 
synonym-aware relational join. This sounds simpler than it is in practice; 
the synonym relations display all manner of cardinality other than the 
hoped-for one-to-one. To further complicate matters, all namespaces evolve, 
adding, retiring, splitting and merging previously allocated identifiers, a 
process that reflects ongoing refinement of the underlying biology. There is 
no systematic versioning of namespaces or datasets.
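
To make the join problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two namespaces "A" and "B", every identifier, the synonym table, and the function name synonym_join are made up and do not come from any real database. It joins two datasets through a synonym table and reports the cases where the mapping is missing or not one-to-one.

```python
from collections import defaultdict

# Hypothetical synonym table: (id in namespace A, id in namespace B).
# All identifiers here are invented for illustration only.
SYNONYMS = [
    ("A:1", "B:10"),  # one-to-one
    ("A:2", "B:20"),  # one-to-many: A:2 was split in namespace B
    ("A:2", "B:21"),
    ("A:3", "B:30"),  # many-to-one: A:3 and A:4 were merged in namespace B
    ("A:4", "B:30"),
]


def synonym_join(left, right, synonyms):
    """Join `left` (keyed by A ids) with `right` (keyed by B ids).

    Returns (joined_rows, unmatched_left_ids, ambiguous_left_ids).
    """
    a_to_b = defaultdict(set)
    for a_id, b_id in synonyms:
        a_to_b[a_id].add(b_id)

    joined, unmatched, ambiguous = [], [], []
    for a_id, left_value in left.items():
        # Keep only synonyms that actually occur in the right-hand dataset.
        targets = sorted(b for b in a_to_b.get(a_id, ()) if b in right)
        if not targets:
            unmatched.append(a_id)
            continue
        if len(targets) > 1:
            ambiguous.append(a_id)
        for b_id in targets:
            joined.append((a_id, b_id, left_value, right[b_id]))
    return joined, unmatched, ambiguous


left = {"A:1": 0.5, "A:2": 1.5, "A:3": 2.0, "A:4": 2.5, "A:5": 9.9}
right = {"B:10": "x", "B:20": "y", "B:21": "z", "B:30": "w"}
joined, unmatched, ambiguous = synonym_join(left, right, SYNONYMS)
```

Even in this toy case the five input rows on the left do not map cleanly: one identifier has no synonym, one fans out to two targets, and two collapse onto the same target. The sketch also ignores versioning entirely, which is the other half of the problem described above.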

Clearly, stable gene identifiers maintained by authoritative sources and used 
by all producers of data would be a big help, but despite the best efforts of 
the MODs (model organism databases), groups such as NCBI, EBI, UCSC, etc., 
the problem of resolving identifier references still routinely crops up in 
day-to-day bioinformatics work, and generates a lot of frustration and wasted 
time.

My question is, do LSIDs address this issue? Once the problem has been 
translated from bioinformatics-speak to w3c-speak I can no longer tell. My 
impression is that LSIDs are concerned more with hostname independence than 
with semantic equivalence.

Cheers, -Stan

Received on Monday, 3 May 2004 20:30:13 UTC