- From: <SLetovsky@aol.com>
- Date: Mon, 3 May 2004 20:05:38 EDT
- To: sjmm@us.ibm.com, greg@tyrelle.net
- Cc: public-semweb-lifesci@w3.org
- Message-ID: <31.4752c8cd.2dc83852@aol.com>
All,

Dumb question for this LSID thread, since there seem to be people on it who understand the goals of LSID well.

In my experience a critical problem in bioinformatics is the diversity of identifiers for the same thing (typically a gene or gene product sequence). These identifiers typically come from different namespaces or from biological nomenclatures. A frequent and time-consuming problem is unifying datasets from different sources which refer to these objects using different symbols, necessitating a synonym-aware relational join. This sounds simpler than it is in practice; the synonym relations display all manner of cardinality other than the hoped-for one-to-one. To further complicate matters, all namespaces evolve, adding, retiring, splitting and merging previously allocated identifiers, a process that reflects ongoing refinement of the underlying biology. There is no systematic versioning of namespaces or datasets.

Clearly, stable gene identifiers maintained by authoritative sources and used by all producers of data would be a big help, but despite the best efforts of the MODs (model organism databases) and groups such as NCBI, EBI, UCSC, etc., the problem of resolving identifier references still routinely crops up in day-to-day bioinformatics work, and generates a lot of frustration and wasted time.

My question is, do LSIDs address this issue? Once the problem has been translated from bioinformatics-speak to W3C-speak I can no longer tell. My impression is that LSIDs are concerned more with hostname independence than with semantic equivalence.

Cheers,
-Stan
Received on Monday, 3 May 2004 20:30:13 UTC