Re: Use of LSID's "in the wild"

Hi Mark:
 
> I'm writing a manuscript at the moment where I discuss LSIDs, and I'm
> trying to get a sense of how many people are using them "in the wild".
> I know that biopathways has set up a lot of "proxy" LSID resolvers, but
> that's kinda cheating :-)  I'm wondering who is actually using the LSID
> standard in a production environment.  I know that BioMOBY and
> myGrid/Taverna both use LSIDs, but who else?

Here's a paper that describes the use of LSIDs by three early adopters
other than myGrid and BioMOBY. It's co-authored by one of the authors of the
LSID spec (Sean Martin):

The impact of Life Science Identifiers on Informatics data
http://www.broad.mit.edu/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=126

When you ask, "who is using the LSID standard?" you might want to
differentiate between organizations that have adopted LSID syntax and those
that are also hosting and maintaining LSID resolution services. The latter is
a much bigger cookie to swallow, but the former is still a step forward.

If the only thing that comes from the LSID spec is an identifier syntax
that becomes widely adopted by bioinformatics data providers, that would
still be a huge success. The same point is made in the conclusion of an IBM
article you may have seen:
http://www-128.ibm.com/developerworks/webservices/library/os-lsid2/

Another useful distinction to make when considering who is using LSIDs is
whether they are data providers or application providers (or both). Ideally,
you'd like to see the application providers using LSIDs that are created and
managed by the data providers, rather than LSIDs that are used only
internally within the application.

Here are some LSID users that would fall into the data provider category:

* The HapMap project. I don't know if they provide a resolution service.
Search for 'LSID' on this page:
http://www.hapmap.org/downloads/index.html.en

* Affymetrix uses an LSID-like syntax in its MAGE-ML formatted files
containing NetAffx annotations of the sequences used in array designs.
Here's a snippet from one such file:

<BioSequence_package>
  <BioSequence_assnlist>
    <BioSequence identifier="Affymetrix.com:Transcript:HG-U133A.1007_s_at"
         ....

It's not a true LSID since it doesn't begin with 'urn:lsid'. Affy doesn't
host an LSID resolver or provide any sort of lookup service using these ids.
I summarized some of the issues we ran into here (see the section titled
'LSIDs and Content Negotiation' in particular):

http://lists.w3.org/Archives/Public/public-swls-ws/2004Nov/att-0000/Affy_SemWeb-LifeSci_position_paper.pdf

* Pseudogene.org. Don't know if they offer a resolution service:
http://www.pseudogene.org/cgi-bin/set-results.cgi?tax_id=9606&all=View+All+Sets&criterion0=&operator0=&searchValue0=&sort=0&output=html
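
To make the "LSID syntax vs. LSID-like syntax" distinction concrete, here is
a rough Python sketch that checks whether a string has the general LSID shape
(urn:lsid:authority:namespace:object, with an optional revision) and shows
what the Affymetrix identifier quoted above would look like with the prefix
added. The names parse_lsid and to_lsid are made up for illustration, the
regex is a simplification of the spec's grammar, and the resulting LSID is
hypothetical: Affymetrix does not issue it.

```python
import re

# Simplified shape of an LSID: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
# (an approximation for illustration, not the full grammar from the spec).
LSID_PATTERN = re.compile(
    r"^urn:lsid:"
    r"(?P<authority>[^:\s]+):"
    r"(?P<namespace>[^:\s]+):"
    r"(?P<object>[^:\s]+)"
    r"(?::(?P<revision>[^:\s]+))?$",
    re.IGNORECASE,  # lets the literal 'urn:lsid' prefix match in any case
)


def parse_lsid(identifier):
    """Return the LSID components as a dict, or None if the string is not LSID-shaped."""
    match = LSID_PATTERN.match(identifier)
    return match.groupdict() if match else None


def to_lsid(lsid_like):
    """Hypothetical helper: turn 'authority:namespace:object' into an LSID-shaped string."""
    parts = lsid_like.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority:namespace:object, got {lsid_like!r}")
    return "urn:lsid:" + ":".join(parts)


affy_id = "Affymetrix.com:Transcript:HG-U133A.1007_s_at"

print(parse_lsid(affy_id))           # no 'urn:lsid' prefix, so not LSID-shaped
print(to_lsid(affy_id))              # the same id with the prefix added
print(parse_lsid(to_lsid(affy_id)))  # now splits into authority/namespace/object
```

Passing the syntax check says nothing about whether anyone resolves the
identifier; that still requires an authority that hosts a resolution service.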

The following would fall in the application provider category of LSID users.
While these may not fully qualify as "in the wild", one could ask: How
widely are these apps being used by other parties, either in an R&D or
production setting?

* BioPathways Consortium (as you mention above).
http://lsid.biopathways.org/authorities.shtml

* Intellidimension's RDF Gateway and Eric Jain's UniProt RDF project:
http://labs.intellidimension.com/uniprot/query.rsp?q=10
http://expasy3.isb-sib.ch/~ejain//rdf/migration.html

* KIM. See Sean Martin's presentation from the W3C meeting. It describes a
system that makes extensive use of LSIDs:
http://www.w3.org/2004/10/swls/w3c_slrp_presentation_Sean_Martin_IBM.pdf

* GBIF. Looks like they are still at an early stage of development.
http://wiki.gbif.org/dadiwiki/wikka.php?wakka=col2005lsid

There may be others out there. This is not necessarily an exhaustive
listing.

Cheers,
Steve

Received on Saturday, 3 June 2006 01:34:16 UTC