Re: [BioRDF] Comments from Christoph Grabmuller on BioRDF microarray provenance from Christoph Grabmuller on 2010-11-09 (public-semweb-lifesci@w3.org from November 2010)

From: Christoph Grabmuller <grabmuel@ebi.ac.uk>
Date: Tue, 9 Nov 2010 10:09:18 +0000
To: mdmiller <mdmiller53@comcast.net>
Cc: "M. Scott Marshall" <mscottmarshall@gmail.com>, HCLS <public-semweb-lifesci@w3.org>
Message-ID: <AANLkTimiOLavHfBDpeAvy=D3HgmdrgmL5PNDhRo7Ek1Z@mail.gmail.com>

On Mon, Nov 8, 2010 at 4:02 PM, mdmiller <mdmiller53@comcast.net> wrote:
> 2) Many 'things' are represented as strings (e.g. genes), which makes
> it often impossible to run a federated query against another endpoint.
> Gene names might be somewhat consistent for HUGO, but what about other
> species? Also, just the simple variance between 'STEAP2' and 'Steap2'
> makes a (direct) federated query impossible.
>
> * actually, HGNC Gene Symbols and entrez accessions are very stable.  for
> ArrayExpress, the ADF file will usually map to one or both of these
> identifiers.  in practice, i've not seen this to be a problem but for the
> paper we didn't go far enough.
> --mm

Yes, the HGNC Gene Symbols are stable, but what about other species?
So entrez accessions are the 'standard' input format for genes?

And even with HGNC it's not always that easy. Let's say I want to ask
bio2rdf what the uniprot accession is for the symbol 'CFTR':
http://bio2rdf.org/uniprot:P13569 only contains 'CFTR_HUMAN' and
matching that with 'FILTER regex()' is highly impractical across so
much data.
-cg
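
To make the two matching problems concrete, here is a rough Python sketch (an illustration only, not anything from the paper or from bio2rdf). It assumes a gene symbol consists of letters, digits and hyphens, and that the part of an entry name before the underscore is the symbol, which holds for 'CFTR_HUMAN' but is not guaranteed in general. All function names are made up for this example, and the query shape assumes nothing about a particular endpoint's schema.

```python
import re

SYMBOL = re.compile(r"[A-Za-z0-9-]+")  # assumed shape of a gene symbol


def symbols_match(a: str, b: str) -> bool:
    """Compare two gene symbols ignoring case and surrounding whitespace."""
    return a.strip().casefold() == b.strip().casefold()


def entry_pattern(symbol: str) -> str:
    """Regex matching the symbol alone or as the prefix of 'SYMBOL_SPECIES'."""
    if SYMBOL.fullmatch(symbol) is None:
        raise ValueError(f"unexpected characters in symbol: {symbol!r}")
    return f"^{symbol}(_|$)"


def entry_name_matches(symbol: str, entry_name: str) -> bool:
    """Client-side version of the match, e.g. 'CFTR' against 'CFTR_HUMAN'."""
    return re.search(entry_pattern(symbol), entry_name, re.IGNORECASE) is not None


def regex_filter_query(symbol: str) -> str:
    """Build a SPARQL query that tests every object in the store with regex()."""
    return (
        "SELECT ?s WHERE {\n"
        "  ?s ?p ?label .\n"
        f'  FILTER regex(str(?label), "{entry_pattern(symbol)}", "i")\n'
        "}"
    )
```

The generated query has to evaluate the regex against every object value, since nothing in it narrows the search, which is why it scales so badly. A normalised identifier on both sides would allow a direct join instead.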

> 3) I like the Excel to RDF converter, but it relies on the user
> entering correct namespaces, names and database ids from various
> places in a syntactically correct way. This requires knowledge of the
> correct databases to choose and the 'correct' uri (many variants to
> choose from).
> If people just enter strings we are not all that far away from MAGE-TAB.
>
> * i'm involved in an open source project, Annotare, that seeks to put a nice
> UI on top of creating MAGE-TAB documents for a bench scientist.  part of
> that is use of the NCBO tools to make it easy for the creator of the
> document to go fetch the appropriate term from the appropriate
> ontology/vocabulary.  version one has support for EFO built-in, one of the
> main goals for version 2 is to make this much easier and much broader.
> --mm

That looks like a very useful tool. Out of curiosity: how are the
ontologies/vocabularies loaded?
-cg

Received on Tuesday, 9 November 2010 11:53:54 UTC