- From: mdmiller <mdmiller53@comcast.net>
- Date: Tue, 9 Nov 2010 08:45:01 -0800
- To: "Christoph Grabmuller" <grabmuel@ebi.ac.uk>
- Cc: "HCLS" <public-semweb-lifesci@w3.org>, "Ravi Shankar" <rshankar@stanford.edu>
hi christoph,

"That looks like a very useful tool. Out of curiosity: how are the ontologies/vocabularies loaded?"

i did not actually work on that piece so i don't know the details. i do know that it is still a work in progress. i've cc'd ravi, who is the one who wrote that code (the UI for the application is written in adobe's flex/air).

"Yes, the HGNC Gene Symbols are stable, but what about other species?"

good point. that's where i've seen entrez accessions usually used as the most definitive source.

"So entrez accessions are the 'standard' input format for genes?"

this i've seen as more a de facto best practice than anything for identifying a sequence (though i'm not sure what you mean by a standard input format). it seems to work well for identifying a sequence/gene for the purpose of communicating things such as gene lists, and the database is well curated.

as we have all noticed, 'what is a gene' is not something that has a definitive answer. i'm currently working on a personal project where the pragmatic answer to 'what is a gene' is simply the 'cross-referencing set of sequence identifiers'. by this i mean that when i download the information from HGNC, i get cross references to a number of other sequence databases such as entrez. when i parse an ADF file i get additional cross references from the local identifiers to entrez, genbank, etc., and so on. the danger of this, of course, is a bad cross reference that associates, say, two different gene symbols. it is also important to distinguish which cross references point to a broad category (GO terms, for instance) and shouldn't be used. part of the project is to explore this idea and its ramifications. it leaves the actual identification of genes and proteins and their relationships to others more knowledgeable.

cheers,
michael

----- Original Message -----
From: "Christoph Grabmuller" <grabmuel@ebi.ac.uk>
To: "mdmiller" <mdmiller53@comcast.net>
Cc: "M. Scott Marshall" <mscottmarshall@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Tuesday, November 09, 2010 2:09 AM
Subject: Re: [BioRDF] Comments from Christoph Grabmuller on BioRDF microarray provenance

On Mon, Nov 8, 2010 at 4:02 PM, mdmiller <mdmiller53@comcast.net> wrote:
> 2) Many 'things' are represented as strings (e.g. genes), which often
> makes it impossible to run a federated query against another endpoint.
> Gene names might be somewhat consistent for HUGO, but what about other
> species? Also, just the simple variance between 'STEAP2' and 'Steap2'
> makes a (direct) federated query impossible.
>
> * actually, HGNC Gene Symbols and entrez accessions are very stable. for
> ArrayExpress, the ADF file will usually map to one or both of these
> identifiers. in practice, i've not seen this to be a problem, but for the
> paper we didn't go far enough.
> --mm

Yes, the HGNC Gene Symbols are stable, but what about other species? So entrez accessions are the 'standard' input format for genes? And even with HGNC it's not always that easy. Let's say I want to ask bio2rdf what the uniprot accession is for the symbol 'CFTR': http://bio2rdf.org/uniprot:P13569 only contains 'CFTR_HUMAN', and matching that with 'FILTER regex()' is highly impractical across so much data.

-cg

> 3) I like the Excel to RDF converter, but it relies on the user
> entering correct namespaces, names and database ids from various
> places in a syntactically correct way. This requires knowledge of the
> correct databases to choose and the 'correct' uri (many variants to
> choose from).
> If people just enter strings we are not all that far away from MAGE-TAB.
>
> * i'm involved in an open source project, Annotare, that seeks to put a
> nice UI on top of creating MAGE-TAB documents for a bench scientist. part
> of that is use of the NCBO tools to make it easy for the creator of the
> document to go fetch the appropriate term from the appropriate
> ontology/vocabulary. version one has support for EFO built in; one of the
> main goals for version 2 is to make this much easier and much broader.
> --mm

That looks like a very useful tool. Out of curiosity: how are the ontologies/vocabularies loaded?

-cg
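michael's 'cross-referencing set of sequence identifiers' idea, and the danger he notes of a bad cross reference merging two gene symbols, can be sketched with a union-find over identifier pairs. This is a minimal illustration, not his project's code, and the identifiers and cross references below are made-up examples:

```python
from collections import defaultdict

class UnionFind:
    """Merge identifiers into connected components via cross references."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

# hypothetical cross references, as might be parsed from an HGNC
# download and an ADF file (illustrative values, not real curation)
xrefs = [
    ("hgnc:CFTR", "entrez:1080"),
    ("entrez:1080", "uniprot:P13569"),
    ("hgnc:STEAP2", "entrez:261729"),
]

uf = UnionFind()
for a, b in xrefs:
    uf.union(a, b)

# 'what is a gene', pragmatically: the set of identifiers sharing a root
genes = defaultdict(set)
for ident in list(uf.parent):
    genes[uf.find(ident)].add(ident)
# here: one 3-member set for CFTR, one 2-member set for STEAP2

# the danger: a single bad cross reference silently merges the two
# distinct gene symbols into one set
uf.union("entrez:1080", "hgnc:STEAP2")
```

Cross references to broad categories (e.g. GO terms shared by many genes) would have the same merging effect, which is why michael flags them as ones to exclude before taking the union.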
Received on Tuesday, 9 November 2010 16:45:44 UTC