- From: chris mungall <cjm@fruitfly.org>
- Date: Tue, 31 Jan 2006 13:05:21 -0800
- To: Brian Osborne <osborne1@optonline.net>
- Cc: public-semweb-lifesci@w3.org, jluciano@predmed.com
Hi Brian

Any search for genes localised to, say, "mitochondrion" should indeed return genes that are annotated to either is_a children or part_of children of "mitochondrion" (the latter because localisation is transitive_over part_of). Standard query utilities such as the Entrez Gene search don't take this into account, so this is an excellent use case for the HCLSIG. I imagine Entrez Gene records will soon start incorporating annotations to ontologies other than GO, with a wider range of relations than the two currently used in GO, and with different definitions in the OBO relations ontology.

Here is what a user has to do right now in order to get homologs of genes localised to a specific cellular component:

For the biologist: query an ontology-aware web interface such as AmiGO, get the list of genes localised to the component[1], and copy-n-paste that list of genes into an orthology-aware web interface such as inparanoid or Entrez Gene. Not exactly ideal. Soon AmiGO will also include the inparanoid orthology calls, so this particular query can be answered via a single one-stop web portal; there may already be such a one-stop web portal that can answer this query right now. But it doesn't really help with the generalised case of ontology-aware queries of disparate data sources. BioMoby may be able to do something like this. Any BioMoby folks on the list?

For the developer: download a data warehouse such as the godb (which underpins AmiGO). Precomputed transitive closure tables give you some of the benefits of ontology-aware queries whilst remaining in the relational paradigm. Not as flexible as a query language like SPARQL, but much faster. Some data integration is required on the part of the developer (though for this particular use case the required orthology calls will soon be in the godb). Not ideal.

Brian advocates a scripting approach, using BioPerl, go-perl or your favourite ontology-aware API plus parsers for a bunch of ancillary data files. Definitely not ideal. Personally I hope that if the SW delivers one thing it's a respite from this kind of ad-hoc one-off-script data integration which is unfortunately the norm in bioinformatics.

So in answer to your MD friend's question: it is possible right now, given knowledge of available data sources, the semantics implicit in those resources and in how those resources answer queries, and the patience to manually integrate data from these resources where that integration hasn't been done for you by some available portal or data warehouse.

So hopefully the SW and related technologies will help with all this. Solving the generalised form of this problem completely is kind of the holy grail in data integration in bioinformatics. There's a lot of difficult stuff here, like efficiently querying disparate resources (or keeping a warehouse up to date) combined with inference. But I think different groups represented on this list have bitten off different chunks of the problem with promising results so far. For example, via the NCBO[2] you'll soon be able to query classes from any OBO[3] ontology (either via a user interface, or programmatically via an API or an ontology-aware query language like SPARQL), and from there link to other data sources, with the appropriate inferences made depending on the semantics of the relations in the underlying ontology.
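
To make the precomputed transitive closure idea mentioned above a bit more concrete, here is a minimal Python sketch. It is not how the godb or AmiGO actually does it; the table names, the TOY: term IDs, the gene names and the function names are all made up for illustration (only GO:0005739, mitochondrion, comes from this thread). It assumes, as stated above, that localisation propagates over both is_a and part_of, so the two relations are followed uniformly when building the closure.

```python
import sqlite3

# Toy ontology edges as (child, relation, parent).
# Every ID except GO:0005739 is invented for this sketch.
EDGES = [
    ("TOY:inner_membrane", "part_of", "GO:0005739"),
    ("TOY:respiratory_complex", "part_of", "TOY:inner_membrane"),
    ("TOY:special_mitochondrion", "is_a", "GO:0005739"),
    ("GO:0005739", "part_of", "TOY:cytoplasm"),
]

# Hypothetical gene-to-term annotations.
ANNOTATIONS = [
    ("geneA", "GO:0005739"),
    ("geneB", "TOY:respiratory_complex"),
    ("geneC", "TOY:special_mitochondrion"),
    ("geneD", "TOY:cytoplasm"),
]


def transitive_closure(edges):
    """Return all (descendant, ancestor) pairs, including (term, term)."""
    parents = {}
    terms = set()
    for child, _relation, parent in edges:
        parents.setdefault(child, set()).add(parent)
        terms.update((child, parent))
    closure = set()
    for term in terms:
        stack, seen = [term], set()
        while stack:  # depth-first walk up the graph from this term
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            closure.add((term, current))
            stack.extend(parents.get(current, ()))
    return closure


def genes_localised_to(term_id):
    """Find genes annotated to term_id or any of its descendants via one join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE closure (descendant TEXT, ancestor TEXT)")
    conn.execute("CREATE TABLE annotation (gene TEXT, term TEXT)")
    conn.executemany("INSERT INTO closure VALUES (?, ?)",
                     sorted(transitive_closure(EDGES)))
    conn.executemany("INSERT INTO annotation VALUES (?, ?)", ANNOTATIONS)
    rows = conn.execute(
        "SELECT DISTINCT a.gene FROM annotation a "
        "JOIN closure c ON a.term = c.descendant "
        "WHERE c.ancestor = ? ORDER BY a.gene",
        (term_id,),
    ).fetchall()
    conn.close()
    return [gene for (gene,) in rows]


if __name__ == "__main__":
    print(genes_localised_to("GO:0005739"))
```

The closure is computed once up front, so the query itself stays a plain relational join. That is the trade-off described above: fast, but any change to the ontology or to which relations should propagate means recomputing the table.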

Cheers
Chris

[1] http://www.godatabase.org/cgi-bin/amigo/go.cgiview=details&show_associations=list&search_constraint=terms&depth=0&query=GO:0005739
[2] http://www.bioontologies.org
[3] http://obo.sourceforge.net

On Jan 31, 2006, at 12:03 PM, Brian Osborne wrote:

> Joanne,
>
> If you're interested in doing this query manually you can use Entrez Gene
> (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene),
> something like:
>
> mitochondrial [go] AND signaling [go] AND pathway [go]
>
> What I _don't_ know is whether you're querying with any child terms as
> well, presumably you want to find genes with the children assigned to them.
>
> As you know Entrez Gene assigns ontology terms to genes, and all of their
> proteins inherit the term, functionally speaking.
>
> You could also do this using Bioperl, if your colleague would like to
> write a script.
>
> Brian O.
>
> On 1/31/06 2:27 PM, "Joanne Luciano" <jluciano@predmed.com> wrote:
>
>> Hi,
>>
>> An MD I met with last week mentioned briefly the desire to obtain
>> homologs for yeast proteins with the GO ID 0031930. With it was a
>> request for other suggestions for querying proteins with this ontology
>> assignment (looking for mammalian homologs).
>>
>> Can the semantic web help with this or is it already basic and solved,
>> in which case, can someone point me or fill in the details? Where
>> should I go? What questions should I ask?
>>
>> Joanne
Received on Wednesday, 1 February 2006 22:56:16 UTC