Re: [HCLSIG] homologs for yeast proteins? from chris mungall on 2006-01-31 (public-semweb-lifesci@w3.org from February 2006)

From: chris mungall <cjm@fruitfly.org>
Date: Tue, 31 Jan 2006 13:05:21 -0800
To: Brian Osborne <osborne1@optonline.net>
Cc: public-semweb-lifesci@w3.org, jluciano@predmed.com
Message-Id: <f4bca10a3f0fc1e9493e972ba313a1b7@fruitfly.org>
Hi Brian

Any search for genes localised to, say, "mitochondrion" should indeed
return genes that are annotated to either is_a children or part_of
children of "mitochondrion" (the latter because localisation is
transitive_over part_of). Standard query utilities such as the Entrez
Gene search don't take this into account, so this is an excellent use
case for the HCLSIG. I imagine Entrez Gene records will soon start
incorporating annotations to ontologies other than GO, using a wider
range of relations than the two currently used in GO, each with its
own definition in the OBO relations ontology.
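
To make the inference above concrete, here is a minimal sketch in Python. It assumes a toy ontology held in memory; the term names, gene symbols and function names are all made up for illustration and do not come from GO or Entrez Gene. It walks is_a and part_of links downwards from a term and collects every gene annotated to the term or to anything beneath it.

```python
from collections import defaultdict

# Toy ontology edges as (child, relation, parent).
# All term names and gene symbols are hypothetical.
EDGES = [
    ("mitochondrial_membrane", "part_of", "mitochondrion"),
    ("mitochondrial_inner_membrane", "is_a", "mitochondrial_membrane"),
    ("mitochondrial_matrix", "part_of", "mitochondrion"),
    ("mitochondrion", "is_a", "organelle"),
    ("nucleus", "is_a", "organelle"),
]

# Direct annotations only, as a naive search would see them.
ANNOTATIONS = {
    "geneA": ["mitochondrion"],
    "geneB": ["mitochondrial_inner_membrane"],
    "geneC": ["mitochondrial_matrix"],
    "geneD": ["nucleus"],
}


def descendants(term, edges, relations=("is_a", "part_of")):
    """Return every term reachable downwards from `term` via `relations`."""
    children = defaultdict(set)
    for child, rel, parent in edges:
        if rel in relations:
            children[parent].add(child)
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def genes_localised_to(term, edges, annotations):
    """Genes annotated to `term` itself or to any of its descendants."""
    targets = descendants(term, edges) | {term}
    return sorted(
        gene for gene, terms in annotations.items()
        if targets.intersection(terms)
    )


print(genes_localised_to("mitochondrion", EDGES, ANNOTATIONS))
```

A search that matched only the direct annotation would return geneA alone; following both relations also picks up geneB and geneC.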

Here is what a user has to do right now in order to get homologs of  
genes localised to a specific cellular component:

For the biologist: query an ontology-aware web interface such as AmiGO,
get the list of genes localised to the component[1], and copy-n-paste
that list of genes into an orthology-aware web interface such as
InParanoid or Entrez Gene. Not exactly ideal. Soon AmiGO will also
include the InParanoid orthology calls, so this particular query can be
answered via a single one-stop web portal; there may already be such a
portal that can answer this query right now. But it doesn't really help
with the generalised case of ontology-aware queries of disparate data
sources. BioMoby may be able to do something like this. Any BioMoby
folks on the list?
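
The copy-n-paste step amounts to filtering a table of orthology calls by a gene list. A rough sketch in Python, assuming the gene list and the orthology calls have already been exported from their respective web interfaces; the data, the tuple layout and the function name are hypothetical and do not reflect any real AmiGO or InParanoid export format:

```python
# Hypothetical gene list, e.g. pasted from an ontology-aware search.
GENES = ["geneA", "geneB", "geneC"]

# Hypothetical orthology calls as (query_gene, species, ortholog).
ORTHOLOGS = [
    ("geneA", "human", "HUMAN_A"),
    ("geneA", "mouse", "MOUSE_A"),
    ("geneB", "human", "HUMAN_B"),
    ("geneD", "human", "HUMAN_D"),
]


def orthologs_for(genes, orthology_calls, species=None):
    """Map each gene in `genes` to its orthologs, optionally for one species.

    Genes with no orthology call are simply absent from the result.
    """
    wanted = set(genes)
    result = {}
    for gene, call_species, ortholog in orthology_calls:
        if gene in wanted and (species is None or call_species == species):
            result.setdefault(gene, []).append(ortholog)
    return result


print(orthologs_for(GENES, ORTHOLOGS, species="human"))
```

The join itself is trivial; the manual effort lies in getting both lists into compatible identifiers in the first place.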

For the developer: download a data warehouse such as the GO database
(which underpins AmiGO). Precomputed transitive closure tables give you
some of the benefits of ontology-aware queries whilst remaining in the
relational paradigm. Not as flexible as a query language like SPARQL,
but much faster. Some data integration is required on the part of the
developer (though for this particular use case the required orthology
calls will soon be in the GO database). Not ideal. Brian advocates a
scripting approach, using BioPerl, go-perl or your favourite
ontology-aware API plus parsers for a bunch of ancillary data files.
Definitely not ideal. Personally I hope that if the SW delivers one
thing it's a respite from this kind of ad-hoc one-off-script data
integration which is unfortunately the norm in bioinformatics.
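
As an illustration of the closure-table idea, here is a minimal sketch using Python's built-in sqlite3 module. The schema is a simplified, hypothetical one (tables term, closure, annotation and ortholog), not the actual GO database schema, and all rows are made up. The closure table stores every ancestor/descendant pair in advance, including each term paired with itself, so the ontology-aware query becomes a plain join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT);
-- Precomputed transitive closure over is_a and part_of (hypothetical).
CREATE TABLE closure (ancestor_id INTEGER, descendant_id INTEGER);
CREATE TABLE annotation (gene TEXT, term_id INTEGER);
CREATE TABLE ortholog (gene TEXT, species TEXT, ortholog TEXT);
""")

conn.executemany("INSERT INTO term VALUES (?, ?)", [
    (1, "mitochondrion"),
    (2, "mitochondrial_membrane"),
    (3, "mitochondrial_inner_membrane"),
    (4, "nucleus"),
])
# Reflexive pairs first, then every inferred ancestor/descendant pair.
conn.executemany("INSERT INTO closure VALUES (?, ?)", [
    (1, 1), (2, 2), (3, 3), (4, 4),
    (1, 2), (2, 3), (1, 3),
])
conn.executemany("INSERT INTO annotation VALUES (?, ?)", [
    ("geneA", 1), ("geneB", 3), ("geneD", 4),
])
conn.executemany("INSERT INTO ortholog VALUES (?, ?, ?)", [
    ("geneA", "human", "HUMAN_A"),
    ("geneB", "human", "HUMAN_B"),
    ("geneD", "human", "HUMAN_D"),
])

# Orthologs of genes annotated to a term or to anything beneath it.
QUERY = """
SELECT DISTINCT a.gene, o.ortholog
FROM term t
JOIN closure c ON c.ancestor_id = t.id
JOIN annotation a ON a.term_id = c.descendant_id
JOIN ortholog o ON o.gene = a.gene
WHERE t.name = ? AND o.species = ?
ORDER BY a.gene
"""

rows = conn.execute(QUERY, ("mitochondrion", "human")).fetchall()
print(rows)
```

The trade-off is that the closure has to be recomputed whenever the ontology changes, and the query can only use the relations that were folded into the table when it was built.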

So in answer to your MD friend's question: it is possible right now,  
given knowledge of available data sources, the semantics implicit in  
those resources and in how those resources answer queries, and the  
patience to manually integrate data from these resources where that  
integration hasn't been done for you by some available portal or data  
warehouse.

So hopefully the SW and related technologies will help with all this.  
Solving the generalised form of this problem completely is kind of the  
holy grail in data integration in bioinformatics. There's a lot of  
difficult stuff here, like efficiently querying disparate resources (or  
keeping a warehouse up to date) combined with inference. But I think  
different groups represented on this list have bitten off different  
chunks of the problem with promising results so far.

For example, via the NCBO[2] you'll soon be able to query classes from  
any OBO[3] ontology (either via a user interface, or programmatically  
via an API or an ontology-aware query language like SPARQL), and from  
there link to other data sources, with the appropriate inferences made  
depending on the semantics of the relations in the underlying ontology.

Cheers
Chris

[1] http://www.godatabase.org/cgi-bin/amigo/go.cgiview=details&show_associations=list&search_constraint=terms&depth=0&query=GO:0005739
[2] http://www.bioontologies.org
[3] http://obo.sourceforge.net


On Jan 31, 2006, at 12:03 PM, Brian Osborne wrote:

>
> Joanne,
>
> If you're interested in doing this query manually you can use Entrez Gene
> (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene),
> something like:
>
> mitochondrial [go] AND signaling [go] AND pathway [go]
>
> What I _don't_ know is whether you're querying with any child terms as
> well, presumably you want to find genes with the children assigned to them.
>
> As you know Entrez Gene assigns ontology terms to genes, and all of their
> proteins inherit the term, functionally speaking.
>
> You could also do this using Bioperl, if your colleague would like to
> write a script.
>
> Brian O.
>
>
> On 1/31/06 2:27 PM, "Joanne Luciano" <jluciano@predmed.com> wrote:
>
>>
>> Hi,
>>
>> An MD I met with last week mentioned briefly the desire to obtain homologs
>> for yeast proteins with the GO ID 0031930.  With it was a request for other
>> suggestions for querying proteins with this ontology assignment (looking for
>> mammalian homologs).
>>
>> Can the semantic web help with this or is it already basic and solved, in
>> which case, can someone point me or fill in the details?  Where should I go?
>> What questions should I ask?
>>
>> Joanne
>>
>>
>>
>
>
>
Received on Wednesday, 1 February 2006 22:56:16 UTC