Re: [HCLSIG] homologs for yeast proteins? from Brian Osborne on 2006-01-31 (public-semweb-lifesci@w3.org from January 2006)

From: Brian Osborne <osborne1@optonline.net>
Date: Tue, 31 Jan 2006 17:34:14 -0500
To: chris mungall <cjm@fruitfly.org>
Cc: public-semweb-lifesci@w3.org, jluciano@predmed.com
Message-id: <C0055016.712B%osborne1@optonline.net>
Chris,

Good answer. I must add though that saying I 'advocate' Bioperl, or any of
the existing Bio* packages, for answering such seemingly simple questions is
a bit too strong an interpretation. The one I know, Bioperl, is not strong
in the area of ontologies and has a significant learning curve, for example.
Basically that's why I'm in this group. A challenging question for any
existing Bio* package is whether it can meet the _next_ set of challenges,
like integrating the data. Without attempting to answer that question I'll
say it's easy to imagine that "BioRDF", or the equivalent, would make an
excellent alternative.

Brian O.


On 1/31/06 4:05 PM, "chris mungall" <cjm@fruitfly.org> wrote:

> 
> Hi Brian
> 
> Any search for genes localised to, say, "mitochondrion" should indeed
> return genes that are annotated to either is_a children or part_of
> children of "mitochondrion" (the latter because localisation is
> transitive_over part_of). Standard query utilities such as the Entrez
> Gene search don't take this into account, so this is an excellent use
> case for the HCLSIG. I imagine Entrez Gene records will soon start
> incorporating annotations to other ontologies than GO, with a wider
> range of relations than the two currently used in GO, with different
> definitions in the OBO relations ontology.
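
A minimal sketch of the ontology-aware search described above, assuming a
toy ontology held in memory. The term names, gene names and the
genes_localised_to helper are all invented for illustration; they are not
real GO terms and not any tool's actual API. The idea is simply to expand
the query term to everything reachable through is_a or part_of edges
before matching annotations.

```python
from collections import defaultdict

# Hypothetical toy ontology as (child, relation, parent) edges.
EDGES = [
    ("mitochondrial_matrix", "part_of", "mitochondrion"),
    ("mitochondrial_inner_membrane", "part_of", "mitochondrion"),
    ("inner_membrane_complex", "part_of", "mitochondrial_inner_membrane"),
    ("example_mito_subtype", "is_a", "mitochondrion"),
    ("mitochondrion", "is_a", "organelle"),
    ("nucleus", "is_a", "organelle"),
]

# Hypothetical direct annotations: gene -> term it is localised to.
ANNOTATIONS = {
    "geneA": "mitochondrion",
    "geneB": "mitochondrial_matrix",
    "geneC": "example_mito_subtype",
    "geneD": "nucleus",
    "geneE": "inner_membrane_complex",
}


def descendants(term, edges, relations=("is_a", "part_of")):
    """Return every term reachable downwards from `term` via `relations`."""
    children = defaultdict(set)
    for child, rel, parent in edges:
        if rel in relations:
            children[parent].add(child)
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def genes_localised_to(term, annotations, edges):
    """Genes annotated to `term` itself or to any is_a/part_of descendant."""
    terms = {term} | descendants(term, edges)
    return sorted(gene for gene, t in annotations.items() if t in terms)


if __name__ == "__main__":
    print(genes_localised_to("mitochondrion", ANNOTATIONS, EDGES))
```

A search that matched only the literal term would return geneA alone; the
expanded search also picks up the genes annotated to the child terms.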
> 
> Here is what a user has to do right now in order to get homologs of
> genes localised to a specific cellular component:
> 
> For the biologist: query an ontology-aware web interface such as AmiGO,
> get the list of genes localised to the component[1], and copy-n-paste
> that list of genes into an orthology-aware web interface such as
> inparanoid or Entrez Gene. Not exactly ideal. Soon AmiGO will also
> include the inparanoid orthology calls, so this particular query can be
> answered via a single one-stop web portal; there may already be such a
> one-stop web portal that can answer this query right now. But it
> doesn't really help with the generalised case of ontology-aware queries
> of disparate data sources. BioMoby may be able to do something like
> this. Any BioMoby folks on the list?
> 
> For the developer: download a data warehouse such as the godb (which
> underpins AmiGO). Precomputed transitive closure tables give you some
> of the benefits of ontology-aware queries whilst remaining in the
> relational paradigm. Not as flexible as a query language like SPARQL,
> but much faster. Some data integration is required on the part of the
> developer (though for this particular use case the required orthology
> calls will soon be in the go db). Not ideal. Brian advocates a
> scripting approach, using BioPerl, go-perl or your favourite
> ontology-aware API plus parsers for a bunch of ancillary data files.
> Definitely not ideal. Personally I hope that if the SW delivers one
> thing it's a respite from this kind of ad-hoc one-off-script data
> integration which is unfortunately the norm in bioinformatics.
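
As a rough illustration of the warehouse approach described above, here is
a sketch using an in-memory SQLite database. The table and column names
(term_closure, annotation, ortholog) and all the rows are assumptions made
up for this example; they are not the actual GO database schema or real
orthology calls. The closure table stores one row per (ancestor,
descendant) pair, including each term paired with itself, so the
ontology-aware part of the query becomes an ordinary join.

```python
import sqlite3

SCHEMA = """
CREATE TABLE term_closure (ancestor TEXT, descendant TEXT);
CREATE TABLE annotation   (gene TEXT, term TEXT);
CREATE TABLE ortholog     (gene TEXT, ortholog_gene TEXT, species TEXT);
"""

# Join through the precomputed closure, then through the orthology calls.
QUERY = """
SELECT DISTINCT a.gene, o.ortholog_gene, o.species
FROM term_closure AS c
JOIN annotation   AS a ON a.term = c.descendant
JOIN ortholog     AS o ON o.gene = a.gene
WHERE c.ancestor = ?
ORDER BY a.gene, o.ortholog_gene
"""


def build_demo_db():
    """Create a tiny warehouse filled with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO term_closure VALUES (?, ?)",
        [
            ("mitochondrion", "mitochondrion"),
            ("mitochondrion", "mitochondrial_matrix"),
            ("mitochondrial_matrix", "mitochondrial_matrix"),
            ("nucleus", "nucleus"),
        ],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?)",
        [
            ("yeastA", "mitochondrion"),
            ("yeastB", "mitochondrial_matrix"),
            ("yeastC", "nucleus"),
        ],
    )
    conn.executemany(
        "INSERT INTO ortholog VALUES (?, ?, ?)",
        [
            ("yeastA", "mouseA", "mouse"),
            ("yeastB", "humanB", "human"),
            ("yeastC", "humanC", "human"),
        ],
    )
    return conn


def homologs_for_component(conn, term):
    """Orthologs of genes annotated to `term` or any term beneath it."""
    return conn.execute(QUERY, (term,)).fetchall()


if __name__ == "__main__":
    for row in homologs_for_component(build_demo_db(), "mitochondrion"):
        print(row)
```

The trade-off matches the one noted above: the closure has to be
recomputed whenever the ontology changes, but each query is a plain
indexed join with no inference at query time.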
> 
> So in answer to your MD friend's question: it is possible right now,
> given knowledge of available data sources, the semantics implicit in
> those resources and in how those resources answer queries, and the
> patience to manually integrate data from these resources where that
> integration hasn't been done for you by some available portal or data
> warehouse.
> 
> So hopefully the SW and related technologies will help with all this.
> Solving the generalised form of this problem completely is kind of the
> holy grail in data integration in bioinformatics. There's a lot of
> difficult stuff here, like efficiently querying disparate resources (or
> keeping a warehouse up to date) combined with inference. But I think
> different groups represented on this list have bitten off different
> chunks of the problem with promising results so far.
> 
> For example, via the NCBO[2] you'll soon be able to query classes from
> any OBO[3] ontology (either via a user interface, or programmatically
> via an API or an ontology-aware query language like SPARQL), and from
> there link to other data sources, with the appropriate inferences made
> depending on the semantics of the relations in the underlying ontology.
> 
> Cheers
> Chris
> 
> [1] http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&show_associations=list&search_constraint=terms&depth=0&query=GO:0005739
> [2] http://www.bioontologies.org
> [3] http://obo.sourceforge.net
> 
> 
> On Jan 31, 2006, at 12:03 PM, Brian Osborne wrote:
> 
>> 
>> Joanne,
>> 
>> If you're interested in doing this query manually you can use Entrez
>> Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene),
>> something like:
>> 
>> mitochondrial [go] AND signaling [go] AND pathway [go]
>> 
>> What I _don't_ know is whether you're querying with any child terms as
>> well; presumably you want to find genes with the children assigned to
>> them.
>> 
>> As you know, Entrez Gene assigns ontology terms to genes, and all of
>> their proteins inherit the term, functionally speaking.
>> 
>> You could also do this using Bioperl, if your colleague would like to
>> write a script.
>> 
>> Brian O.
>> 
>> 
>> On 1/31/06 2:27 PM, "Joanne Luciano" <jluciano@predmed.com> wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> An MD I met with last week mentioned briefly the desire to obtain
>>> homologs for yeast proteins with the GO ID 0031930.  With it was a
>>> request for other suggestions for querying proteins with this ontology
>>> assignment (looking for mammalian homologs).
>>> 
>>> Can the semantic web help with this or is it already basic and
>>> solved, in which case, can someone point me or fill in the details?
>>> Where should I go? What questions should I ask?
>>> 
>>> Joanne
>>> 
>>> 
>>> 
>> 
>> 
>> 
>
Received on Tuesday, 31 January 2006 22:34:11 UTC