RE: [HCLSIG] homologs for yeast proteins? from Miller, Michael D (Rosetta) on 2006-02-01 (public-semweb-lifesci@w3.org from February 2006)

From: Miller, Michael D (Rosetta) <Michael_Miller@Rosettabio.com>
Date: Wed, 1 Feb 2006 08:13:02 -0800
To: "Brian Osborne" <osborne1@optonline.net>, "chris mungall" <cjm@fruitfly.org>
cc: public-semweb-lifesci@w3.org, jluciano@predmed.com
Message-ID: <E1F4Kbj-0000LT-Pi@lisa.w3.org>
Hi Brian and Chris,

The good news is that the Bio* communities are well-established. I can
envision some flavor of the Bio* code running behind semantic web
interfaces, getting the answers with existing code, with some additional
marshalling code to put together the answer returned to the client.
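Here is a minimal sketch of that marshalling step. It assumes the Bio* layer has already returned gene-to-ortholog pairs as plain tuples. The http://example.org/ namespace and the `hasOrtholog` predicate are hypothetical placeholders, not an existing vocabulary. The output is plain N-Triples, one statement per line.

```python
# Sketch only: marshal (gene, ortholog) results returned by existing Bio*
# code into N-Triples lines that a semantic web client could consume.
# ASSUMPTION: BASE and HAS_ORTHOLOG are hypothetical placeholders,
# not an existing vocabulary.
from typing import Iterable, List, Tuple
from urllib.parse import quote

BASE = "http://example.org/"
HAS_ORTHOLOG = BASE + "hasOrtholog"


def gene_uri(gene_id: str) -> str:
    # Percent-encode the identifier so it is safe inside an IRI.
    return f"<{BASE}gene/{quote(gene_id, safe='')}>"


def to_ntriples(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (gene, ortholog) pairs into one N-Triples statement each."""
    return [
        f"{gene_uri(gene)} <{HAS_ORTHOLOG}> {gene_uri(ortholog)} ."
        for gene, ortholog in pairs
    ]


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print("\n".join(to_ntriples([("yeastGene1", "humanGene1")])))
```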

cheers,
Michael

> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org 
> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of 
> Brian Osborne
> Sent: Tuesday, January 31, 2006 2:34 PM
> To: chris mungall
> Cc: public-semweb-lifesci@w3.org; jluciano@predmed.com
> Subject: Re: [HCLSIG] homologs for yeast proteins?
> 
> 
> 
> Chris,
> 
> Good answer. I must add, though, that saying I 'advocate' Bioperl, or any
> of the existing Bio* packages, for answering such seemingly simple
> questions is a bit too strong an interpretation. The one I know, Bioperl,
> is not strong in the area of ontologies and has a significant learning
> curve, for example. Basically, that's why I'm in this group. A challenging
> question for any existing Bio* package is whether it can meet the _next_
> set of challenges, like integrating the data. Without attempting to answer
> that question, I'll say it's easy to imagine that "BioRDF", or the
> equivalent, would make an excellent alternative.
> 
> Brian O.
> 
> 
> On 1/31/06 4:05 PM, "chris mungall" <cjm@fruitfly.org> wrote:
> 
> > 
> > Hi Brian
> > 
> > Any search for genes localised to, say, "mitochondrion" should indeed
> > return genes that are annotated to either is_a children or part_of
> > children of "mitochondrion" (the latter because localisation is
> > transitive_over part_of). Standard query utilities such as the Entrez
> > Gene search don't take this into account, so this is an excellent use
> > case for the HCLSIG. I imagine Entrez Gene records will soon start
> > incorporating annotations to ontologies other than GO, with a wider
> > range of relations than the two currently used in GO, each with its own
> > definition in the OBO relations ontology.
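The query Chris describes amounts to matching annotations against the term plus everything reachable beneath it via is_a and part_of. Here is a minimal sketch of that idea. The edges and gene names below are a hypothetical toy fragment, not copied from GO, and the adjacent_to edge is included only to show that other relations are not followed.

```python
# Sketch: find genes localised to a component, following is_a and part_of
# downwards (localisation is transitive_over part_of).
# ASSUMPTION: the edges and gene names are a hypothetical toy fragment,
# not copied from GO.
from collections import defaultdict, deque

PROPAGATING = {"is_a", "part_of"}

# (child, relation, parent)
EDGES = [
    ("mitochondrial envelope", "part_of", "mitochondrion"),
    ("mitochondrial inner membrane", "part_of", "mitochondrial envelope"),
    ("hypothetical mitochondrion subtype", "is_a", "mitochondrion"),
    ("nucleus", "adjacent_to", "mitochondrion"),  # not followed
]

# (gene, directly annotated term)
ANNOTATIONS = [
    ("geneA", "mitochondrial inner membrane"),
    ("geneB", "mitochondrion"),
    ("geneC", "nucleus"),
    ("geneD", "hypothetical mitochondrion subtype"),
]


def descendants(term, edges):
    """Return term plus every term reachable via is_a/part_of children."""
    children = defaultdict(set)
    for child, rel, parent in edges:
        if rel in PROPAGATING:
            children[parent].add(child)
    seen, queue = {term}, deque([term])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_localised_to(term, edges, annotations):
    terms = descendants(term, edges)
    return sorted({gene for gene, t in annotations if t in terms})


if __name__ == "__main__":
    print(genes_localised_to("mitochondrion", EDGES, ANNOTATIONS))
```

A direct match on "mitochondrion" alone would return only geneB. The closure also picks up geneA through part_of and geneD through is_a.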
> > 
> > Here is what a user has to do right now in order to get homologs of
> > genes localised to a specific cellular component:
> > 
> > For the biologist: query an ontology-aware web interface such as AmiGO,
> > get the list of genes localised to the component[1], and copy-n-paste
> > that list of genes into an orthology-aware web interface such as
> > Inparanoid or Entrez Gene. Not exactly ideal. Soon AmiGO will also
> > include the Inparanoid orthology calls, so this particular query can be
> > answered via a single one-stop web portal; there may already be such a
> > one-stop web portal that can answer this query right now. But that
> > doesn't really help with the generalised case of ontology-aware queries
> > over disparate data sources. BioMoby may be able to do something like
> > this. Any BioMoby folks on the list?
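The copy-n-paste step Chris describes is essentially a join between a gene list and a table of orthology calls. Here is a minimal sketch, assuming the orthology calls have been exported as hypothetical (gene, ortholog, species) rows. This layout is illustrative only and is not the actual Inparanoid file format.

```python
# Sketch: join a gene list (e.g. from an ontology query) against
# orthology calls, keeping only orthologs in one target species.
# ASSUMPTION: the (gene, ortholog, species) row layout is hypothetical.
from typing import Iterable, List, Tuple

OrthologRow = Tuple[str, str, str]  # (source gene, ortholog, ortholog species)


def orthologs_for(
    genes: Iterable[str], rows: Iterable[OrthologRow], species: str
) -> List[Tuple[str, str]]:
    wanted = set(genes)
    return sorted({(g, o) for g, o, sp in rows if g in wanted and sp == species})


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    rows = [("y1", "h1", "human"), ("y1", "m1", "mouse"), ("y2", "h2", "human")]
    print(orthologs_for(["y1", "y2"], rows, "human"))
```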
> > 
> > For the developer: download a data warehouse such as the godb (which
> > underpins AmiGO). Precomputed transitive closure tables give you some
> > of the benefits of ontology-aware queries whilst remaining in the
> > relational paradigm. Not as flexible as a query language like SPARQL,
> > but much faster. Some data integration is required on the part of the
> > developer (though for this particular use case the required orthology
> > calls will soon be in the godb). Not ideal. Brian advocates a
> > scripting approach, using BioPerl, go-perl or your favourite
> > ontology-aware API plus parsers for a bunch of ancillary data files.
> > Definitely not ideal. Personally, I hope that if the SW delivers one
> > thing, it's a respite from this kind of ad-hoc, one-off-script data
> > integration, which is unfortunately the norm in bioinformatics.
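The closure-table approach can be sketched in SQLite. The schema here (term, edge, association, term_closure) is hypothetical and does not reproduce the actual godb tables. The closure is computed once with a recursive CTE, which needs SQLite 3.8.3 or later. After that, "genes under a term" is an ordinary join.

```python
# Sketch: precompute a reflexive-transitive closure table once, then answer
# ontology-aware queries with a plain relational join.
# ASSUMPTION: this schema is hypothetical, not the actual godb schema.
import sqlite3


def build_db(edges, annotations):
    """edges: (child, parent) pairs for is_a/part_of; annotations: (gene, term)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE term (id TEXT PRIMARY KEY);
        CREATE TABLE edge (child TEXT, parent TEXT);
        CREATE TABLE association (gene TEXT, term TEXT);
        CREATE TABLE term_closure (ancestor TEXT, descendant TEXT);
    """)
    terms = {t for e in edges for t in e} | {t for _, t in annotations}
    con.executemany("INSERT INTO term VALUES (?)", [(t,) for t in sorted(terms)])
    con.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    con.executemany("INSERT INTO association VALUES (?, ?)", annotations)
    # Every term is its own ancestor; then walk edges down from each ancestor.
    rows = con.execute("""
        WITH RECURSIVE closure(ancestor, descendant) AS (
            SELECT id, id FROM term
            UNION
            SELECT c.ancestor, e.child
            FROM closure AS c JOIN edge AS e ON e.parent = c.descendant
        )
        SELECT ancestor, descendant FROM closure
    """).fetchall()
    con.executemany("INSERT INTO term_closure VALUES (?, ?)", rows)
    return con


def genes_under(con, term):
    """Genes annotated to term or to any of its descendants."""
    return [g for (g,) in con.execute("""
        SELECT DISTINCT a.gene FROM association AS a
        JOIN term_closure AS c ON a.term = c.descendant
        WHERE c.ancestor = ?
        ORDER BY a.gene
    """, (term,))]


if __name__ == "__main__":
    # Hypothetical toy data.
    con = build_db(
        [("inner membrane", "envelope"), ("envelope", "mitochondrion")],
        [("geneA", "inner membrane"), ("geneB", "mitochondrion")],
    )
    print(genes_under(con, "mitochondrion"))
```

The trade-off Chris mentions shows up directly. The closure must be rebuilt when the ontology changes, but each query afterwards is a single indexed-friendly join.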
> > 
> > So in answer to your MD friend's question: it is possible right now,
> > given knowledge of available data sources, the semantics implicit in
> > those resources and in how those resources answer queries, and the
> > patience to manually integrate data from these resources where that
> > integration hasn't been done for you by some available portal or data
> > warehouse.
> > 
> > So hopefully the SW and related technologies will help with all this.
> > Solving the generalised form of this problem completely is kind of the
> > holy grail of data integration in bioinformatics. There's a lot of
> > difficult stuff here, like efficiently querying disparate resources (or
> > keeping a warehouse up to date) combined with inference. But I think
> > different groups represented on this list have bitten off different
> > chunks of the problem, with promising results so far.
> > 
> > For example, via the NCBO[2] you'll soon be able to query classes from
> > any OBO[3] ontology (either via a user interface, or programmatically
> > via an API or an ontology-aware query language like SPARQL), and from
> > there link to other data sources, with the appropriate inferences made
> > depending on the semantics of the relations in the underlying ontology.
> > 
> > Cheers
> > Chris
> > 
> > [1] http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&show_associations=list&search_constraint=terms&depth=0&query=GO:0005739
> > [2] http://www.bioontologies.org
> > [3] http://obo.sourceforge.net
> > 
> > 
> > On Jan 31, 2006, at 12:03 PM, Brian Osborne wrote:
> > 
> >> 
> >> Joanne,
> >> 
> >> If you're interested in doing this query manually, you can use Entrez
> >> Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene),
> >> something like:
> >> 
> >> mitochondrial [go] AND signaling [go] AND pathway [go]
> >> 
> >> What I _don't_ know is whether you're querying with any child terms as
> >> well; presumably you want to find genes that have the child terms
> >> assigned to them.
> >> 
> >> As you know, Entrez Gene assigns ontology terms to genes, and all of
> >> their proteins inherit the term, functionally speaking.
> >> 
> >> You could also do this using Bioperl if your colleague would like to
> >> write a script.
> >> 
> >> Brian O.
> >> 
> >> 
> >> On 1/31/06 2:27 PM, "Joanne Luciano" <jluciano@predmed.com> wrote:
> >> 
> >>> 
> >>> Hi,
> >>> 
> >>> An MD I met with last week briefly mentioned the desire to obtain
> >>> homologs for yeast proteins with the GO ID 0031930. With it came a
> >>> request for other suggestions for querying proteins with this ontology
> >>> assignment (looking for mammalian homologs).
> >>> 
> >>> Can the semantic web help with this, or is it already basic and
> >>> solved? If it is solved, can someone point me to the details or fill
> >>> them in? Where should I go? What questions should I ask?
> >>> 
> >>> Joanne
> >>> 
Received on Wednesday, 1 February 2006 16:13:27 UTC