RE: Systems Biology Use Case from Eric Neumann on 2003-11-09 (public-semweb-lifesci@w3.org from November 2003)

From: Eric Neumann <ENeumann@BeyondGenomics.com>
Date: Sun, 9 Nov 2003 17:00:15 -0500
To: "Manuel Corpas" <protein_bioinformatics@yahoo.co.uk>, <public-semweb-lifesci@w3.org>
Message-ID: <FC5C355B8AE9F2499A5220CCEDC34756DC0395@bgmail.lifescience.com>
Manuel,

Thanks for your note. BTW, MIAME is not a standard but rather a minimal
set of requirements on which to base a microarray standard, such as MAGE.
I believe I can quickly respond by saying that this group is not a
standards group; we're not pushing Esperanto for the life sciences. In
fact, I think there are several examples of Esperanto already in the life
sciences, which I won't bother mentioning. I also hope that a
well-thought-through standard such as MAGE is soon accepted by a larger
community; otherwise it will lose its utility as the technology changes.
Most large pharmas have not embraced it yet, and MAGE is still not fully
incorporated at NIH, where other MIAME formats are supported!

What a few of us are interested in is moving from data packing standards
to the free flow of semantic definitions, and their use, in the life
sciences. We need to worry a bit more about exchanging significance and
insight than simply microarray and MS values. RDF is to XML what a
browser is to a teletype. It simply allows more expressivity, but the
ontological languages that can take advantage of RDF still need to be
developed by the community. A few exist already: GO.owl, BioPAX.owl,
SNOMED.owl, and UMLS.owl. As each group creates the semantic framework it
most needs, they will all be usable within a general, interconnected RDF
framework.

SBML will not support the level of Systems Biology research that groups
like Beyond Genomics are doing. We need a language that supports stated
hypotheses and incomplete knowledge. The literature is full of
mechanisms that cannot be represented in terms of components and
reactions; these need to be captured as well! Indeed, Systems Biology is
about to go through its second phase, in which biology researchers will
be able to build and annotate models that rely less on kinetics and more
on causality and influence frames. An RDF scheme offers a lot of richness
for supporting this level of knowledge capture and structured annotation.
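
One way to capture a stated hypothesis without asserting it as fact is
RDF reification: the claim becomes a node of type rdf:Statement with
rdf:subject, rdf:predicate and rdf:object links, and further properties
(status, who proposed it) hang off that node. The sketch below assumes
plain Python sets of triples; the rdf: terms are the standard RDF
reification vocabulary, while everything in the ex: namespace (inhibits,
status, proposedBy, the entity names) is hypothetical, loosely modelled on
the JAK/STAT example from the original use case.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/sb#"  # hypothetical vocabulary for this sketch

_ids = count(1)

def reify(graph, s, p, o, **annotations):
    """Add a reified statement (s, p, o) to graph without asserting it,
    attach the given annotations to it, and return the statement node."""
    node = f"_:stmt{next(_ids)}"  # blank-node-style label
    graph.add((node, RDF + "type", RDF + "Statement"))
    graph.add((node, RDF + "subject", s))
    graph.add((node, RDF + "predicate", p))
    graph.add((node, RDF + "object", o))
    for key, value in annotations.items():
        graph.add((node, EX + key, value))
    return node

def hypotheses(graph):
    """Yield (s, p, o) for every reified statement whose status is 'hypothesis'."""
    for node, pred, val in graph:
        if pred == EX + "status" and val == "hypothesis":
            parts = {p: o for (n, p, o) in graph if n == node}
            yield (parts[RDF + "subject"], parts[RDF + "predicate"], parts[RDF + "object"])

g = set()
reify(g, EX + "SOCS", EX + "inhibits", EX + "JAK_STAT_signalling",
      status="hypothesis", proposedBy=EX + "TheAuthors")

# The claim itself is not present as a plain, asserted triple ...
assert (EX + "SOCS", EX + "inhibits", EX + "JAK_STAT_signalling") not in g
# ... but it can be retrieved as a hypothesis, with provenance attached.
print(list(hypotheses(g)))
```

Reification is only one option; the design choice it illustrates is that
a hypothesis and its provenance live in the same graph as established
facts while remaining distinguishable from them.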

As for making sure key communities are part of the process, BioPAX is
being developed by representatives from BioCYC, BIND, WIT, AfCS, and the
BioPathways Consortium. Their data will be a first test for the BioPAX
exchange model, which then should plug into an RDF framework directly.
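
As a rough illustration of data from several pathway databases plugging
into one RDF store, the sketch below tags each triple with the source it
came from (a quad), which lets a query report which sources support a
given interaction. The source labels, interaction identifiers and the
"participant" predicate are hypothetical placeholders, not actual BioPAX
exports from the groups named above.

```python
from collections import defaultdict

BP = "http://example.org/biopax#"  # hypothetical BioPAX-style namespace
EX = "http://example.org/data#"    # hypothetical instance namespace

# Each (placeholder) source contributes a list of triples.
exports = {
    "sourceA": [(EX + "int1", BP + "participant", EX + "JAK1"),
                (EX + "int1", BP + "participant", EX + "STAT1")],
    "sourceB": [(EX + "int1", BP + "participant", EX + "STAT1"),
                (EX + "int2", BP + "participant", EX + "SOCS1")],
}

def load(exports):
    """Merge per-source triples into one set of (s, p, o, source) quads."""
    return {(s, p, o, src) for src, triples in exports.items() for (s, p, o) in triples}

def support(quads):
    """Map each distinct (s, p, o) triple to the set of sources asserting it."""
    index = defaultdict(set)
    for s, p, o, src in quads:
        index[(s, p, o)].add(src)
    return index

store = load(exports)
idx = support(store)
# STAT1's participation in int1 is asserted by both sources.
print(sorted(idx[(EX + "int1", BP + "participant", EX + "STAT1")]))
```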

Best,
Eric

> -----Original Message-----
> From: Manuel Corpas [mailto:protein_bioinformatics@yahoo.co.uk]
> Sent: Wednesday, November 05, 2003 5:53 AM
> To: public-semweb-lifesci@w3.org
> Subject: Re: Systems Biology Use Case
> 
> 
> 
> Hello all:
> 
> I would like to make some comments in relation to the
> email Eric K. Neumann sent to the community. What have
> we learnt in the past from the implementation of
> standards?
> 
> There have been great successes and even bigger
> failures when platforms have been developed. For
> instance, consider the English language, which is
> now the standard language of communication in
> international circles. Its main competitor was
> Esperanto, a language that was, in essence, a mosaic
> of many different languages: an intermediate between
> many Romance languages, whose aim was to be impartial.
> 
> Nevertheless, Esperanto failed as a platform and today
> hardly anyone uses it. One of the main reasons was
> that it did not have a seed group of people who were
> able to use it; everyone needed to learn it from
> scratch. Moreover, the language did not have the right
> institutions supporting it, probably because its
> promoters did not specifically target a clear
> audience. In other words, its target audience was
> arguably too general. What if the United Nations had
> used it? Would that have given it some credibility
> with other potential users?
> 
> In a more biological example, MIAME (Minimum
> Information About a Microarray Experiment) seems to
> have become a widely used tool for the generation of
> gene expression data on a genomic scale (Brazma et
> al., Nat. Genet., vol. 29, no. 4, pp. 365-371, 2001).
> In this case, the key to its success might be that it
> appeared in a timely fashion and that the right
> institutions were behind the work, giving it support.
> They had a concrete problem (a standard for microarray
> information representation) and a clear target
> audience: microarray experimentalists. They managed to
> raise awareness of this problem and caught the
> attention of institutions such as Nature, Stanford,
> the European Bioinformatics Institute and many others.
> 
> The encoding language proposed for descriptions of
> pathway information in systems biology will certainly
> require a lot of effort to convince people to use it.
> If this new standard is to be adopted by publishers, a
> clear target audience needs to be specifically
> addressed, and this includes both institutions and
> users. This matters all the more if it is to become
> the common platform for publishing pathway literature.
> 
> SBML, for example, has a clear user segment:
> biochemical simulation packages for integration of
> information. However, unless its use spreads to other
> communities, it might be in danger of not achieving
> the objective of being adopted as a workbench for
> exchange of information among biologists.
> 
> The question I wanted to raise was: do we have the
> right people giving us the right support for RDF? It
> might be worth considering whether KEGG, AfCS, BioCYC
> and BIND should be more specifically targeted, giving
> examples taken from their archives. That way,
> institutions such as Nature, which has extensively
> supported AfCS, might seriously consider the adoption
> of RDF. If we are lucky, this could make a difference
> in the level of awareness among potential users, so
> that more people start using our platforms.
> 
> Best,
> 
> Manuel
> 
> 
>  --- Eric Neumann <ENeumann@BeyondGenomics.com> wrote:
> > Hello,
> >  
> > I'm posting a set of examples of how an RDF-based system could
> > assist in describing and sharing structured information related to
> > Systems Biology. There are many examples of using RDF as a
> > descriptive language, but I'm interested in whether it can be used to
> > describe not only facts, but assumptions and hypotheses regarding
> > mechanisms of diseases from a multi-component perspective, typical of
> > data analysis from a Systems Biology point of view.
> >  
> > First, one needs a mechanism for encoding most molecular data:
> > proteins, genes, transcripts, metabolites, interactions. The genome
> > and its constituents need to be made accessible through a descriptive
> > system that also supports a distributed annotation model (DAS,
> > http://biodas.org) but is extensible to all molecular species and
> > descriptors.
> >  
> > Second, there needs to be a robust and extensible model for accessing
> > and including pathway data (e.g., KEGG, AfCS, BioCYC, BIND) and
> > merging it with sets of annotated molecular data. It is hoped that
> > efforts such as the BioPAX Ontology (http://biopax.org) will be the
> > foundation for describing any group of pathway information,
> > independent of source. The formalized relations between molecules,
> > interactions, and reactions will be necessary for bridging the wide
> > variety of biomolecular phenomena. Just as important will be the
> > ability to aptly describe the "context" in which such phenomena are
> > known to occur (e.g., tissue, disease, developmental stages).
> >  
> > Third, much causal evidence is not yet in pathway databases, but
> > exists in millions of articles of unstructured scientific text (i.e.,
> > publications). These not only need to be referenced, but the
> > essential molecular mechanisms they describe need to be encoded in a
> > semantic format; for example:
> >  
> > <The Authors> <propose that>:
> > 
> > <the termination and modulation of>
> > 
> > <the JAK/STAT signalling pathway>
> > 
> > <is mediated by>
> > 
> > <tyrosine phosphatases>,
> > 
> > <the SOCS (suppressor of cytokine signalling) feedback inhibitors> and
> > 
> > <PIAS (protein inhibitor of activated STAT) proteins>
> > 
> >  
> > This is a nontrivial exercise, but it is possible in time, and could
> > be achieved incrementally in stages until it became part of the
> > publication process. This structured evidence would then be
> > merged/layered on top of other pathway and mechanistic information,
> > so that inferences could be performed on the set. Publishers such as
> > Nature are already beginning to explore the use of RDF in publication
> > space. Personally, I think text-mining can help us with legacy text,
> > but publishers in the future should require authors to encode the
> > model semantics along with their text and figures using some form of
> > wizard tool...
> >  
> > Fourth, additional biological knowledge regarding anatomy,
> > physiology, tissues, and diseases also needs to be represented in RDF
> > in order to describe biological systems. A practical way to begin
> > this process would be to translate the National Library of Medicine's
> > UMLS language into RDF/OWL
> > (http://www.nlm.nih.gov/pubs/factsheets/umls.html). From then on, all
> > other databases should refer to biological entities through RDF
> > references to these entities. UMLS already contains nearly a million
> > concepts and about 134 semantic relations, so conversion into RDF/OWL
> > should be fairly automatic (e.g., namespace umls:).
> >  
> > Fifth, once an interesting set of observations can be related to
> > existing published data, the researchers should be able to propose
> > the new relations and/or bio-mechanisms using RDF. The gathered facts
> > and assumptions should be sufficient for any other scientist to
> > validate the proposed hypothesis for themselves, based on the
> > presented RDF material. This is an important requirement for systems
> > biology research, since describing and sharing complex relations and
> > hypothetical mechanisms is at the heart of elucidating a biosystem. A
> > distributed annotation model would also greatly enhance sharing of
> > models and insights. It will be a test of how much expressivity RDF
> > can formally provide within a field as encompassing as systems
> > biology. How well the systems biology community is able to apply RDF
> > in advancing SB research will be the true test of its utility.
> >  
> > I intend to post more systems biology use cases and a few possible
> > strategies for solving them in the next few weeks. I also welcome any
> > ideas and suggestions from the life science community on what to
> > consider and develop as part of a collaborative effort towards these
> > goals.
> >  
> >  
> > Eric
> >  
> > 
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> >     Eric K. Neumann PhD 
> >     VP Strategic Informatics, 
> >     Head of Knowledge Research
> > 
> >    Beyond Genomics
> > 
> > 
> >     40 Bear Hill Road 
> >     Waltham, MA 
> >      tel: 781-434-0222 
> >      fax: 781-895-1119 
> >      www.beyondgenomics.com
> > 
> > 
> >  
> >  
> 
> =====
> ==========================
> MANUEL CORPAS
> 
> University of Manchester
> 
> PhD student, Bioinformatics
> 
> http://www.manuelcorpas.com
> 
> ==========================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Eric K. Neumann PhD
    VP Strategic Informatics,
    Head of Knowledge Research

   Beyond Genomics
    Drug Discovery through Systems Biology

    40 Bear Hill Road
    Waltham, MA
     tel: 781-434-0222
     fax: 781-895-1119
     www.beyondgenomics.com
Received on Sunday, 9 November 2003 17:00:16 UTC