Re: Systems Biology Use Case from Manuel Corpas on 2003-11-05 (public-semweb-lifesci@w3.org from November 2003)

From: Manuel Corpas <protein_bioinformatics@yahoo.co.uk>
Date: Wed, 5 Nov 2003 10:52:36 +0000 (GMT)
To: public-semweb-lifesci@w3.org
Message-ID: <20031105105236.46273.qmail@web25108.mail.ukl.yahoo.com>
Hello all:

I would like to make some comments in relation to the
email Eric K. Neumann sent to the community. What have
we learnt in the past from the implementation of
standards? 

There have been great successes and even bigger
failures when platforms have been developed. For
instance, let's think of the English language, which
is now the standard language of communication in
international circles. Its main competitor was
Esperanto, a language that, in essence, was a mosaic
of many different languages; an intermediate between
many Romance languages, whose aim was to be impartial.

Nevertheless, Esperanto failed as a platform and today
hardly anyone uses it. One of the main reasons was that
it did not have a seed group of people who were able
to use it. Everyone needed to learn it from scratch.
Moreover, this language did not have the right
institutions supporting it, probably because it did
not specifically target a clear audience. In other
words, its target audience was arguably too general.
What if the United Nations had used it? Would that
have given it some credibility with other potential
users?

In a more biological example, MIAME (Minimum
Information About a Microarray Experiment) seems to
have become a widely used standard for describing
gene expression data generated on a genomic scale
(Brazma et al., Nat. Genet., volume 29 no. 4 pp 365 -
371, 2001). In this case, the key to its success might
be that it appeared in a timely fashion and that the
right institutions are behind the work, giving it
support. They had a concrete problem (a standard for
microarray information representation) and a clear
target audience: microarray experimentalists. They
managed to raise awareness of this problem and caught
the attention of institutions such as Nature,
Stanford, the European Bioinformatics Institute and
many others.

The encoding language proposed for descriptions of
pathway information in systems biology will certainly
require a lot of effort to convince people to use it.
If this new standard is to be adopted by publishers, a
clear target audience needs to be specifically
addressed, and this includes both institutions and
users. This matters even more if it is to become the
common platform for publishing pathway literature.

SBML, for example, has a clear user segment:
biochemical simulation packages that need to
integrate information. However, unless its use
spreads to other communities, it might be in danger
of not achieving the objective of being adopted as a
workbench for exchange of information among
biologists.

The question I wanted to raise was, do we have the
right people giving us the right support for RDF? It
might be worth considering whether KEGG, AfCS, BioCYC
and BIND should be targeted more specifically, giving
examples taken from their archives. Institutions such
as Nature, which has supported AfCS extensively, might
then seriously consider the adoption of RDF.
If we are lucky, this could make a difference in the
level of awareness among potential users, so that more
people start using our platforms.
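
As a rough illustration of the kind of worked example
meant here, the JAK/STAT statement in the message quoted
below could be written as reified RDF triples. This is
only a sketch in plain Python that prints N-Triples
lines; the http://example.org/sysbio# namespace and the
names "proposes" and "terminationAndModulationMediatedBy"
are made up for illustration and do not come from
BioPAX, UMLS or any of the databases mentioned. Only the
rdf:Statement, rdf:subject, rdf:predicate and rdf:object
terms are standard RDF vocabulary.

```python
# Hypothetical namespace for illustration; not from the original thread.
EX = "http://example.org/sysbio#"
# Standard RDF namespace (used for reification).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def uri(namespace, local_name):
    """Return an N-Triples URI reference."""
    return f"<{namespace}{local_name}>"


def triple(subject, predicate, obj):
    """Return one N-Triples line."""
    return f"{subject} {predicate} {obj} ."


def reify(statement_id, subject, predicate, obj):
    """Describe a triple as an rdf:Statement so it can be attributed."""
    node = uri(EX, statement_id)
    return [
        triple(node, uri(RDF, "type"), uri(RDF, "Statement")),
        triple(node, uri(RDF, "subject"), subject),
        triple(node, uri(RDF, "predicate"), predicate),
        triple(node, uri(RDF, "object"), obj),
    ]


def hypothesis(author, pathway, mediators):
    """One reified claim per mediator, each attributed to the author."""
    lines = []
    for i, mediator in enumerate(mediators, start=1):
        statement_id = f"claim{i}"
        lines += reify(
            statement_id,
            uri(EX, pathway),
            uri(EX, "terminationAndModulationMediatedBy"),
            uri(EX, mediator),
        )
        # The claim is something the authors propose, not an asserted fact.
        lines.append(
            triple(uri(EX, author), uri(EX, "proposes"), uri(EX, statement_id))
        )
    return lines


claims = hypothesis(
    "TheAuthors",
    "JAK_STAT_signalling_pathway",
    ["tyrosine_phosphatases", "SOCS_feedback_inhibitors", "PIAS_proteins"],
)
print("\n".join(claims))
```

Reifying each claim keeps "the authors propose that X"
separate from asserting X itself, which is the
distinction between facts and hypotheses that the quoted
message asks for.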

Best,

Manuel


 --- Eric Neumann <ENeumann@BeyondGenomics.com> wrote:
> Hello,
>  
> I'm posting a set of examples on how an RDF based
> system could assist in
> describing and sharing structured information
> related to Systems
> Biology. There are many examples of using RDF as a
> descriptive language,
> but I'm interested if it can be used to describe not
> only facts, but
> assumptions and hypotheses regarding mechanisms of
> diseases from a
> multi-component perspective, typical to data
> analysis from a Systems
> Biology point of view.
>  
> First, one needs a mechanism for encoding most
> molecular data: proteins,
> genes, transcripts, metabolites, interactions. The
> genome and its
> constituents need to be made accessible through a
> descriptive system
> that also supports a distributed annotations model
> (DAS,
> http://biodas.org ), but extensible to all molecular
> species and
> descriptors. 
>  
> Second, there needs to be a robust and extensible
> model for accessing
> and including pathway data (e.g., KEGG, AfCS,
> BioCYC, BIND) and merging
> it to sets of annotated molecular data. It is hoped
> that efforts such as
> the BioPAX Ontology (http://biopax.org) will be the
> foundation for
> describing any group of pathway information,
> independent of source. The
> formalized relations between molecules,
> interactions, and reactions will
> be necessary for bridging the wide variety of
> biomolecular phenomena.
> Just as important will be the ability to aptly
> describe the "context" by
> which such phenomena are known to occur (e.g.,
> tissue, disease,
> developmental stages).
>  
> Third, much causal evidence is not yet in pathway
> databases, but exists
> in millions of articles of unstructured scientific
> text (i.e.,
> publications). These not only need to be referenced,
> but the essential
> molecular mechanisms they describe need to be
> encoded in a semantic
> format; for example: 
>  
> <The Authors> <propose that>: 
> 
> <the termination and modulation of>
> 
> <the JAK/STAT signalling pathway>
> 
> <is mediated by> 
> 
> <tyrosine phosphatases> 
> 
> <the SOCS (suppressor of cytokine signalling)
> feedback inhibitors> and 
> 
> <PIAS (protein inhibitor of activated STAT)
> proteins>  
> 
>  
> This is a nontrivial exercise, but it is possible in
> time, and could be
> achieved incrementally in stages until it became
> part of the publication
> process. This structured evidence would then be
> merged/layered on top of
> other pathway and mechanistic information, so that
> inferences could be
> performed on the set. Publishers such as Nature are
> already beginning to
> explore the use of RDF in publication space.
> Personally, I think
> text-mining can help us with legacy text, but
> publishers in the future
> should require authors to encode the model semantics
> along with their
> text and figures using some form of wizard tool...
>  
> Fourth, additional biological knowledge regarding
> anatomy, physiology,
> tissues, and diseases need also be represented in
> RDF in order to
> describe biological systems. A practical way to
> begin this process would
> be to translate the National Library of Medicine's
> UMLS language into
> RDF/OWL
> (http://www.nlm.nih.gov/pubs/factsheets/umls.html).
> From then
> on, all other databases should refer to biological
> entities through RDF
> references to these entities. UMLS already contains
> nearly a million
> concepts and about 134 semantic relations, so
> conversion into RDF/OWL
> should be fairly automatic (e.g., namespace umls:). 
>  
> Fifth, once an interesting set of observations can
> be related to
> existing published data, the researchers should be
> able to propose the
> new relations and/or bio-mechanisms using RDF. The
> gathered facts and
> assumptions should be sufficient for any other
> scientist to validate the
> proposed hypothesis based on the presented RDF
> material for themselves.
> This is an important requirement for systems biology
> research, since
> describing and sharing complex relations and
> hypothetical mechanisms is
> at the heart of elucidating a biosystem. A
> distributed annotation model
> would also greatly enhance sharing of models and
> insights. It will be a
> test of the expressivity RDF can formally define
> within a field that is
> as encompassing as systems biology. How well the
> systems biology
> community is able to apply RDF in advancing SB
> research will be the true
> test of its utility. 
>  
> I intend to post more systems biology use cases and
> a few possible
> strategies on solving them in the next few weeks. I
> also welcome any
> ideas and suggestions from the life science
> community in what to
> consider and develop as part of a collaborative
> effort towards these
> goals. 
>  
>  
> Eric
>  
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>     Eric K. Neumann PhD 
>     VP Strategic Informatics, 
>     Head of Knowledge Research 
> 
>    Beyond Genomics 
> 
> 
>     40 Bear Hill Road 
>     Waltham, MA 
>      tel: 781-434-0222 
>      fax: 781-895-1119 
>      www.beyondgenomics.com 
> 
> 
>  
>  

=====
==========================
MANUEL CORPAS

University of Manchester

PhD student, Bioinformatics

http://www.manuelcorpas.com

==========================

Received on Wednesday, 5 November 2003 06:05:56 UTC