- From: Manuel Corpas <protein_bioinformatics@yahoo.co.uk>
- Date: Wed, 5 Nov 2003 10:52:36 +0000 (GMT)
- To: public-semweb-lifesci@w3.org
Hello all:

I would like to make some comments in relation to the email Eric K. Neumann sent to the community.

What have we learnt in the past from the implementation of standards? There have been great successes and even bigger failures when platforms have been developed. For instance, let's think of the English language, which is now the standard language of communication in international circles. Its main competitor was Esperanto, a language that was, in essence, a mosaic of many different languages: an intermediate between many Romance languages, whose aim was to be impartial. Nevertheless, Esperanto failed as a platform and today hardly anyone uses it. One of the main reasons was that it did not have a seed group of people who were able to use it; everyone needed to learn it from scratch. Moreover, the language did not have the right institutions supporting it, probably because its promoters did not specifically target a clear audience. In other words, its target audience was arguably too general. What if the United Nations had used it? Would that have given it some credibility with other potential users?

To take a more biological example, MIAME (minimum information about a microarray experiment) seems to have become widely used for reporting gene expression data on a genomic scale (Brazma et al., Nat. Genet., volume 29, no. 4, pp. 365-371, 2001). In this case, the key to its success might be that it appeared in a timely fashion and that the right institutions are behind the work, giving it support. Its authors had a concrete problem (a standard for representing microarray information) and a clear target audience: microarray experimentalists. They managed to raise awareness of this problem and caught the attention of institutions such as Nature, Stanford, the European Bioinformatics Institute and many others.

The encoding language proposed for describing pathway information in systems biology will certainly require a lot of effort to convince people to use it. If this new standard is to be adopted by publishers, a clear target audience needs to be specifically addressed, and this includes both institutions and users, all the more so if it is to be the common platform for publishing pathway literature. SBML, for example, has a clear user segment: biochemical simulation packages for integration of information. However, unless its proponents bring the language to other communities, it might be in danger of not achieving the objective of being adopted as a workbench for the exchange of information among biologists.

The question I wanted to raise was: do we have the right people giving us the right support for RDF? It might be worth considering whether KEGG, AfCS, BioCYC and BIND should be targeted more specifically, giving examples taken from their archives. That way, institutions such as Nature, which has supported AfCS extensively, would seriously consider the adoption of RDF. If we are lucky, this could make a difference in the level of awareness among potential users, so that more people start using our platforms.

Best,
Manuel

--- Eric Neumann <ENeumann@BeyondGenomics.com> wrote:
> Hello,
>
> I'm posting a set of examples on how an RDF-based system could assist in describing and sharing structured information related to Systems Biology. There are many examples of using RDF as a descriptive language, but I'm interested if it can be used to describe not only facts, but assumptions and hypotheses regarding mechanisms of diseases from a multi-component perspective, typical to data analysis from a Systems Biology point of view.
>
> First, one needs a mechanism for encoding most molecular data: proteins, genes, transcripts, metabolites, interactions.
> The genome and its constituents need to be made accessible through a descriptive system that also supports a distributed annotations model (DAS, http://biodas.org ), but extensible to all molecular species and descriptors.
>
> Second, there needs to be a robust and extensible model for accessing and including pathway data (e.g., KEGG, AfCS, BioCYC, BIND) and merging it to sets of annotated molecular data. It is hoped that efforts such as the BioPAX Ontology (http://biopax.org) will be the foundation for describing any group of pathway information, independent of source. The formalized relations between molecules, interactions, and reactions will be necessary for bridging the wide variety of biomolecular phenomena. Just as important will be the ability to aptly describe the "context" by which such phenomena are known to occur (e.g., tissue, disease, developmental stages).
>
> Third, much causal evidence is not yet in pathway databases, but exists in millions of articles of unstructured scientific text (i.e., publications). These not only need to be referenced, but the essential molecular mechanisms they describe need to be encoded in a semantic format; for example:
>
> <The Authors> <propose that>:
> <the termination and modulation of>
> <the JAK/STAT signalling pathway>
> <is mediated by>
> <tyrosine phosphatases>
> <the SOCS (suppressor of cytokine signalling) feedback inhibitors> and
> <PIAS (protein inhibitor of activated STAT) proteins>
>
> This is a nontrivial exercise, but it is possible in time, and could be achieved incrementally in stages until it became part of the publication process. This structured evidence would then be merged/layered on top of other pathway and mechanistic information, so that inferences could be performed on the set. Publishers such as Nature are already beginning to explore the use of RDF in publication space.
> Personally, I think text-mining can help us with legacy text, but publishers in the future should require authors to encode the model semantics along with their text and figures using some form of wizard tool...
>
> Fourth, additional biological knowledge regarding anatomy, physiology, tissues, and diseases needs also to be represented in RDF in order to describe biological systems. A practical way to begin this process would be to translate the National Library of Medicine's UMLS language into RDF/OWL (http://www.nlm.nih.gov/pubs/factsheets/umls.html). From then on, all other databases should refer to biological entities through RDF references to these entities. UMLS already contains nearly a million concepts and about 134 semantic relations, so conversion into RDF/OWL should be fairly automatic (e.g., namespace umls:).
>
> Fifth, once an interesting set of observations can be related to existing published data, the researchers should be able to propose the new relations and/or bio-mechanisms using RDF. The gathered facts and assumptions should be sufficient for any other scientist to validate the proposed hypothesis based on the presented RDF material for themselves. This is an important requirement for systems biology research, since describing and sharing complex relations and hypothetical mechanisms is at the heart of elucidating a biosystem. A distributed annotation model would also greatly enhance sharing of models and insights. It will be a test of the expressivity RDF can formally define within a field that is as encompassing as systems biology. How well the systems biology community is able to apply RDF in advancing SB research will be the true test of its utility.
>
> I intend to post more systems biology use cases and a few possible strategies on solving them in the next few weeks.
> I also welcome any ideas and suggestions from the life science community in what to consider and develop as part of a collaborative effort towards these goals.
>
> Eric
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Eric K. Neumann PhD
> VP Strategic Informatics,
> Head of Knowledge Research
>
> Beyond Genomics
>
> 40 Bear Hill Road
> Waltham, MA
> tel: 781-434-0222
> fax: 781-895-1119
> www.beyondgenomics.com

=====
==========================
MANUEL CORPAS
University of Manchester
PhD student, Bioinformatics
http://www.manuelcorpas.com
==========================
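
As an illustration of the quoted JAK/STAT example, here is a minimal sketch of how such a proposed mechanism might be held as RDF-style triples. It uses plain Python rather than any RDF library. The namespace http://example.org/sysbio# and every identifier in it (TheAuthors, proposes, isMediatedBy and so on) are hypothetical and come from no published vocabulary. Each mediator gets its own reified claim node, so the statement is recorded as something the authors propose rather than as established fact.

```python
from typing import NamedTuple

# Hypothetical namespace; none of these identifiers belong to a real vocabulary.
EX = "http://example.org/sysbio#"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def claim(claim_id, source, subject, predicate, obj):
    """Return triples attributing one (subject, predicate, obj) assertion to a source.

    The assertion is reified: it gets its own node, so the source's
    "proposes" relation points at the claim instead of asserting it as fact.
    """
    node = EX + claim_id
    return [
        Triple(node, EX + "subject", EX + subject),
        Triple(node, EX + "predicate", EX + predicate),
        Triple(node, EX + "object", EX + obj),
        Triple(EX + source, EX + "proposes", node),
    ]


def to_ntriples(ts):
    """Serialize triples as N-Triples-style lines (all terms written as IRIs)."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in ts)


# The three mediators named in the quoted example (labels are illustrative).
mediators = ["TyrosinePhosphatase", "SOCS", "PIAS"]

triples = []
for i, mediator in enumerate(mediators, start=1):
    triples += claim(
        f"claim{i}",
        "TheAuthors",
        "JAK_STAT_TerminationAndModulation",
        "isMediatedBy",
        mediator,
    )

print(to_ntriples(triples))
```

Keeping the claim separate from the fact is the point of the exercise: a reader can later merge these triples with pathway data and still tell which relations are hypotheses and who proposed them.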
Received on Wednesday, 5 November 2003 06:05:56 UTC