Dynamic exchange of biological networks using web services.

(A position paper for W3C Workshop on Semantic Web for Life Sciences, Cambridge, Massachusetts. October 2004.)
Frank Gibbons, Dept. of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 250 Longwood Ave. SGMB-322, Boston, Massachusetts, USA.
Email: fgibbons@hms.harvard.edu

Introduction

Biological network information is increasingly abundant. The combination of biological networks may be viewed as a multicolor graph, each color representing a different gene-gene or protein-protein relationship, e.g., protein interaction, sequence homology, correlated expression, transcriptional regulation, genetic interaction (sensu synthetic lethality), or metabolic relationship. Relationship types may be further stratified by type of supporting evidence, by directionality or by confidence measure. Furthermore, each organism has its own collection of networks. Although this information's complexity argues for its maintenance by distributed groups, much of its value is derived through network integration.
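One way to make the multicolor-graph view concrete is sketched below. It is a minimal illustration, not the BioGraphNet data model: the class names, field names, gene identifiers, and color strings are all assumptions chosen for the example. Each edge carries a color (relationship type) plus the stratifying attributes mentioned above (evidence, directionality, confidence).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One colored relationship between two genes or proteins (hypothetical schema)."""
    source: str
    target: str
    color: str               # relationship type, e.g. "protein_interaction"
    evidence: str = ""       # type of supporting evidence
    directed: bool = False   # True for e.g. transcriptional regulation
    confidence: float = 1.0  # confidence measure in [0, 1]


class MulticolorGraph:
    """Several networks over the same nodes, distinguished by edge color."""

    def __init__(self):
        self._adjacent = defaultdict(list)

    def add_edge(self, edge):
        self._adjacent[edge.source].append(edge)
        if not edge.directed:
            # Undirected edges are reachable from either endpoint.
            self._adjacent[edge.target].append(edge)

    def neighbors(self, gene, colors=None, min_confidence=0.0):
        """Return neighbors of `gene`, optionally restricted by color and confidence."""
        found = set()
        for edge in self._adjacent.get(gene, []):
            if colors is not None and edge.color not in colors:
                continue
            if edge.confidence < min_confidence:
                continue
            found.add(edge.target if edge.source == gene else edge.source)
        return found


# Illustrative data only; the gene names are placeholders.
graph = MulticolorGraph()
graph.add_edge(Edge("GENE_A", "GENE_B", "protein_interaction", confidence=0.9))
graph.add_edge(Edge("GENE_A", "GENE_C", "sequence_homology", confidence=0.4))
graph.add_edge(Edge("TF_1", "GENE_A", "transcriptional_regulation", directed=True))
```

Restricting a query to one color, or to edges above a confidence threshold, then amounts to passing `colors` or `min_confidence` to `neighbors`.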

Application

BioMOBY has established a 'playground' for bioinformatics-based web services, within which we have developed a 'sandbox' called BioGraphNet, which is both a common standard and a collection of services for sharing distributed network information. We serve several network data types, and encourage others to participate, using objects we have registered in BioMOBY's ontology.

BioTrawler, our web-based biological network browser, illustrates the use of BioGraphNet. It dynamically discovers suitable distributed data sources within BioGraphNet, integrates selected sources 'just in time', and visualizes the graph neighboring a user-defined set of genes.
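The integrate-then-browse step can be sketched as follows. This is only an illustration under stated assumptions: it does not reproduce BioTrawler's implementation, the source names and gene identifiers are invented, and each source is assumed to have already returned a list of `(gene_a, gene_b, color)` tuples. The sketch merges the selected sources into one adjacency structure at query time and then collects the genes within a given number of steps of the user's seed set.

```python
from collections import defaultdict, deque


def integrate(sources):
    """Merge edge lists from several sources into one undirected adjacency map.

    `sources` maps a source name to an iterable of (gene_a, gene_b, color).
    The source name is kept on each edge so that provenance is not lost.
    """
    adjacency = defaultdict(set)
    for source_name, edges in sources.items():
        for gene_a, gene_b, color in edges:
            adjacency[gene_a].add((gene_b, color, source_name))
            adjacency[gene_b].add((gene_a, color, source_name))
    return adjacency


def neighborhood(adjacency, seeds, radius=1):
    """Breadth-first search: genes within `radius` edges of any seed gene."""
    distance = {gene: 0 for gene in seeds}
    queue = deque(seeds)
    while queue:
        gene = queue.popleft()
        if distance[gene] == radius:
            continue  # do not expand beyond the requested radius
        for neighbor, _color, _source in adjacency.get(gene, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[gene] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical sources, standing in for responses from distributed services.
selected_sources = {
    "lab_a": [("G1", "G2", "protein_interaction")],
    "lab_b": [("G2", "G3", "correlated_expression"),
              ("G4", "G5", "sequence_homology")],
}
merged = integrate(selected_sources)
nearby = neighborhood(merged, ["G1"], radius=2)
```

Because integration happens per query, a source that is added or withdrawn affects only subsequent queries and no central copy needs to be rebuilt.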

Web services might be used to provide a means for early dissemination of data, rather like the arXiv system developed by Paul Ginsparg (then at Los Alamos) in the early 1990s. Frustrated with the lengthy delays between acceptance and publication of papers in peer-reviewed journals (often on the order of 6-12 months), he started a parallel, un-reviewed publication system. Users understand that anyone may post to and retrieve from the archive, and that there are no guarantees of quality, but this is outweighed by the speed with which truly high-quality work is made available to the community. Such work finds its way into peer-reviewed publications at a later point, to become part of the permanent record.

Centralized databases play a role comparable to that of the peer-reviewed journals: they are a tremendous resource to the biological community, a trusted, comprehensive repository of curated information. Yet limited manpower creates an unavoidable bottleneck, resulting in a time lag between publication of data and its appearance in databases. Semantic web services, if the barriers to their use can be overcome, offer biologists a way to make their data universally available quickly, in a semantically tagged, easily parsable format.

Position

Once the number of available sources goes beyond a handful, some means of filtering becomes necessary. Through the use of meta-data, a user can ask to see (for example) only sources that have high availability and low average latency (for fast response), and that originate in a specific (trusted) domain. If a web service were to provide, as part of its meta-data, sample inputs along with the corresponding outputs, the service could be tested regularly and statistics generated about its availability. Just as we use filters to reject unwanted email, we may need meta-data to reject unreliable services; otherwise, web services may turn out to be a Tower of Babel. As the BioMOBY group has found, the use of ontologies to organize the objects passed around by services turns out to be one of the most important features in promoting interoperability. In an open system, we may need several means of filtering available sources of information.
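The self-testing and filtering idea can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields, function names, domains, and thresholds are hypothetical, and a plain Python callable stands in for the real network call to a service. Each probe sends the advertised sample input, compares the reply with the advertised output, and records success and latency; the filter then selects services by availability, mean latency, and trusted domain.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Meta-data a service might publish about itself (hypothetical schema)."""
    name: str
    domain: str
    sample_input: str
    expected_output: str
    probes: list = field(default_factory=list)  # (succeeded, latency_seconds)

    @property
    def availability(self):
        """Fraction of probes that returned the expected output."""
        if not self.probes:
            return 0.0
        return sum(1 for ok, _ in self.probes if ok) / len(self.probes)

    @property
    def mean_latency(self):
        """Average probe latency in seconds; infinite if never probed."""
        if not self.probes:
            return float("inf")
        return sum(latency for _, latency in self.probes) / len(self.probes)


def probe(service, call):
    """Invoke `call` with the sample input and record the outcome."""
    start = time.perf_counter()
    try:
        succeeded = call(service.sample_input) == service.expected_output
    except Exception:
        succeeded = False  # any failure counts against availability
    service.probes.append((succeeded, time.perf_counter() - start))


def select(services, min_availability, max_latency, trusted_domains):
    """Keep services that are reliable, fast, and from a trusted domain."""
    def trusted(domain):
        return any(domain == d or domain.endswith("." + d) for d in trusted_domains)
    return [s for s in services
            if s.availability >= min_availability
            and s.mean_latency <= max_latency
            and trusted(s.domain)]


def failing_call(_sample):
    raise ConnectionError("service unreachable")


# Placeholder services; callables simulate a healthy and an unreachable service.
good = ServiceRecord("interactions", "lab.example.org", "G1", "G2")
flaky = ServiceRecord("homology", "lab.example.org", "G1", "G3")
for _ in range(3):
    probe(good, lambda sample: "G2")
    probe(flaky, failing_call)

usable = select([good, flaky], min_availability=0.9, max_latency=1.0,
                trusted_domains=["example.org"])
```

In practice the probes would be run on a schedule by a registry or by the client, and the accumulated statistics would themselves be published as meta-data.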

If this vision of biologists making data routinely available via web services is to be realized, publishing data should require little more than access to a web server and the ability to run Perl scripts. Currently, the entry barrier to using web services is quite high, though it is our hope that the BioGraphNet project can lower it significantly. Just like the World-Wide Web itself, the notion of supplying biological network information as semantic services can really only take off with sufficient 'buy-in', or critical mass.

A further application of semantic web services goes beyond the sharing of static information to the domain of sharing algorithms. This need not be as sophisticated as workflow development systems like myGrid or Taverna, but more along the lines of RPC: one machine hands off semantically tagged data to another for computation, and gets back a semantically tagged result. Allied with ontologies, this could be quite a powerful way to share algorithms, without the need to re-write or re-compile them in other languages. One application currently being developed in this manner is the ProNet algorithm for probabilistically computing protein complexes from a sub-complex, based on experimental data with high error rates (S Asthana, OD King, FD Gibbons, FP Roth. Predicting protein complex membership using probabilistic network reliability. Genome Res., 14, 1170-1175 (2004)).
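The RPC-style exchange of tagged data can be sketched as follows. Several things here are assumptions made for illustration: JSON stands in for the wire format, the type names `InteractionList` and `DegreeTable` are invented rather than taken from any registered ontology, and a trivial node-degree count stands in for a real algorithm such as ProNet. The point is only the shape of the exchange: the server checks the tag on the incoming data against what the operation accepts, and tags what it returns.

```python
import json


def tag(term, value):
    """Wrap a value with the (hypothetical) ontology term describing it."""
    return {"type": term, "value": value}


def degree_by_gene(edges):
    """Stand-in algorithm: count how many edges touch each gene."""
    counts = {}
    for gene_a, gene_b in edges:
        counts[gene_a] = counts.get(gene_a, 0) + 1
        counts[gene_b] = counts.get(gene_b, 0) + 1
    return counts


# operation name -> (accepted input type, produced output type, function)
REGISTRY = {
    "node_degree": ("InteractionList", "DegreeTable", degree_by_gene),
}


def serve(request_text, registry=REGISTRY):
    """Server side: validate the tagged input, compute, return a tagged result."""
    request = json.loads(request_text)
    entry = registry.get(request["operation"])
    if entry is None:
        return json.dumps(tag("Error", "unknown operation"))
    accepted, produced, function = entry
    payload = request["payload"]
    if payload["type"] != accepted:
        message = "expected {}, got {}".format(accepted, payload["type"])
        return json.dumps(tag("Error", message))
    return json.dumps(tag(produced, function(payload["value"])))


def call(operation, payload):
    """Client side: serialize the request, 'send' it, parse the tagged reply."""
    request_text = json.dumps({"operation": operation, "payload": payload})
    return json.loads(serve(request_text))


reply = call("node_degree", tag("InteractionList", [["G1", "G2"], ["G1", "G3"]]))
```

Because both sides agree only on the tags and the serialized form, the algorithm behind `serve` could be written in any language without the client needing to re-write or re-compile it.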

A larger, more generic problem is that of how best to enable biologists to construct their own semantic queries. The BioTrawler application described above endeavours to provide a simple graphical interface to available services, using an ontology for molecular interaction (Proteomics Standards Initiative - Molecular Interaction) applied to the objects provided by services. But this addresses the problem only within this specific application domain. The difficult problem of constructing queries in a user-friendly way remains, and it is hard to see how semantic search can really take off until it has been solved.

Conclusion

The capabilities of Semantic Web Services offer tremendously exciting possibilities for dealing with the deluge of information currently being produced in genomics and proteomics. Significant hurdles remain, but we are getting the first glimpses of what might be possible.