Dynamic exchange of biological networks using web services.
(A position paper for W3C Workshop on Semantic Web for Life
Sciences, Cambridge, Massachusetts. October 2004.)
Frank Gibbons, Dept. of Biological Chemistry and Molecular Pharmacology,
Harvard Medical School, 250 Longwood Ave. SGMB-322, Boston,
Massachusetts, USA.
Email: fgibbons@hms.harvard.edu
Introduction
Biological network information is increasingly abundant. The
combination of biological networks may be viewed as a multicolor
graph, each color representing a different gene-gene or
protein-protein relationship, e.g., protein interaction, sequence
homology, correlated expression, transcriptional regulation, genetic
interaction (sensu synthetic lethality), or metabolic relationship.
Relationship types may be further stratified by type of supporting
evidence, by directionality or by confidence measure. Furthermore,
each organism has its own collection of networks. Although this
information's complexity argues for its maintenance by distributed
groups, much of its value is derived through network integration.
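The multicolor graph described above can be sketched as a small data
structure. This is only an illustration: the class name, the gene
identifiers and the relationship labels are hypothetical, and edges
are treated as undirected and unweighted for brevity, although the
text notes that real networks may carry direction, evidence type and
confidence.

```python
from collections import defaultdict


class MultiColorGraph:
    """Undirected graph whose edges carry a color (relationship type)."""

    def __init__(self):
        # color -> gene -> set of neighboring genes
        self._adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, gene_a, gene_b, color):
        """Record a relationship of the given color between two genes."""
        self._adj[color][gene_a].add(gene_b)
        self._adj[color][gene_b].add(gene_a)

    def neighbors(self, gene, colors=None):
        """Return {color: sorted neighbors} for the requested colors.

        With colors=None, every color in the graph is considered.
        Colors in which the gene has no neighbors are omitted.
        """
        selected = list(self._adj.keys()) if colors is None else colors
        result = {}
        for color in selected:
            found = self._adj.get(color, {}).get(gene)
            if found:
                result[color] = sorted(found)
        return result


# Hypothetical genes and relationship labels, for illustration only.
graph = MultiColorGraph()
graph.add_edge("geneA", "geneB", "protein_interaction")
graph.add_edge("geneA", "geneC", "sequence_homology")
graph.add_edge("geneA", "geneB", "correlated_expression")

print(graph.neighbors("geneA"))
print(graph.neighbors("geneA", colors=["protein_interaction"]))
```

Keeping one adjacency map per color makes it cheap to view a single
network on its own, while a query across all colors gives the
integrated view.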
Application
BioMOBY has established a 'playground' for bioinformatics-based web
services, within which we have developed a 'sandbox' called
BioGraphNet,
which is both a common standard and a collection of services for
sharing distributed network information. We serve several network
data types, and encourage others to participate, using objects we have
registered in BioMOBY's ontology.
BioTrawler, our web-based biological network browser, illustrates the
use of
BioGraphNet. It dynamically discovers suitable distributed data
sources within BioGraphNet, integrates selected sources 'just in
time', and visualizes the graph neighboring a user-defined set of
genes.
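The 'just in time' integration step can be illustrated roughly as
follows. This is not BioTrawler's implementation: service discovery is
replaced by a plain dictionary, the two fetch functions are
hypothetical stand-ins for remote calls, and the gene identifiers are
placeholders.

```python
def fetch_protein_interactions():
    # Stand-in for a remote service call (hypothetical data).
    # Each edge is a (gene, gene, relationship) triple.
    return [
        ("geneA", "geneB", "protein_interaction"),
        ("geneC", "geneD", "protein_interaction"),
    ]


def fetch_homology():
    # Second hypothetical source, serving a different relationship type.
    return [("geneA", "geneE", "sequence_homology")]


# Stand-in for the set of sources found by dynamic discovery.
SOURCES = {
    "interactions": fetch_protein_interactions,
    "homology": fetch_homology,
}


def integrate(sources, selected):
    """Call only the selected sources, at query time, and pool their edges."""
    edges = []
    for name in selected:
        edges.extend(sources[name]())
    return edges


def neighborhood(edges, seed_genes):
    """Keep the edges that touch at least one of the seed genes."""
    seeds = set(seed_genes)
    return [edge for edge in edges if edge[0] in seeds or edge[1] in seeds]


edges = integrate(SOURCES, ["interactions", "homology"])
subgraph = neighborhood(edges, ["geneA"])
print(subgraph)
```

Because sources are called only when a query arrives, nothing is
cached centrally, and each provider remains responsible for keeping
its own data current.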
Web services could also provide a means for early dissemination of
data, rather like the arXiv system developed by Paul Ginsparg (then at
Los Alamos) in the early 1990s. Frustrated with the lengthy delays
between acceptance and publication of papers in peer-reviewed journals
(often on the order of 6-12 months), he started a parallel un-reviewed
publication system. Users understand that anyone may post to and
retrieve from the archive, and that there are no guarantees of
quality, but this is outweighed by the speed with which truly
high-quality work reaches the community. Such work finds its way into
peer-reviewed publications at a later point, to become part of the
permanent record.
Similarly, centralized databases are a tremendous resource to the
biological community: a trusted, comprehensive repository of curated
information. Yet limited manpower creates an unavoidable bottleneck,
resulting in a time lag between publication of data and its appearance
in databases. If the barriers to using them can be overcome, semantic
web services offer biologists a way to make their data universally
available quickly, in a semantically tagged, easily parsable format.
Position
Once the number of available sources goes beyond a handful, some means
of filtering becomes necessary. Through the use of meta-data, a user
can ask to see (for example) only sources with high availability and
low average latency (for fast response), and which originate in a
specific (trusted) domain. If a web service were to provide, as part
of its meta-data, sample inputs along with the corresponding outputs,
the service could be tested regularly and statistics generated about
its availability. Just as we use filters to reject unwanted email, we
may need meta-data to reject unreliable services; otherwise web
services may turn out to be a Tower of Babel. As the BioMOBY
group has found, the use of ontologies to organize the objects passed
around by services turns out to be one of the most important features
in promoting interoperability. In an open system, we may need several
means of filtering available sources of information.
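A minimal sketch of such meta-data filtering is given below, assuming
each service publishes an availability figure, an average latency and
its URL. The field names, thresholds and example.org addresses are all
hypothetical; they do not correspond to any existing registry. The
probe function shows how a sample input and its advertised output
could be used to test a service regularly.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ServiceMeta:
    """Hypothetical meta-data record for one web service."""
    name: str
    url: str
    availability: float    # fraction of probes answered correctly, 0..1
    mean_latency_s: float  # average response time in seconds


def in_trusted_domain(url, trusted_domains):
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in trusted_domains)


def filter_services(services, min_availability, max_latency_s, trusted_domains):
    """Keep services that are reliable, fast and from a trusted domain."""
    return [
        s for s in services
        if s.availability >= min_availability
        and s.mean_latency_s <= max_latency_s
        and in_trusted_domain(s.url, trusted_domains)
    ]


def probe(call, sample_input, expected_output):
    """Send one sample input; True if the advertised output comes back."""
    try:
        return call(sample_input) == expected_output
    except Exception:
        # Any failure (timeout, malformed reply, ...) counts as unavailable.
        return False


def availability(probe_results):
    """Fraction of successful probes, or 0.0 if none were run."""
    return sum(probe_results) / len(probe_results) if probe_results else 0.0


services = [
    ServiceMeta("fast_trusted", "http://services.example.org/ppi", 0.99, 0.2),
    ServiceMeta("slow_trusted", "http://services.example.org/homology", 0.99, 5.0),
    ServiceMeta("fast_untrusted", "http://example.com/ppi", 0.99, 0.2),
]
kept = filter_services(services, 0.95, 1.0, ["example.org"])
print([s.name for s in kept])
```

Here a service call is modeled as an ordinary function; in practice
the probe would wrap a remote request, and its results would be
accumulated over time to produce the availability figure used by the
filter.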
For biologists to make data routinely available via web services,
doing so should require little more than access to a web server and
the ability to run Perl scripts. Currently, the
entry barrier to using web services is quite high, though it is our
hope that the BioGraphNet project can lower it significantly. Just
like the World-Wide Web itself, the notion of supplying biological
network information as semantic services can really only take off with
sufficient 'buy-in', or critical mass.
A further application of semantic web services goes beyond the sharing
of static information to the domain of sharing algorithms. This need
not be as sophisticated as workflow development systems like
myGrid or Taverna, but more along the lines of RPC: one machine hands
off semantically tagged data to another for computation, and gets back
a semantically tagged result. Allied with ontologies, it could be a
quite powerful way to share algorithms, without the need to re-write or
re-compile in other languages. One application currently being
developed in this manner is the ProNet algorithm for probabilistically
computing protein complexes from a sub-complex, based on experimental
data with high error rates (S Asthana, OD King, FD Gibbons, FP Roth.
Predicting protein complex membership using probabilistic network
reliability. Genome Res., 14, 1170-1175 (2004)).
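The RPC-style exchange might look something like the sketch below.
It makes several simplifying assumptions: JSON and an in-process
function call stand in for the wire format and the network transport,
the type tags 'EdgeList' and 'DegreeTable' are hypothetical rather
than registered ontology terms, and the algorithm simply counts each
gene's neighbors. It is a placeholder and is unrelated to ProNet.

```python
import json


def compute_degrees(edge_list):
    """Placeholder algorithm: count how many edges touch each gene."""
    degrees = {}
    for gene_a, gene_b in edge_list:
        degrees[gene_a] = degrees.get(gene_a, 0) + 1
        degrees[gene_b] = degrees.get(gene_b, 0) + 1
    return degrees


# Accepted input type -> (algorithm, output type). Type names are hypothetical.
REGISTRY = {"EdgeList": (compute_degrees, "DegreeTable")}


def handle_request(request_json):
    """Server side: check the type tag, run the algorithm, tag the result."""
    request = json.loads(request_json)
    entry = REGISTRY.get(request["type"])
    if entry is None:
        return json.dumps(
            {"type": "Error", "data": "unsupported type: " + request["type"]}
        )
    algorithm, output_type = entry
    return json.dumps({"type": output_type, "data": algorithm(request["data"])})


# Client side: tag the data, hand it off, and read the tagged result.
request = json.dumps(
    {"type": "EdgeList", "data": [["geneA", "geneB"], ["geneA", "geneC"]]}
)
response = json.loads(handle_request(request))
print(response)
```

The client needs to know only the type tags, not the language in which
the algorithm is written, which is the property that makes this style
of sharing attractive.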
A larger, more generic problem is that of how best to enable
biologists to construct their own semantic queries. The BioTrawler
application described above endeavours to provide a simple graphical
interface to available services, using an ontology for molecular
interaction (Proteomics Standards Initiative - Molecular Interaction)
applied to the objects provided by services. But this addresses the
problem only within this specific application domain. The difficult
problem of constructing queries in a user-friendly way remains, and it
is hard to see how semantic search can really take off until it has
been solved.
Conclusion
The capabilities of Semantic Web Services offer tremendously exciting
possibilities for dealing with the deluge of information currently
being produced in genomics and proteomics. Significant hurdles remain,
but we are getting the first glimpses of what might be possible.