Application of RDF/OWL in the Area of Investigating Biological Network Dynamics and Automated Experimental Loops

 

SWLS Working Paper submission by:

Justin Lancaster

                        jlancasater@hydrojoule.com

HydroJoule, S. P., Lexington, MA

     A challenge in biomedical research is to evaluate the results of gene expression experiments in the context of prior knowledge.  One approach is to (a) reduce the gene expression data to a smaller set of seemingly important genes that exhibit correlated behavior, (b) reverse-engineer from these data a probable network or set of networks without regard to prior knowledge, and then (c) attempt to make sense of the experimental result against a backdrop of pathway maps derived from curation and analysis of the biomedical literature.  Another approach is to use the reduced set of correlated genes as a query to a knowledge base built from the literature with a variety of bioinformatics tools, and to extract from that knowledge base a network or set of networks in response to the query set.  The former approach, if done well and based on sufficient experimental design, has the advantage of shedding light on unknown unknowns, but suffers from high uncertainty owing to uncontrolled variables.  The latter approach is less likely to correct prior ignorance and error, but is more likely to generate a molecular network that rests robustly on an assembly of many prior lab experiments.  An obvious goal is to marry these two approaches, leveraging their strengths and minimizing their weaknesses.  In addition, approaches that automate the generation of hypotheses and the design of iterative gene expression experiments would accelerate the pace of discovery.
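     As a rough illustration of the second approach, the following Python sketch queries a toy knowledge base with a reduced gene set and extracts the surrounding network.  Everything in it is an assumption made for illustration: the gene names, the relations, the triple layout, and the helper extract_subnetwork are hypothetical, and a real knowledge base would be far larger and carry provenance for each statement.

```python
from collections import defaultdict

# Hypothetical literature-derived knowledge base of (source, relation, target)
# triples. Gene names and relations are placeholders, not curated facts.
KNOWLEDGE_BASE = [
    ("geneA", "activates", "geneB"),
    ("geneB", "inhibits", "geneC"),
    ("geneC", "activates", "geneD"),
    ("geneE", "activates", "geneF"),
]


def extract_subnetwork(triples, query_genes, max_hops=1):
    """Return the triples whose endpoints both lie within max_hops of a query gene.

    Edges are treated as undirected while expanding the neighborhood, so that
    upstream and downstream partners are both collected.
    """
    neighbors = defaultdict(set)
    for source, _, target in triples:
        neighbors[source].add(target)
        neighbors[target].add(source)

    selected = set(query_genes)
    frontier = set(query_genes)
    for _ in range(max_hops):
        # Expand one hop outward, skipping genes that were already collected.
        frontier = {n for gene in frontier for n in neighbors[gene]} - selected
        selected |= frontier

    return [(s, r, t) for s, r, t in triples if s in selected and t in selected]


# Query with the reduced set of correlated genes from an expression experiment.
subnetwork = extract_subnetwork(KNOWLEDGE_BASE, {"geneA", "geneC"}, max_hops=1)
for triple in subnetwork:
    print(triple)
```

     The max_hops parameter is one possible way to control how far the extracted network extends beyond the query genes; a larger value pulls in more prior knowledge at the cost of a less focused network.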

     Forward simulation is used in both of the above approaches as part of fitting early guesses at a network and concluding which derived network deserves to be considered more probable.  Simulation of discrete logical cascading steps without concern for time sequence can provide enough information about causation to generate hypotheses, but may provide little information about mechanistic detail.  Modeling continuous changes in expression levels, with explicit treatment of time dynamics, offers a chance of distinguishing between specific mechanistic pathways, including nonlinear responses and feedbacks.
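     The contrast between the two simulation styles can be sketched in Python for a toy cascade A -> B -> C.  This is a minimal sketch under stated assumptions, not a proposed modeling standard: the three-node cascade, the Hill-type activation function, the rate constants, and the forward Euler integration are all illustrative choices.

```python
def boolean_cascade(a_on, steps):
    """Discrete logical simulation: each node copies its activator's last state."""
    state = {"A": a_on, "B": False, "C": False}
    history = [dict(state)]
    for _ in range(steps):
        # Synchronous update: A is an external input, A drives B, B drives C.
        state = {"A": state["A"], "B": state["A"], "C": state["B"]}
        history.append(dict(state))
    return history


def hill(x, k=0.5, n=2):
    """Saturating activation function (assumed form and parameters)."""
    return x**n / (k**n + x**n)


def continuous_cascade(a_level, t_end, dt=0.01, production=1.0, decay=1.0):
    """Continuous simulation of the same cascade by forward Euler integration.

    dB/dt = production * hill(A) - decay * B
    dC/dt = production * hill(B) - decay * C
    """
    b = c = 0.0
    trajectory = [(0.0, b, c)]
    n_steps = int(round(t_end / dt))
    for i in range(1, n_steps + 1):
        db = production * hill(a_level) - decay * b
        dc = production * hill(b) - decay * c
        b += dt * db
        c += dt * dc
        trajectory.append((i * dt, b, c))
    return trajectory


# The logical model only says that C eventually switches on.
history = boolean_cascade(True, steps=2)
print([s["C"] for s in history])

# The continuous model also gives levels and timing, e.g. when C passes 0.5.
trajectory = continuous_cascade(a_level=1.0, t_end=10.0)
t_cross = next(t for t, _, c in trajectory if c >= 0.5)
print(f"C crosses 0.5 at t = {t_cross:.2f}")
```

     The logical run reports only the order in which nodes switch on, whereas the continuous run yields a delay and a saturation level for C, which is the kind of information that could help discriminate between candidate mechanisms.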

     To use RDF/OWL features in the effort to merge the above approaches to discover biological function, a number of technical problems need to be addressed, including:

1. Time: Creating standards for modeling dynamics and time-based functions and coping with curated pathways that have little or no dynamic information;

2. Spatial context: Considering spatial and system context, in the sense that there are numerous levels of self-organization requiring nested dynamic modeling in the forward simulation of molecular assemblages, cells, tissues, and metabolic systems, to name a few;

3. Fluid interactions: Mammalian biology proceeds to a large extent as a function of aqueous chemistry, so concentration, diffusion, pH, redox potential, ionic dissociation, and bulk transport are important to the modeling effort;

4. Energetics: Energy parameters (as well as material balances) can provide important constraints on a dynamic simulation model, including temperature, Gibbs free energy, enthalpy, entropy, and other thermodynamic variables, as well as energy represented in electrical potential and phosphate exchanges.  Developing ontologies for thermodynamic functions and interrelationships will be useful;

5. Topology and Congruence testing:  Comparing networks topologically is an important method for rapidly bringing experiment-derived networks and literature-derived pathways into focus, highlighting match-ups and inconsistencies.  Deciding on useful standards for carrying topological descriptors forward when pathway relationships are reported will be important;

6. Scenarios and Experimental Templates:  To allow an intelligent system to propose an experimental design based on a generated hypothesis, a library of experimental approaches and/or scenarios must be available, with a logical structure that is flexible enough to work across genes, proteins, and metabolites, yet specific enough to direct a robotic process.
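The congruence testing described in item 5 can be illustrated with a small Python sketch that compares an experiment-derived network against a literature-derived pathway edge by edge.  The two example networks, the relation labels, and the helper compare_networks are hypothetical, and the Jaccard index over edge sets is only one of many possible topological descriptors.

```python
# Hypothetical networks as (source, relation, target) triples.
EXPERIMENTAL = [
    ("geneA", "activates", "geneB"),
    ("geneB", "activates", "geneC"),
    ("geneA", "activates", "geneD"),
]
LITERATURE = [
    ("geneA", "activates", "geneB"),
    ("geneB", "inhibits", "geneC"),
    ("geneC", "activates", "geneD"),
]


def compare_networks(experimental, literature):
    """Classify directed edges as matches, conflicts, or unique to one network."""
    exp = {(s, t): r for s, r, t in experimental}
    lit = {(s, t): r for s, r, t in literature}
    shared = exp.keys() & lit.keys()
    union = exp.keys() | lit.keys()
    return {
        # Same edge and same relation in both networks.
        "matches": sorted(e for e in shared if exp[e] == lit[e]),
        # Same edge, but the reported relation disagrees.
        "conflicts": sorted(e for e in shared if exp[e] != lit[e]),
        "experiment_only": sorted(exp.keys() - lit.keys()),
        "literature_only": sorted(lit.keys() - exp.keys()),
        # Jaccard index of the edge sets, ignoring relation labels.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


report = compare_networks(EXPERIMENTAL, LITERATURE)
for key, value in report.items():
    print(key, value)
```

Edges that appear only in the experimental network are candidates for new hypotheses, while conflicts flag places where the experiment and the curated literature disagree and a follow-up experiment may be warranted.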

 

To address these technical problems, specific features should be contemplated in the ongoing development of RDF/OWL/LSID.  Some of the above needs have been approached within the current framework design but are not yet completely handled.  Other features will require harmonization with SBML, UML, and other dynamic modeling standards.  Commitment of resources toward these efforts is being explored and can be discussed at the workshop.