The use of Named Graphs to enable ontology evolution


Chris Catton chris.catton@zoo.ox.ac.uk and David Shotton david.shotton@zoo.ox.ac.uk

Image Bioinformatics Lab, University of Oxford, Dept. Zoology, South Parks Road, Oxford OX1 3PS, UK

Position: The dangers of static ontologies

The role of an ontology is to facilitate the understanding, sharing, re-use and integration of knowledge through the construction of an explicit domain model, thereby helping to address many of the difficulties currently experienced in managing large distributed on-line information resources.

With the volume of digital data now estimated to be doubling every month, soon the only way to handle much of this new information will be through the presuppositional 'spectacles' of an ontology. Already people look less and less at raw data, and as the volume accumulates few if any of us will have the time or the mental capacity to assimilate the new data, structure them in a meaningful way, and extract information without first processing the data through an ontology or some other similar machine-based organisational aid. This creates a potential 'paradigm trap', first identified by Duncan Davidson (Davidson, 2002). The philosopher Thomas Kuhn used the term 'paradigm' to describe the way we perceive, think about and value the world, based upon a particular vision of reality. He argued that changes to the current paradigm, "together with the controversies that almost always accompany them, are the defining characteristics of scientific revolutions" (Kuhn, 1996). There is a danger that building and using an ontology may fossilize the current paradigm in a particular field of knowledge, so that only information that fits the paradigm is ever seen by the user. Such an outcome would not in itself halt scientific progress, since incremental knowledge that fits the current paradigm would still accumulate. It could, however, hinder or even prevent the discovery and exploration of new and uncharted territory. Even in less extreme cases, it is essential that ontologies can evolve as a field of study develops.

Factors favouring static ontologies

We perceive several inter-related influences that favour the fossilization of current domain paradigms into static ontologies:

  1. Ontology building currently requires a high level of dedication and understanding, and consequently ontologies tend to be built by small communities of dedicated 'monks'. These are usually led by 'abbots', relatively senior domain experts who are likely to be highly committed to encapsulating the dominant paradigm, and who may resist change.

  2. Ontology building is a time-consuming and expensive exercise, and thus substantial logistical problems confront any newcomers wishing to involve themselves in such activities.

  3. Ontology building quite rightly encourages the development of community consensus. There would thus be massive social pressures against anyone wishing to create an alternative ontology for use in an already populated domain.

  4. The first ontology to be created for a particular domain of knowledge may assume a monopolistic position that becomes virtually unassailable, even if it has universally acknowledged weaknesses in its structure.

  5. If a large volume of legacy data has been encoded with a successful ontology, this will make it difficult to introduce change.

  6. Most ontologies currently under development have both good bits and bad bits, and users typically select the bits they want and ignore the rest. They may thus use a subsection of Ontology A to encode publications data, a subsection from Ontology B to encode personnel information, and most of Ontology C to categorise their biological results. As ontologies become widely used, it is possible to imagine that the conceptualisation of a domain will be encoded not in a single ontology, but in a mosaic made up of segments from a number of different ontologies. This ability to pick and choose sections from a set of ontologies militates against the commonly held view that ontologies will evolve through a competitive 'survival of the fittest'. There is no single ontology that can 'succeed' or 'fail' as a result of competitive selection pressure. For this reason, we believe that ontologies are unlikely to evolve in response to the same market forces that drive the development of applications software.

Possibilities for change within the present system

When and how does the conceptualisation of a domain change? Kuhn argued that "Just because it is a transition between incommensurables, the transition between competing paradigms cannot be made a step at a time, forced by logic and neutral experience" (Kuhn, 1996). This view has wide and influential support, for example from Max Planck, who wrote that "a new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it" (Planck, 1949).

That viewpoint might be interpreted as ruling out the very possibility of reflecting a paradigm shift within an ontology. However, we believe this not to be the case, for the reasons outlined in the following sections. We distinguish two forms of paradigm shift, which we call evolutionary and revolutionary. In an evolutionary shift, a significant new body of knowledge is appended to an existing conceptualisation, while in a revolutionary shift the conceptualisation itself is restructured.[1]

With hindsight, Albert Einstein's relativity theory can be seen as an extreme example of an evolutionary shift. The old paradigm, Newtonian mechanics, continues fundamentally unchanged. However, new restrictions are placed on it, since laws once thought to be universal now hold true only for bodies moving at much less than the speed of light, and new laws must be added to describe bodies moving at close to the speed of light. It is possible to imagine an ontology describing Newtonian mechanics as a subgraph of an ontology describing General Relativity: in the new paradigm Newtonian mechanics is not wrong, but applies only under a limited set of conditions. The Copernican revolution is fundamentally different, and is an example of a revolutionary paradigm shift. The old laws no longer hold true, since the Earth is not the centre of the universe. They must be removed from the ontology and replaced with a new set of laws that accommodate the new paradigm.

Evolutionary change of an ontology

As an example of how an evolutionary paradigm shift might be represented in an ontology, consider the graphs shown in Figs. 1 to 3. Fig. 1 shows a section of an ontology describing the development of adult mammalian bone marrow and brain, constructed according to the paradigm of twenty-five years ago (circa 1980), when the consensus was very clearly that bone marrow developed from mesoderm and brain developed from ectoderm.

Subsequently, it was shown that adult mouse brains contain haemopoietic stem cells. Since it seemed unlikely that adult bone marrow stem cells would cross the blood-brain barrier to enter the adult brain, it was hypothesised that these brain cells were derived from foetal haemopoietic cells that had entered the brain tissue before the barrier was established (Bartlett, 1982). This proposal is reflected in the ontology shown in Fig. 2, which is an extension of the graph shown in Fig. 1.



Fig 1. Ontology of the dominant paradigm circa 1980. Haemopoietic cells develop only from mesoderm, and neural and glial cells develop only from ectoderm. In this and subsequent figures, solid arrows represent owl:allValuesFrom restrictions (e.g. neural cells can develop only from ectoderm), while dashed arrows represent owl:someValuesFrom restrictions.

Fig 2. Ontology of the new paradigm post 1982, reflecting the hypothesis that brain haemopoietic stem cells are derived from foetal haemopoietic stem cells of mesodermal origin that migrated into the brain. This challenges the dominant paradigm that brain tissues are derived exclusively from ectoderm.



Fig 3. Ontology of the emerging paradigm post 2000, after it had been shown that adult haemopoietic stem cells of mesodermal origin can migrate into the brain and there develop into neural cells.


More recently, Brazelton et al. (2000) have shown that haemopoietic stem cells from adult bone marrow can develop into neural cells in adult mouse brain. This striking demonstration of the migration potential and developmental plasticity of adult stem cells has two consequences. First, it challenges Bartlett's (1982) assumption that haemopoietic stem cells do not cross the adult blood-brain barrier, thereby casting doubt on his conclusion that they must have entered the brain during foetal life. Second, and more fundamentally, it contradicts the long-held belief that neuronal cells can develop only from embryonic ectoderm. An ontology that reflects these new findings is shown in Fig. 3.
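The distinction between the two kinds of arrow, and why the Brazelton et al. result conflicts with the earlier graphs, can be illustrated with the following minimal Python sketch. It is a closed-world simplification rather than an OWL reasoner: under OWL's open-world semantics, a neural cell recorded as developing from mesoderm only makes an owl:allValuesFrom(Ectoderm) restriction inconsistent if the germ layers are also declared disjoint, which we assume here. The class names, the Restriction type and the satisfies function are all hypothetical, and the Fig. 3 restriction shown is just one possible encoding of the post-2000 view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restriction:
    """A hypothetical, flattened develops_from restriction on a class."""
    on_class: str        # class the restriction is attached to
    kind: str            # "all" ~ owl:allValuesFrom, "some" ~ owl:someValuesFrom
    allowed: frozenset   # germ layer(s) named as the filler (read as a union if >1)


def satisfies(origins: set, r: Restriction) -> bool:
    """Closed-world check of one restriction against one cell's recorded origins.

    Assumes germ layers are mutually disjoint, so a recorded origin outside
    `allowed` cannot be reinterpreted as also belonging to it.
    """
    if r.kind == "all":
        # Every recorded origin must fall within the filler (vacuously true if none).
        return origins <= r.allowed
    if r.kind == "some":
        # At least one recorded origin must fall within the filler.
        return bool(origins & r.allowed)
    raise ValueError(f"unknown restriction kind: {r.kind!r}")


# Fig. 1 style solid arrow: neural cells develop only from ectoderm.
fig1_neural = Restriction("NeuralCell", "all", frozenset({"Ectoderm"}))
# One hypothetical encoding of the post-2000 view: only from ectoderm or mesoderm.
fig3_neural = Restriction("NeuralCell", "all", frozenset({"Ectoderm", "Mesoderm"}))

typical_neuron = {"Ectoderm"}  # conventional neuroectodermal origin
marrow_neuron = {"Mesoderm"}   # Brazelton-style neuron derived via marrow stem cells

for label, r in [("Fig. 1", fig1_neural), ("Fig. 3", fig3_neural)]:
    print(label, satisfies(typical_neuron, r), satisfies(marrow_neuron, r))
# Fig. 1 True False
# Fig. 3 True True
```

The Bartlett hypothesis never triggers this conflict, because it concerns haemopoietic stem cells in the brain rather than neural cells; it is the Brazelton observation that falls outside the solid arrow.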

Now imagine that the graph in Fig. 1 is part of a much larger developmental anatomy ontology. What should be the response to papers that challenge the accepted paradigm, given that many challenges to dominant paradigms are subsequently shown to be erroneous in some way? Kuhn would argue that it would be inappropriate to make a change, since "... by ensuring that the paradigm will not be easily surrendered, resistance guarantees that scientists will not be lightly distracted and that the anomalies that lead to paradigm change will penetrate existing knowledge to the core" (Kuhn, 1996). But if the ontology is employed to assimilate the results in the first place, how can experimental scientists test their hypotheses without changing the ontology?

The ontology change from Fig. 1 to Fig. 2 does not present a serious problem for the experimental scientist whose work provokes a crisis in the paradigm. She can simply create a new ontology of her own describing the subdomain in question, import the dominant ontology into it, and add the appropriate links between the two. If this combined ontology fits the experimental data better, we would expect it to gather support and eventually to be accepted as the consensus view. The new ontology succeeds by subsuming the old. The change from Fig. 2 to Fig. 3 creates a more serious problem. The ontology in Fig. 2 is not a subgraph of that in Fig. 3, since in Fig. 3 neural cells no longer develop only from foetal neuroepithelium. However, in practical terms, the old ontology may still be required to interpret a mass of legacy data. Furthermore, although the present system can permit limited expression of evolutionary paradigm change, it is likely to lead to a haphazard collection of ontologies in which some classes and properties embody the old paradigm while others embody the new world view, with no obvious mechanism for distinguishing between them using currently available OWL constructs.
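The difference between the two transitions can be phrased as a set relation over triples: an evolutionary extension leaves every statement of the old graph in place, whereas a revolutionary change withdraws at least one. The sketch below encodes much simplified, hypothetical versions of Figs. 1 to 3 as Python sets of (subject, predicate, object) tuples. The names are invented for illustration, and each restriction is flattened into a single predicate (develops_only_from standing in for owl:allValuesFrom, develops_from_some for owl:someValuesFrom) rather than OWL's blank-node encoding.

```python
# Triples are (subject, predicate, object) tuples; all names are hypothetical.
Triple = tuple[str, str, str]

fig1: set[Triple] = {
    ("HaemopoieticCell", "develops_only_from", "Mesoderm"),
    ("NeuralCell", "develops_only_from", "FoetalNeuroepithelium"),
    ("FoetalNeuroepithelium", "develops_only_from", "Ectoderm"),
}

# Fig. 2 appends Bartlett's (1982) hypothesis without altering any Fig. 1 statement.
fig2: set[Triple] = fig1 | {
    ("BrainHaemopoieticStemCell", "develops_from_some", "FoetalHaemopoieticStemCell"),
    ("FoetalHaemopoieticStemCell", "develops_only_from", "Mesoderm"),
}

# Fig. 3 must withdraw the neural-cell restriction to admit Brazelton et al. (2000).
old_neural = ("NeuralCell", "develops_only_from", "FoetalNeuroepithelium")
fig3: set[Triple] = (fig2 - {old_neural}) | {
    ("NeuralCell", "develops_only_from", "FoetalNeuroepitheliumOrAdultHaemopoieticStemCell"),
    ("AdultHaemopoieticStemCell", "develops_only_from", "Mesoderm"),
}


def removed(old: set[Triple], new: set[Triple]) -> set[Triple]:
    """Statements of the old graph that the new graph no longer contains."""
    return old - new


def classify_change(old: set[Triple], new: set[Triple]) -> str:
    """Evolutionary if the old graph is a subgraph of the new one, else revolutionary."""
    return "evolutionary" if old <= new else "revolutionary"


print(classify_change(fig1, fig2))  # evolutionary
print(classify_change(fig2, fig3))  # revolutionary
print(removed(fig2, fig3))  # {('NeuralCell', 'develops_only_from', 'FoetalNeuroepithelium')}
```

In these terms, importing the dominant ontology and adding links always yields an evolutionary change, while the Fig. 2 to Fig. 3 transition cannot be expressed without deleting an accepted statement.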

Requirements for creating evolvable 'living ontologies'

The work of Brazelton et al. generates a revolutionary paradigm shift, in which the old ontology no longer holds true. The present system does not permit easy organic development of ontologies in a manner that could accommodate such a change. We thus believe that it would be useful to create a mechanism that would enable ontologies both to evolve in a controlled manner by the accretion of new classes and relationships, and to accommodate revolutionary paradigm shifts while remaining to some degree backward compatible.

A key part of building such evolvable 'living ontologies' lies in creating the potential for making clearly defined changes. What is required is a mechanism not only for importing subgraphs from existing external ontologies, but also for replacing subgraphs within the ontology of interest with new versions that reflect the new paradigm, while at the same time marking the original subgraphs in such a manner that they remain available for the (re)interpretation of legacy metadata previously created using them.

Such a mechanism would allow users to state explicitly the differences between the current paradigm and the proposed new conceptualisation in a way that would allow the two conceptualisations to co-exist, avoiding an unmanageable proliferation of separate ontologies. This may theoretically be possible using current OWL constructs such as owl:versionInfo. However, owl:versionInfo annotates a single resource, such as a class, a property or the ontology as a whole, rather than an arbitrary set of statements. Neither the level of the individual class nor that of the whole ontology offers the appropriate granularity to meet the needs discussed here. Rather, to allow such changes to the conceptualisation of a domain, we need to be able explicitly to select a subgraph from an existing OWL ontology, and to name it and define its properties.

A strategy for the required management of subgraphs within ontologies is made possible by Named Graphs (Carroll et al., 2004). Named Graphs are currently being proposed as an alternative to reification in RDF, and as such they address a number of issues associated with adding metadata to data. We propose that Named Graphs be employed to permit the addition of provenance and other metadata to subgraphs within an ontology. Of course, not all changes to an ontology will represent paradigm shifts, and users may need to know whether changes to relationships in an evolving ontology reflect relatively minor choices about the manner in which a domain is modelled, or more fundamental changes to the domain paradigm itself. Named Graphs would also enable such distinctions to be made.
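As a rough illustration of how this might work, the sketch below models a dataset of Named Graphs in plain Python: each subgraph of the ontology is stored under its own name, and a separate metadata graph holds statements whose subjects are those graph names, in the spirit of Carroll et al. (2004). It is a toy in-memory model, not TriG, SPARQL or any RDF library. The example.org names and the metadata predicates (source, replaces, changeKind, status) are hypothetical; a real deployment might use an established vocabulary instead.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]
EX = "http://example.org/"  # hypothetical namespace for graph names and metadata terms


@dataclass
class NamedGraphDataset:
    """Toy dataset: graph name -> triples, plus a metadata graph about those names."""
    graphs: dict[str, frozenset[Triple]] = field(default_factory=dict)
    meta: set[Triple] = field(default_factory=set)

    def add_graph(self, name: str, triples: set[Triple], source: str) -> None:
        self.graphs[name] = frozenset(triples)
        self.meta.add((name, EX + "source", source))  # provenance of the subgraph

    def replace_graph(self, old: str, new: str, triples: set[Triple],
                      source: str, change_kind: str) -> None:
        """Add a new version of a subgraph and mark, but keep, the old one."""
        self.add_graph(new, triples, source)
        self.meta |= {
            (new, EX + "replaces", old),
            (new, EX + "changeKind", change_kind),  # e.g. "paradigm" or "modelling"
            (old, EX + "status", "superseded"),
        }

    def is_superseded(self, name: str) -> bool:
        return (name, EX + "status", "superseded") in self.meta

    def current_view(self) -> set[Triple]:
        """Union of every subgraph that has not been superseded."""
        return set().union(*(g for n, g in self.graphs.items()
                             if not self.is_superseded(n)))

    def legacy_view(self, name: str) -> frozenset[Triple]:
        """A superseded subgraph stays addressable for reinterpreting legacy data."""
        return self.graphs[name]


ds = NamedGraphDataset()
ds.add_graph(EX + "germ-layers",
             {("HaemopoieticCell", "develops_only_from", "Mesoderm")},
             source="consensus c. 1980")
ds.add_graph(EX + "neural-origin-v1",
             {("NeuralCell", "develops_only_from", "FoetalNeuroepithelium")},
             source="consensus c. 1980")
ds.replace_graph(EX + "neural-origin-v1", EX + "neural-origin-v2",
                 {("NeuralCell", "develops_only_from",
                   "FoetalNeuroepitheliumOrAdultHaemopoieticStemCell")},
                 source="Brazelton et al. (2000)", change_kind="paradigm")
```

Replacing a subgraph never deletes the old version: it is marked as superseded, drops out of the current view, and remains retrievable by name for the reinterpretation of legacy metadata, while the changeKind statement records whether the revision is a paradigm shift or a modelling choice.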

One of the core motivations for the introduction of Named Graphs into RDF is to provide a framework for proof and trust. It is easy to imagine this framework being transferred to OWL ontologies. For example, one user might be happy to trust the Gene Ontology completely, while another user, experimenting in an area where the paradigm is in crisis, might elect to trust the Gene Ontology with the exception of a particular subgraph defined by a single class and all its subclasses. This subgraph might eventually be replaced by a new one reflecting a new paradigm.
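A selective trust policy of this kind can be sketched as follows. Here the distrusted subgraph is taken to be every statement whose subject is the chosen class or one of its direct or indirect subclasses, found by a breadth-first walk over rdfs:subClassOf links. The class names are hypothetical stand-ins, not actual Gene Ontology terms or structure, and a real system would apply the policy to Named Graphs rather than to a bare set of triples.

```python
from collections import deque

Triple = tuple[str, str, str]
SUBCLASS_OF = "rdfs:subClassOf"


def subclass_closure(root: str, triples: set[Triple]) -> set[str]:
    """Return root plus all its direct and indirect subclasses (breadth-first)."""
    children: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            children.setdefault(o, []).append(s)
    seen = {root}
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:  # guards against cycles in the hierarchy
                seen.add(child)
                queue.append(child)
    return seen


def trusted_triples(triples: set[Triple], distrusted_root: str) -> set[Triple]:
    """Trust every statement except those about the distrusted class or its subclasses."""
    excluded = subclass_closure(distrusted_root, triples)
    return {t for t in triples if t[0] not in excluded}


# Hypothetical fragment of a large ontology (not actual Gene Ontology terms).
ontology: set[Triple] = {
    ("CellDifferentiation", SUBCLASS_OF, "BiologicalProcess"),
    ("NeuronDifferentiation", SUBCLASS_OF, "CellDifferentiation"),
    ("GlialCellDifferentiation", SUBCLASS_OF, "CellDifferentiation"),
    ("Metabolism", SUBCLASS_OF, "BiologicalProcess"),
}
trusted = trusted_triples(ontology, "CellDifferentiation")
print(trusted)  # {('Metabolism', 'rdfs:subClassOf', 'BiologicalProcess')}
```

Because the excluded subgraph is computed from the hierarchy on each call rather than copied, classes later added beneath the distrusted root are excluded automatically.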

Conclusion

Scientific progress proceeds in a series of conceptual or technological leaps, followed by periods of consolidation. Before the introduction of ontologies into data analysis, community support for the dominant paradigm tended to restrain change. While such restraint has been acknowledged as performing a useful function, it is a blunt instrument. The use of defined ontologies in data analysis could make radical change even more difficult to achieve. The solution proposed here does not solve all the problems of viewing data through the presuppositional lens of an ontology. Anomalies may still be missed simply because they do not fit the paradigm, as has always been the case in the practice of science. However, by clearly stating the positions of the dominant paradigm and the proposed changes to it, and by allowing users to give trust or confidence ratings to subgraphs, the process of change can be clearly documented. The use of Named Graphs to identify and describe subgraphs within existing ontologies potentially provides a powerful mechanism for clarifying differences between the dominant and emergent paradigms. As Francis Bacon observed, "Truth emerges more readily from error than from confusion" (Spedding et al., 1905).

References

Bartlett, P. (1982) Pluripotential hemopoietic stem cells in adult mouse brain. Proceedings of the National Academy of Sciences USA, 79: 2722-2725.

Brazelton, T. R., Rossi, F. M. V., Keshet, G. I. & Blau, H. M. (2000) From marrow to brain: expression of neuronal phenotypes in adult mice. Science, 290: 1775-1779.

Carroll, J. J., Bizer, C., Hayes, P. & Stickler, P. (2004) Named Graphs, Provenance and Trust. Online at http://www.hpl.hp.com/techreports/2004/HPL-2004-57.

Davidson, D. (2002) The Mouse Atlas - an ontology for mapping gene function data to the mouse embryo. Proc. Conf. on Standards and Ontologies for Functional Genomics (SOFG). Hinxton, Cambridge, November 17-20, 2002.

Kuhn, T. S. (1996) The Structure of Scientific Revolutions, 3rd edition, p. vii. University of Chicago Press, Chicago, Illinois.

Planck, M. (1949) Scientific Autobiography and Other Papers. Philosophical Library, New York.

Spedding, J., Ellis, R. L. & Heath, D. D. (eds.) (1905) The Works of Francis Bacon. G. Routledge & Sons, London.

[1] Kuhn would argue against the distinction we make here. In his view, all paradigm shifts are revolutionary, since they require a fundamentally different view of the world, and the appearance of evolution, where present, is simply a post hoc revision of what actually happened during the course of a paradigm shift. We do not intend to challenge this view, but for the purposes of illustration we choose to accept the revisionist position.