Ontology Driven Architectures and Potential Uses of the Semantic Web in Software Engineering

Editors' Draft 21 Feb 2005

This version: N/A
Latest version: N/A
Previous version: N/A
Editors: Phil Tetlow, IBM, <philip.tetlow@uk.ibm.com>
Jeff Pan, University of Manchester, <pan@cs.man.ac.uk>
Daniel Oberle, Universität Karlsruhe, <oberle@fzi.de>
Evan Wallace, National Institute of Standards and Technology, <ewallace@cme.nist.gov>
Michael Uschold, Boeing, <michael.f.uschold@boeing.com>
Tom Croucher, University of Sunderland, <tom.croucher@sunderland.ac.uk>
Grady Booch, IBM, <gbooch@us.ibm.com>
Chris Welty, IBM, <christopher.welty@us.ibm.com>

Please refer to the errata for this document, which may include some normative corrections.

Abstract

Many consider that applying knowledge representation languages common to the Semantic Web, such as RDF and OWL, in Systems and Software Engineering can bring significant benefits. This note therefore outlines those benefits and the approaches needed to achieve them, from both a Semantic Web and a Software Engineering perspective.

Target Audience

This note is aimed at professional practitioners, tool vendors and academics with an interest in applying Semantic Web technologies in Systems and Software Engineering (SSE) contexts. These may include:

Software Engineers who are interested in the benefits and potentials of Semantic Web technologies
Members of the Semantic Web community
Members of the Model Driven Architecture (MDA) community who may be interested in ontologies as models
Members of the automated software engineering community who are interested in formal methods for [partially] automating the development of software systems
Knowledge Representation and Inference researchers interested in practical applications

Objectives

To enthuse practitioners about the potential uses of the Semantic Web and Semantic Web technologies in SSE
To outline the benefits of applying Semantic Web technologies in SSE contexts
To encourage collaboration between the Semantic Web and Systems and Software Engineering communities

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a public (WORKING DRAFT) Working Group Note produced by the W3C Semantic Web Best Practices Working Group, which is part of the W3C Semantic Web activity.

Discussion of this document is invited on the public mailing list public-swbp-wg@w3.org (public archives).

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Other documents may supersede this document.

Please send comments to either Phil Tetlow, IBM, <philip.tetlow@uk.ibm.com> or Jeff Pan, Manchester University, <pan@cs.man.ac.uk>.

1. Introduction
2. Background
- 2.1 Composition and Reuse
- 2.2 A Heritage in Model Driven Architecture
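3. Proposed Ideas
- 3.1 Ontologies as Formal Model Specifications and the Incorporation of Such Models in Semi-Formal Languages
- 3.2 The Semantic Web in Systems and Software Engineering
- 3.3 A Corpus of Reusable Content and the Use of Metadata as Relational Data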
4. Examples
5. Previous Experience
6. Issues
7. Informative References
8. Acknowledgments
9. References

1. Introduction

Until recently, work on accepted practices in Systems and Software Engineering (SSE) has appeared somewhat disjointed from the ground-breaking work on formal information representation on the World Wide Web (commonly referred to as the Semantic Web initiative). Yet the two fields clearly overlap, and many now acknowledge merit in a hybrid approach to IT systems development and deployment that combines Semantic Web technologies and techniques with more established development formalisms and languages such as the Unified Modeling Language (UML). Such an approach promises not only to improve IT systems in general, but also to benefit the future of the Web, as systems and Web Services containing rich Semantic Web content start to come online.

2. Background

2.1 Composition and Reuse

Throughout the history of computing, the concepts of component construction and reuse have undergone a remarkable evolution. Over time many different types of software and systems building blocks have been advocated, offering ever increasing levels of abstraction and encapsulation. In the earliest computer systems, 'functions' were the predominant unit of composition, returning the same results for a given set of inputs every time. However, this approach was soon found to be somewhat clumsy, and it gave way to more substantial aggregation mechanisms such as 'subroutines' and 'libraries' in the 1960s and 1970s. Even these approaches, however, could not manage shared data and concurrency effectively, resulting in systems of unnecessary complexity. Consequently, the mid-1980s saw a number of advances in abstract systems thinking, culminating in the mainstream adoption of classified components commonly known as 'Objects'.

Objects encapsulate both data and functionality into usable bundles on which processing can take place, and today Object Oriented (OO) theory and techniques are accepted as mature concepts in most areas of Information Technology. As such, they provide a number of acknowledged engineering advantages, including component reuse.

By collecting objects together into meaningful 'artifacts' or 'assets', and artifacts into 'systems' or 'applications', reuse can be achieved at higher levels. In this context the terms 'object', 'artifact' and 'asset' are often considered interchangeable, and they also accommodate pure data as a valid component type. The terms 'system' and 'application', however, generally imply larger computational units, composed of both usable data and functionality.

2.2 A Heritage in Model Driven Architecture

Even with such advances in representation and composition, engineering systems of any significance is still difficult. For this reason, in all well-established engineering disciplines, modeling a common understanding of domains through a variety of formal and semi-formal notations has proven essential to advancing the practice. This has led large sections of the Software Engineering profession to adopt models of one form or another as a means to develop, communicate and verify abstract designs against original requirements. Computer Aided Software Engineering (CASE) and, more recently, Model Driven Architecture (MDA) provide the most prominent examples of this approach. Here models are used not only for design purposes; associated tools and techniques can also generate executable artifacts for later use in the Software Lifecycle. Nevertheless, there has always been a frustrating paradox in the use of tooling in Software Engineering, arising from the range of modeling techniques available and the breadth of systems requiring design: engineering nontrivial systems demands rigour and unambiguous statement of concepts, yet the more formal the modeling approach chosen, the more abstract the tools needed. This often makes methods difficult to implement, limits the freedom of expression available to the engineer and creates a barrier to communication amongst less experienced practitioners. For these reasons less formal approaches have gained mainstream commercial acceptance in recent years, with the Unified Modeling Language (UML) currently the most favoured amongst professionals.

Nevertheless, approaches like the UML are by no means perfect. Although they can capture highly complex conceptualisations, current versions are far from semantically rich. They can also be notoriously ambiguous: a standard, isolated model in such a language, no matter how well constructed, can still be grossly misinterpreted by those who are not familiar with its problem space. Supporting annotation and documentation can help alleviate such issues, but traditionally this has involved a separate, literal, verbose and long-winded activity, often disjointed from the production of the model itself. Furthermore, MDA does not currently support automated consistency checking. What is needed in addition is a way to incorporate unambiguous, rich semantics into the various semi-formal notations underlying methods like the UML.

Fortunately, with the advent of Semantic Web technologies, semantically rich, rigorous descriptive languages are now available that are much less imposing than those previously adopted for high-end specification purposes. The important point here is that MDA provides a powerful and proven framework for SSE, and Semantic Web technologies provide a natural extension to this framework. Semantic models (often referred to as 'ontologies'), as defined in the context of the Semantic Web, augment the model paradigm. Stripped of hype, modeling in SSE is simply an annotated method for describing whatever one wants to achieve. The Semantic Web, which itself originates from a desire to describe concepts properly, significantly enriches this approach. What makes Semantic Web technologies different, however, is that they have been deliberately engineered from the ground up to address problems with a potentially unlimited number of facets and interpretations. Such problems are often described as 'open world' in nature, and solving them has required far more precise descriptive languages than those commonly found in traditional IT circles, where solutions have typically addressed much more confined, or 'closed', problems that did not need such descriptive rigour. This does not mean, however, that rigorous description would not be of significant advantage in SSE. Quite the contrary: ever since the early work on Formal Methods over twenty years ago, the IT industry has recognised the value of rigour in descriptive engineering practice, but the underlying drivers have not been strong enough to bring about mass take-up. This is not the case with the Semantic Web, and a new perspective on engineering systems is starting to gain ground.

3. Proposed Ideas

3.1 Ontologies as Formal Model Specifications and the Incorporation of Such Models in Semi-Formal Languages

In many respects ontologies can simply be considered rigorous descriptive models in their own right, akin to existing conceptual modeling techniques such as UML class diagrams or Entity Relationship Models (ERM). As such, their purpose is to facilitate mutual understanding between agents, be they human or computerised, which they achieve through explicit semantic representations using logic-based formalisms. Typically, these formalisms come with executable calculi that support querying and reasoning at runtime. This offers a number of advantages, specifically in the areas of:

Quality
- Requirements conformance and consistency checking
- Rigorous typing, categorisation and identification
- Ease of formal specification. With the aid of graphical modeling tools (as commonly favoured in MDA), ontologies can be built using much less abstract user interfaces
- Communication of requirements and intent to domain experts and developers through:
  - The ability to capture, relate and manage models of systems and information at multiple levels and from multiple standpoints
- Increases in semantic expressivity through coverage of concepts not embodied in current SE tools
- Reductions in design ambiguity through:
  - Unified syntax for tooling. Standard Semantic Web languages (such as OWL) can provide a unified syntax for existing formal method languages, creating the potential for open and cooperative tooling environments for formal methods ([Wang 2004]).
  - Facilitation of the decoupling between abstraction layers needed for effective modeling

Cost
- Reduced maintenance overhead through increased consistency
- Increased potential for reuse, substitution and extension via accurate content discovery on the Semantic Web
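The consistency checking and rigorous typing listed above can be illustrated with a deliberately small sketch. It assumes a toy ontology held in plain Python sets (all class and individual names are hypothetical) with hand-written subclass and disjointness axioms, in the spirit of rdfs:subClassOf and owl:disjointWith; a real system would delegate this work to an OWL reasoner.

```python
# Hypothetical toy ontology: (subclass, superclass) pairs, in the spirit of rdfs:subClassOf.
SUBCLASS_OF = {
    ("Servlet", "WebComponent"),
    ("WebComponent", "Component"),
    ("Library", "Component"),
}
# Pairs of classes declared disjoint, in the spirit of owl:disjointWith.
DISJOINT = {("WebComponent", "Library")}

# Asserted types of (hypothetical) design-model elements.
TYPES = {
    "loginServlet": {"Servlet"},
    "log4j": {"Library"},
    "badAsset": {"Servlet", "Library"},
}


def superclasses(cls):
    """Return cls together with all of its transitive superclasses."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result


def inferred_types(individual):
    """All classes an individual belongs to once the subclass axioms are applied."""
    types = set()
    for cls in TYPES[individual]:
        types |= superclasses(cls)
    return types


def inconsistencies():
    """Yield (individual, class_a, class_b) for every disjointness violation."""
    for ind in sorted(TYPES):
        types = inferred_types(ind)
        for a, b in DISJOINT:
            if a in types and b in types:
                yield (ind, a, b)


print(list(inconsistencies()))
```

Only `badAsset` is reported, because its inferred types include both `WebComponent` (via `Servlet`) and `Library`, which the toy ontology declares disjoint.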

Hence, given the semantically rich, unambiguous qualities of information embodied on the Semantic Web, the amenable syntax of Semantic Web languages and the universality of the Semantic Web's XML ancestry, there appears to be a compelling argument for combining the semi-formal, model driven techniques of Software Engineering with approaches common to Information Engineering on the Semantic Web. This may involve embedding descriptive ontologies directly into systems' design models, referencing separate semantic metadata artifacts from such models, or a mixture of both. What is important is that mechanisms are made available to enable cross-referencing and checking between design descriptions and related ontologies in a manner that can be easily engineered and maintained, to improve system quality and reduce cost.
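As a minimal sketch of such cross-referencing, assume a hypothetical design model in which each UML-style class names the ontology concept it realises; the check then compares each association with the domain and range of the ontology property of the same name. The property names, the naming-by-convention link and the `DesignClass` type are illustrative assumptions, not an established mapping standard.

```python
from dataclasses import dataclass, field

# Hypothetical ontology fragment: property name -> (domain, range).
ONTOLOGY_PROPERTIES = {
    "places": ("Customer", "Order"),
    "contains": ("Order", "Product"),
}
ONTOLOGY_CLASSES = {"Customer", "Order", "Product"}


@dataclass
class DesignClass:
    """A class in a (hypothetical) UML-style design model."""
    name: str
    concept: str  # ontology class this design class claims to realise
    associations: dict = field(default_factory=dict)  # role name -> target design class name


def check_model(model):
    """Return human-readable mismatches between a design model and the ontology."""
    by_name = {c.name: c for c in model}
    problems = []
    for cls in model:
        if cls.concept not in ONTOLOGY_CLASSES:
            problems.append(f"{cls.name}: unknown concept {cls.concept}")
            continue
        for role, target_name in cls.associations.items():
            if role not in ONTOLOGY_PROPERTIES:
                problems.append(f"{cls.name}.{role}: no matching ontology property")
                continue
            domain, rng = ONTOLOGY_PROPERTIES[role]
            target = by_name.get(target_name)
            # The association must run from the property's domain to its range.
            if cls.concept != domain or target is None or target.concept != rng:
                problems.append(f"{cls.name}.{role}: expected {domain} -> {rng}")
    return problems


model = [
    DesignClass("CustomerEntity", "Customer", {"places": "OrderEntity"}),
    DesignClass("OrderEntity", "Order", {"contains": "CustomerEntity"}),  # wrong target
]
print(check_model(model))
```

Here the `contains` association is flagged because it points at a class realising `Customer` rather than `Product`; the design model itself stays in its familiar notation while the ontology supplies the checkable semantics.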

3.2 The Semantic Web in Systems and Software Engineering

Having raised the idea of using the Semantic Web in Software Engineering, a common question arises, namely: how does one broadly characterise the Semantic Web in terms of Systems or Software Engineering use? In attempting to answer this question, consensus appears to be forming around two loose definitions:

As a 'classification', merely to group together related tools and techniques for modeling rigorous semantics during specification and design stages of the Software Lifecycle.
Primarily, such tools and techniques should be viewed as formally descriptive in character, but there appears little reason to restrict this definition other than for standards alignment. It may therefore also be relevant, at some appropriate point in the Semantic Web's future, to include prescriptive, invasive and/or other types of approach under this heading.
As a 'mechanism' for strongly identifying and sharing artifacts amongst discrete subsystems, systems and systems' design teams both during design and at runtime.
In such circumstances the Semantic Web could be viewed as a single formalised corpus of interrelated, reusable content, which can further be classified as either:
- Passive (data in the form of):
  - Flat documents and data (HTML, XML etc.)
  - Dynamically generated documents and data (via JSP, PHP etc.)
  - Metadata (RDF, OWL etc.)
  - Media (pictures, video, music etc.)
- Active (functionality presented as):
  - Web Services
  - Functional components referenced as fragments within passive content (JavaScript, Java applets etc.)

3.3 A Corpus of Reusable Content and the Use of Metadata as Relational Data

Given that the Semantic Web uses triple-based data representation, and that this is essentially a minimal form of the representation employed in relational database technologies, the attraction of considering the Semantic Web as a specialised relational framework has been recognised for some time. Even where this potential is recognised, however, a significant benefit for SSE is often overlooked: if you can describe something sufficiently well (as is the ultimate aim on the Semantic Web), and that thing exists on the Semantic Web with a similar level of descriptive clarity, the chances of finding precisely that thing are greatly increased. Setting out to describe things better is therefore indirectly rewarded, both by an increased ability to be [semi-automatically] discovered and by an increased ability to find content that is clearly equivalent or related. To be direct: rich descriptions empower discoverability.
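To make the relational reading concrete, the sketch below treats a set of triples as a single three-column table and answers a conjunctive pattern query by successively joining that table with itself. It is a hypothetical illustration that uses plain strings for prefixed names (the `ex:` terms are invented), not a SPARQL implementation.

```python
# Each fact is a (subject, predicate, object) row: one three-column relation.
TRIPLES = [
    ("ex:OrderService", "rdf:type", "ex:WebService"),
    ("ex:OrderService", "ex:implements", "ex:OrderInterface"),
    ("ex:BillingService", "rdf:type", "ex:WebService"),
    ("ex:BillingService", "ex:implements", "ex:InvoiceInterface"),
]


def is_var(term):
    """Terms starting with '?' are query variables."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(patterns, triples=TRIPLES):
    """Evaluate a conjunction of triple patterns as a chain of self-joins."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


print(query([("?s", "rdf:type", "ex:WebService"),
             ("?s", "ex:implements", "ex:OrderInterface")]))
```

The shared variable `?s` plays the role of a join column; a richer description (more patterns) narrows the result, which is the discoverability argument in miniature.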

Suggesting the use of the Semantic Web as a system for runtime information and component sharing implies a need to clearly identify participating artifacts based on composites of characterising semantic properties (metadata in the form of name-value or predicate-object pairs). This differs from current Semantic Web schemes for unique identification, such as the SHA-1 hashed mailbox (foaf:mbox_sha1sum) used in FOAF, which rely on a single identifying property. In such frameworks the Semantic Web can be seen as a truly global relational assembly of content and, as with every relational model, issues of composite object identification have to be addressed.
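The following sketch illustrates composite identification under stated assumptions: descriptions from different (hypothetical) sources are co-identified when they agree on an assumed composite key of `ex:name` and `ex:version`, and their metadata is then merged. The property names and the merge policy are illustrative only; conflicting values, trust and provenance are ignored.

```python
from collections import defaultdict

# Hypothetical descriptions of software artifacts published by different sources.
# Each maps an opaque local identifier to predicate -> value metadata.
DESCRIPTIONS = {
    "siteA#c1": {"ex:name": "commons-logging", "ex:version": "1.0.4", "ex:vendor": "Apache"},
    "siteB#x9": {"ex:name": "commons-logging", "ex:version": "1.0.4", "ex:licence": "ASL 2.0"},
    "siteC#k2": {"ex:name": "commons-logging", "ex:version": "1.1"},
    "siteD#z1": {"ex:name": "junit"},  # incomplete: cannot be identified
}

# Assumed composite key: together these properties identify an artifact,
# although neither does on its own (unlike a single inverse functional property).
KEY_PROPERTIES = ("ex:name", "ex:version")


def composite_key(description):
    """Return the composite key, or None if any key property is missing."""
    if not all(p in description for p in KEY_PROPERTIES):
        return None
    return tuple(description[p] for p in KEY_PROPERTIES)


def co_identify(descriptions):
    """Group local identifiers denoting the same artifact and merge their metadata."""
    groups = defaultdict(lambda: {"ids": [], "metadata": {}})
    for local_id, desc in sorted(descriptions.items()):
        key = composite_key(desc)
        if key is None:
            continue
        groups[key]["ids"].append(local_id)
        # Naive merge: later values overwrite earlier ones on conflict.
        groups[key]["metadata"].update(desc)
    return dict(groups)


merged = co_identify(DESCRIPTIONS)
for key, group in merged.items():
    print(key, group["ids"])
```

Two descriptions published independently are recognised as the same artifact and their metadata combined, while the incomplete description is left unidentified rather than guessed at.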

Such unique identification schemes should be capable of supporting both the interlinking of broadly related ontologies into grander information corpora (thereby implying formal similarities and relationships between discrete ontologies and/or systems through their classifying metadata) and the transformation of design-time component associations into useful runtime bindings. This would extend metadata use across a broader span of the Software Lifecycle. In so doing, this approach carries a number of obvious implications for systems employing such techniques:

That Semantic Web technologies could be used to formalise associations between sub-components within a given system.
That the Semantic Web could be used as a framework for design-time data and component sharing, including the concept of design models being considered valid, sharable artifacts in their own right.
That the Semantic Web could be used as a framework for runtime data and component sharing between discrete and disparate systems.
That new forms of system could be created through the integration of discrete and disparate information and functionality with semantically similar metadata. This appears especially appealing given current advances in the areas of Web Services and Service Oriented Architectures. If underlying metadata were used as a basis for parameterised dynamic system behaviour, there would be further intriguing potential in the areas of Web Service Choreography and autonomic systems.

4. Examples

In this section, we provide some examples to illustrate ideas mentioned in the previous section.

Example A: Developing and managing software components in an ontology-based Application Server

Application servers provide much of the functionality commonly needed in the development of complex distributed applications. So far, this functionality has mostly been developed and managed with the help of administration tools and corresponding configuration files, recently in XML. Though this constitutes a very flexible way of developing and administering a distributed application, its disadvantage is that the conceptual model underlying the different configurations is only implicit. Hence, its constituent parts are difficult to retrieve, survey, check for validity and maintain.

To remedy such problems, the Ontology Driven Architecture (ODA) approach can support the development and administration of software components in an application server. The ODA approach is supplementary to MDA, where models abstract from low-level and often platform-specific implementation details. While MDA allows the separation of conceptual concerns from implementation-specific concerns, it has not so far been applied to run-time relevant characteristics of component management, such as which version of an application interface requires which versions of libraries. MDA requires a compilation step, which prevents the changes at runtime that are characteristic of component management. Besides, an MDA model itself cannot be queried or reasoned about, so there is no way to ask the system whether some configuration is valid or whether further elements are needed. This is explained by the lack of formal semantics of the notoriously ambiguous UML. In ODA, however, an ontology captures the properties of, relationships between and behaviors of the components that are required for development and administration purposes. As the ontology is an explicit conceptual model with formal logic-based semantics, its descriptions of components may be queried, may be used to anticipate required actions (e.g. preloading of indirectly required components), or may be checked to avoid inconsistent system configurations, during development as well as at run-time. Thus, the ODA approach retains the original flexibility in configuring and running the application server, while adding new capabilities for the developer and user of the system.
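A minimal sketch of the kind of reasoning ODA enables follows. It assumes hand-written component facts and a simple "one version per library" rule, rather than the actual ontology or inference engine described in [OESV 2004]. It computes the indirectly required components that could be preloaded and flags deployments that would load conflicting library versions. All component names and the `LIBRARY_OF` mapping are hypothetical.

```python
# Hypothetical component facts, as (subject, property, object) triples.
FACTS = {
    ("OrderApp", "requires", "PersistenceAPI-2"),
    ("PersistenceAPI-2", "requires", "jdbc-driver-3"),
    ("PersistenceAPI-2", "requires", "logging-1.1"),
    ("ReportApp", "requires", "logging-1.0"),
}

# Assumed mapping from a versioned artifact to the library it is a version of.
LIBRARY_OF = {
    "PersistenceAPI-2": "PersistenceAPI",
    "jdbc-driver-3": "jdbc-driver",
    "logging-1.0": "logging",
    "logging-1.1": "logging",
}


def required(component):
    """Transitive closure of 'requires': everything to preload for component."""
    needed, stack = set(), [component]
    while stack:
        current = stack.pop()
        for s, p, o in FACTS:
            if s == current and p == "requires" and o not in needed:
                needed.add(o)
                stack.append(o)
    return needed


def conflicts(deployment):
    """Return library names for which a deployment would load more than one version."""
    versions = {}
    for component in deployment:
        for dep in required(component) | {component}:
            lib = LIBRARY_OF.get(dep)
            if lib is not None:
                versions.setdefault(lib, set()).add(dep)
    # Assumed rule: at most one version of each library per deployment.
    return sorted(lib for lib, vs in versions.items() if len(vs) > 1)


print(sorted(required("OrderApp")))
print(conflicts(["OrderApp", "ReportApp"]))  # ['logging']
```

Because the dependencies are explicit data rather than implicit in configuration files, the same facts answer both "what should be preloaded?" and "is this configuration valid?" at design time or at run time.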

Figure 1: System architecture of the ontology-based Application Server. Semantic metadata and the ontology are loaded into the inference engine. Value-added services and tools leverage the reasoning capability embedded in the application server.

Figure 1 shows how an ontology-based Application Server could be designed. The left side outlines potential sources that provide input for the framework, including web and application server configuration files, annotated source code and metadata files. This information is parsed and converted into semantic metadata, i.e. metadata expressed in terms of a corresponding ontology. The data is thus made available in conformance with a harmonizing conceptual model, weaving together previously separate aspects such as security, component dependencies, version and deployment information. The semantic metadata and the ontology are fed into the inference engine, which is embedded in the application server itself. The reasoning capability is used by an array of tools at development time and at run time. The tools either expose a graphical user interface (e.g. security management) or provide core functionality (e.g. the dynamic component loader) [OESV 2004].
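The parsing step described above can be sketched as follows, assuming a hypothetical XML deployment descriptor (the element and attribute names are invented, not those of any real application server) and using only the Python standard library. Each configuration entry becomes a (subject, predicate, object) triple that an inference engine could then consume.

```python
import xml.etree.ElementTree as ET

# Hypothetical deployment descriptor; element and attribute names are illustrative only.
CONFIG = """
<deployment>
  <component name="OrderApp" version="1.2">
    <dependsOn ref="PersistenceAPI-2"/>
    <role name="admin"/>
  </component>
  <component name="ReportApp" version="0.9">
    <dependsOn ref="logging-1.0"/>
  </component>
</deployment>
"""


def config_to_triples(xml_text):
    """Convert the descriptor into (subject, predicate, object) semantic metadata."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for comp in root.iter("component"):
        name = comp.get("name")
        triples.append((name, "rdf:type", "ex:Component"))
        triples.append((name, "ex:version", comp.get("version")))
        # Dependency information (feeds component-loading decisions).
        for dep in comp.findall("dependsOn"):
            triples.append((name, "ex:requires", dep.get("ref")))
        # Security information, previously held in a separate aspect of the configuration.
        for role in comp.findall("role"):
            triples.append((name, "ex:accessibleBy", role.get("name")))
    return triples


for t in config_to_triples(CONFIG):
    print(t)
```

Once dependency and security settings share one triple representation, a single reasoner can relate them, which is the "harmonizing conceptual model" the figure describes.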

Example B: A cooperative Semantic Web-based environment for semantic links among models

According to [Brown 2004], a formal underpinning for describing models facilitates meaningful integration and transformation among models, and is the basis for automation through tools. However, in existing MDA practice, semi-formal metamodels, rather than formal specification languages, serve as this underpinning for describing models. An obvious reason is that, unlike the case of UML (the industrial effort to standardise diagrammatic notations), no single dominant formal specification language exists. Furthermore, different specification languages are designed for different purposes; e.g., B, VDM and Z are designed for modelling data and states, while CSP, CCS and the π-calculus are designed for modelling behaviors and interactions.

To solve this problem, we can use ontologies as formal metamodels to describe various formal specification languages; furthermore, the standard Semantic Web ontology language OWL can provide a unified syntax. Semantic links among different models are explicitly specified by ontologies and form the basis for automation through tools in a Semantic Web-based environment. Based on these semantic links, various existing proposals for integrating formal specification languages can be supported in such environments.

[Wang 2004] briefly describes such a Semantic Web-based environment for semantic links among models, using DAML+OIL (instead of OWL). Examples of semantic links include assertions that Object-Z classes are equivalent to CSP processes and that Object-Z operations are equivalent to CSP events.
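As a hedged illustration of how such semantic links could drive tooling, the sketch below encodes the two example equivalences as plain pairs (standing in for DAML+OIL or OWL equivalence axioms) and uses them to re-type the elements of a small, hypothetical Object-Z model as CSP concepts. It is not the environment described in [Wang 2004], only an illustration of the idea under these assumptions.

```python
# Metamodel-level semantic links between two specification languages, in the
# spirit of owl:equivalentClass (concept names are illustrative).
EQUIVALENT = {
    ("ObjectZ:Class", "CSP:Process"),
    ("ObjectZ:Operation", "CSP:Event"),
}


def equivalents(concept):
    """Return all concepts equivalent to concept (treating equivalence as symmetric and transitive)."""
    seen, stack = {concept}, [concept]
    while stack:
        current = stack.pop()
        for a, b in EQUIVALENT:
            for x, y in ((a, b), (b, a)):  # follow each link in both directions
                if x == current and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen


# A tiny hypothetical Object-Z model: element name -> metamodel concept.
OBJECTZ_MODEL = {
    "BufferSpec": "ObjectZ:Class",
    "Put": "ObjectZ:Operation",
    "Get": "ObjectZ:Operation",
}


def view_as(model, prefix):
    """Re-type each model element with an equivalent concept from the target language."""
    view = {}
    for element, concept in model.items():
        targets = sorted(c for c in equivalents(concept) if c.startswith(prefix))
        if targets:
            view[element] = targets[0]
    return view


print(view_as(OBJECTZ_MODEL, "CSP:"))
```

The same links can be read in either direction, so a CSP-oriented tool could equally view a CSP model through Object-Z concepts.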

5. Previous Experience

Many, however, would argue that such approaches have been tried a number of times before with only limited success, holding up numerous grandiose attempts at 'Corporate Enterprise Architecture' as classic examples of failure. This may indeed be true, but it is important to remember that past attempts have always been isolated to some degree. Standards-based formal semantic representation targeted at hugely open problem spaces like the Web is, however, a new concept, and deliberately sets out to remove isolated problem solving from the equation. It not only offers a number of distinct technical advantages, but is also available to an unprecedented global development community. Furthermore, this community is steeped in a tradition of free and open knowledge exchange and source distribution. And, if the history of the Web to date is anything to go by, this community will eventually produce a groundswell of support and enough impetus to kick-start a number of revolutionary changes in systems and software engineering as a direct result of the Semantic Web. Recognising this potential and providing early direction is therefore considered a worthwhile initiative.

6. Issues

It is acknowledged that the Semantic Web still faces a number of well known issues when attempting to implement public mechanisms for component sharing via semantic metadata association:

Trust: How does a content consumer know if the provider of any identified content or associated metadata is trustworthy, erroneous or hostile?
Authority: Even if trust can be established, how does a content consumer know whether a content provider is authorised to provide the metadata needed to accurately determine the relevance of the components being investigated?
Temporality: How does a content consumer know whether metadata is accurate and relevant at the current point in time?

7. Informative References

[Atkinson 2004]: On the Unification of MDA and Web-based Knowledge Representation Technologies. Colin Atkinson. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[CGL 2004]: Issues in Mapping Metamodels in the Ontology Definition Metamodel. Robert M. Colomb, Anna Gerber, Michael Lawley. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[ChKe 2004]: Major Influences on the Design of the ODM. Daniel T. Chang and Elisa Kendall. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[ChKe 2004b]: Metamodels for RDF Schema and OWL. Daniel T. Chang and Elisa Kendall. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[CrGe 2005]: Situation and Identity - A Generalisation of Inverse Functional Properties. Tom Croucher, University of Sunderland and Joe Geldart, University of Durham.
[EmHa 2005]: A Description Logic for Use as the ODM Core. Patrick Emery and Lewis Hart. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[EmHa 2005b]: Including Topic Maps in the Ontology Definition Meta-Model. Patrick Emery and Lewis Hart. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[Guha 2004]: Object Co-identification on the Semantic Web. R. V. Guha, IBM Research, Almaden.
[FHKM 2004]: The Model Driven Semantic Web. David Frankel, Pat Hayes, Elisa Kendall and Deborah McGuinness. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[FHKM 2004b]: Simple Common Logic: A Constraint Language for the ODM. David Frankel, Pat Hayes, Elisa Kendall and Deborah McGuinness. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[GaTi 2004]: An MDA-Based Approach for Facilitating Adoption of Semantic Web Service Technology. Gerald C. Gannod and John T.E. Timm. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[GSNGW 2004]: Model Driven Middleware. Aniruddha Gokhale and Douglas C. Schmidt and Balachandran Natarajan and Jeff Gray and Nanbor Wang. In Q.H. Mahmoud (ed.): Middleware for Communications, Chapter 7, 163-187, Wiley, 2004
[Knublauch 2004]: Ontology Driven Software Development in the Context of the Semantic Web: An Example Scenario with Protégé/OWL. Holger Knublauch. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004) Enabling Knowledge Representation and MDA® Technologies to Work Together.
[MSUW 2004]: MDA Distilled - Principles of Model-Driven Architecture. Stephen J. Mellor, Kendall Scott, Axel Uhl, Dirk Weise. Addison-Wesley, 2004.
[OESV 2004]: Developing and Managing Software Components in an Ontology-based Application Server. Daniel Oberle, Andreas Eberhart, Steffen Staab, Raphael Volz. In Hans-Arno Jacobsen (ed.), Middleware 2004, ACM/IFIP/USENIX 5th International Middleware Conference, Toronto, Ontario, Canada, volume 3231 of LNCS, pp. 459-478. Springer, 2004.
[Oberle 2004]: Semantic Management of Middleware. Daniel Oberle. In Proceedings of the 1st International Doctoral Symposium on Middleware, Toronto, Ontario, Canada, pp. 299-303. ACM Press, October 2004.
[PeSc 2003]: Curing the Web's Identity Crisis. S. Pepper and S. Schwab. Technical report, Ontopia, 2003.
[Tetlow 2004]: SOA, Glial and the Autonomic Semantic Web Machine - Tools for Handling Complexity? Philip Tetlow, IBM, UK.
[Wallace 2004]: Experiences and Issues in Converting Data and Object Modeling Languages to OWL. Evan Wallace. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).
[XLZMP* 2004]: SemanticWare: An EMF-Compatible RDF Infrastructure. Guo Tong Xie, Shixia Liu, Zhuo Zhang, Li Ma, Yue Pan, Li Zhang, Zhong Su, Li Qin Shen. 1st International Workshop on the Model-Driven Semantic Web (MDSW2004).

8. Acknowledgments

Special thanks are due to Prof Cliff Jones (University of Newcastle) for his kind advice during the preparation of this note.

9. References

[Wang 2004]: Semantic Web and Formal Design Methods. Hai Wang. PhD thesis, National University of Singapore.