Please refer to the errata for this document, which may include some normative corrections.
Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
It is considered by many that applying knowledge representation languages common
to the Semantic Web, such as RDF and OWL, in Systems and Software Engineering
can achieve significant benefits. This note hence attempts to outline such benefits
and the approaches needed to achieve them from both a Semantic Web and Software
Engineering perspective.
This note is aimed at professional practitioners, tool vendors and academics with an interest in applying Semantic Web technologies in Systems and Software Engineering (SSE) contexts. These may include:
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a public (WORKING DRAFT) Working Group Note produced by the W3C Semantic Web Best Practices Working Group, which is part of the W3C Semantic Web activity.
Discussion of this document is invited on the public mailing list public-swbp-wg@w3.org (public archives).
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Other documents may supersede this document.
Please send comments to either Phil Tetlow, IBM, <philip.tetlow@uk.ibm.com> or Jeff Pan, Manchester University, <pan@cs.man.ac.uk>.
Until recently work on accepted practices in Systems and Software Engineering (SSE) has appeared somewhat disjointed from that breaking ground in the area of formal information representation on the World Wide Web (commonly referred to as the Semantic Web initiative). Yet obvious overlaps between both fields are apparent and many now acknowledge merit in a hybrid approach to IT systems development and deployment, combining Semantic Web technologies and techniques with more established development formalisms and languages like the Unified Modeling Language (UML). This is not only for the betterment of IT systems in general, but also for the future good of the Web, as systems and Web Services containing rich Semantic Web content start to come online.
Throughout the history of computing the concepts of component construction and reuse have undergone a quite remarkable evolution. Over time many different types of software and systems building blocks have been advocated, demonstrating ever increasing levels of abstraction and encapsulation. In the earliest computer systems, 'functions' were the predominant unit of composition, returning the same results for a given set of inputs every time. However, it was soon realised that this approach was somewhat clumsy, and it gave way to more substantial aggregation mechanisms such as 'subroutines' and 'libraries' in the 1960s and 1970s. All the same, even these approaches could not effectively manage shared data and concurrency, resulting in systems of unnecessary complexity. Consequently, the mid-1980s saw a number of advances in abstract systems thinking, culminating in the introduction of classified components commonly known as 'Objects'.
The construction of objects effectively encapsulates both data and functionality into usable bundles for processing to take place, and today Object Oriented (OO) theory and techniques are accepted as mature concepts in most areas of Information Technology. As such they provide a number of acknowledged engineering advantages, including component reuse.
By collecting objects together into meaningful 'artifacts' or 'assets', and artifacts into 'systems' or 'applications', reuse can be achieved at higher levels. As a result, the terms 'object', 'artifact' and 'asset' are often considered to be interchangeable and also incorporate the idea of pure data as a valid component type. The terms 'system' and 'application', however, generally imply larger computational units, composed of both usable data and functionality.
Even with such advances in representation and composition, engineering systems of any significance is still difficult. For this reason, in all well-established engineering disciplines, modeling a common understanding of domains through a variety of formal and semi-formal notations has proven itself essential to advancing the practice in each such line of work. This has led to large sections of the Software Engineering profession evolving from the concept of constructing models of one form or another as a means to develop, communicate and verify abstract designs in accordance with original requirements. Computer Aided Software Engineering (CASE) and, more recently, Model Driven Architecture (MDA) provide the most prominent examples of this approach. Here models are not only used for design purposes, but associated tools and techniques can be utilised further to generate executable artifacts for later use in the Software Lifecycle. Nevertheless, there has always been a frustrating paradox present with tooling use in Software Engineering. This arises from the range of modeling techniques available and the breadth of systems requiring design: engineering nontrivial systems demands rigour and unambiguous statement of concept, yet the more formal the modeling approach chosen, the more abstract the tools needed, often making methods difficult to implement, limiting the freedom of expression available to the engineer and proving a barrier to communication amongst practitioners with lesser experience. For these reasons less formal approaches have seen mainstream commercial acceptance in recent years, with the Unified Modeling Language (UML) currently being the most favoured amongst professionals.
Nevertheless, approaches like the UML are by no means perfect. Although they are capable of capturing highly complex conceptualisations, current versions are far from semantically rich. Furthermore they can be notoriously ambiguous: A standard isolated model from such a language, no matter how perfect, can still be open to gross misinterpretation by those who are not overly familiar with its source problem space. It is true that supporting annotation and documentation can help alleviate such issues, but traditionally this has still involved a separate, literal, verbose and longwinded activity often disjointed from the production of the actual model itself. Furthermore, MDA does not currently support automated consistency checking. What is needed in addition is a way to incorporate unambiguous, rich semantics into the various semi-formal notations underlying methods like the UML.
Fortunately, with the advent of Semantic Web technologies, semantically rich,
rigorous descriptive languages are now available which are much less imposing
than those previously adopted for high-end specification purposes. The
important point to note here is that MDA provides a powerful and proven framework
for SSE, and Semantic Web technologies provide a natural extension to this framework.
Semantic models (often referred to as 'ontologies'), as defined in the context
of the Semantic Web, augment the model paradigm. When all the hype is removed,
modeling in SSE is simply an annotated method for describing whatever one wants
to achieve. The Semantic Web significantly enriches this approach, itself originating
from a desire to describe concepts properly. What makes Semantic Web technologies
different, however, is that they have been deliberately engineered from the
ground up to address problems with a potentially unlimited number of facets
and interpretations. This type of problem is often referred to as being 'open
world' in nature and solutioning such problems has necessitated the use of far
more precise descriptive languages than those commonly found in traditional
IT circles. Here solutioning has typically addressed much more confinable, or
'closed' problems, and has not needed such levels of descriptive rigor in previous
common practice. Nevertheless, this does not mean to say that rigorous description
would not be of significant advantage in SSE. Quite the contrary, ever since
the early work on Formal Methods over twenty years ago, the IT industry has
recognised the value of rigour in descriptive engineering practice, but the
underlying drivers have not been strong enough to see mass take-up. This is
not the case with the Semantic Web and a new perspective on engineering systems
is starting to gain ground.
In many respects ontologies can be simply considered as rigorous descriptive models in their own right, being akin to existing conceptual modeling techniques like UML class diagrams or Entity Relationship Models (ERM). As such, their purpose is to facilitate mutual understanding between agents, be they human or computerised, and they achieve this through explicit semantic representations using logic-based formalisms. Typically, these formalisms come with executable calculi that allow querying and reasoning support at runtime. This adds a number of advantages, specifically in the areas of:
Hence, given the semantically rich, unambiguous qualities of information embodiment on the Semantic Web, the amenable syntax of Semantic Web languages, and the universality of the Semantic Web's XML ancestry, there appears a compelling argument to combine the semi-formal, model driven techniques of Software Engineering with approaches common to Information Engineering on the Semantic Web. This may involve the implanting of descriptive ontologies directly into systems' design models themselves, the referencing of separate semantic metadata artifacts by such models or a mixture of both. What is important is that mechanisms are made available to enable cross-referencing and checking between design descriptions and related ontologies in a manner that can be easily engineered and maintained for the betterment of systems' quality and cost.
Having raised the idea of using the Semantic Web in Software Engineering, a commonly asked question arises, namely: how does one broadly characterise the Semantic Web in terms of Systems or Software Engineering use? In attempting to answer this question, consensus appears to be forming around two loose definitions:
Primarily such tools and techniques should be viewed as being formally descriptive in character, but there appears little reason to restrict this definition other than standards alignment. Therefore, it may also be relevant, at some appropriate point in the Semantic Web's future, to include prescriptive, invasive and/or other types of approach under this heading.
In such circumstances the Semantic Web could be viewed as a single formalised corpus of interrelated, reusable content, which can further be classified as being either:
Given that the Semantic Web uses triple-based data representation and that this is merely a minimalisation of the representation employed in relational database technologies, the attraction of considering the Semantic Web as a specialised relational framework has been recognised for some time. Nevertheless, even where this potential is recognised, a significant benefit for SSE is often overlooked: if you can describe something sufficiently well (as is the ultimate aim on the Semantic Web), and that thing exists on the Semantic Web with a similar level of descriptive clarity, the chances of you finding precisely that thing are greatly increased. So, setting out to describe things better is indirectly rewarded by the increased ability to be [semi-automatically] discovered and to find content that is clearly equivalent or related. To be direct: rich descriptions empower discoverability.
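The discoverability argument above can be illustrated with a minimal sketch in Python. It is not a Semantic Web toolkit: triples are held as plain tuples, and every name beginning with `ex:` (the components, predicates and values) is a hypothetical example introduced here, not taken from any real vocabulary. The sketch only shows that each additional predicate/object pair in a description narrows the set of matching resources.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical catalogue of component descriptions (all names are assumptions).
TRIPLES: List[Triple] = [
    ("ex:ParserA", "rdf:type", "ex:Component"),
    ("ex:ParserA", "ex:implements", "ex:XmlParsing"),
    ("ex:ParserA", "ex:licence", "ex:OpenSource"),
    ("ex:ParserB", "rdf:type", "ex:Component"),
    ("ex:ParserB", "ex:implements", "ex:XmlParsing"),
    ("ex:ParserB", "ex:licence", "ex:Commercial"),
    ("ex:Logger", "rdf:type", "ex:Component"),
    ("ex:Logger", "ex:implements", "ex:Logging"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every position that is not None."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def discover(triples: List[Triple], description: Dict[str, str]) -> Set[str]:
    """Return the subjects that satisfy every predicate/object pair given."""
    candidates: Optional[Set[str]] = None
    for predicate, obj in description.items():
        subjects = {t[0] for t in match(triples, p=predicate, o=obj)}
        candidates = subjects if candidates is None else candidates & subjects
    return candidates or set()


# A loose description matches several components ...
loose = discover(TRIPLES, {"ex:implements": "ex:XmlParsing"})
# ... while a richer description narrows the result.
rich = discover(TRIPLES, {"ex:implements": "ex:XmlParsing",
                          "ex:licence": "ex:OpenSource"})
```

In practice an RDF store and a query language would replace the list comprehension, but the principle is the same: the more fully a thing is described, the more precisely it can be found.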
By suggesting use of the Semantic Web as a system for runtime information and component sharing there is an implicit need to provide means for clearly identifying participating artifacts based on composites of characterising semantic properties (metadata in the form of name-pair/predicate-object values), and this differs from current Semantic Web schemes for unique identification, such as FOAF sha1. In such frameworks the Semantic Web can be seen as a truly global relational assembly of content and, as with every relational model, issues dealing with composite object identification have to be addressed.
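As a rough illustration of composite identification, the following Python sketch treats a fixed combination of property values as a compound key, much as a relational model would. This note does not prescribe any such scheme; the choice of identifying predicates (`ex:vendor`, `ex:name`, `ex:version`) and all URIs are assumptions made purely for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical set of predicates that together identify an artifact.
IDENTIFYING_PREDICATES = ("ex:vendor", "ex:name", "ex:version")

Description = Dict[str, str]  # predicate -> object value


def composite_key(description: Description) -> Optional[Tuple[str, ...]]:
    """Build the compound key, or None if any identifying value is missing."""
    try:
        return tuple(description[p] for p in IDENTIFYING_PREDICATES)
    except KeyError:
        return None


def group_by_identity(
        descriptions: Dict[str, Description]) -> Dict[Tuple[str, ...], List[str]]:
    """Group URIs whose descriptions share the same compound key."""
    groups: Dict[Tuple[str, ...], List[str]] = {}
    for uri, description in descriptions.items():
        key = composite_key(description)
        if key is not None:
            groups.setdefault(key, []).append(uri)
    return groups


# Two independently published descriptions of what is, by this key,
# the same artifact, plus one that lacks an identifying value.
DESCRIPTIONS = {
    "http://a.example/parser": {"ex:vendor": "Acme", "ex:name": "Parser",
                                "ex:version": "1.2"},
    "http://b.example/item42": {"ex:vendor": "Acme", "ex:name": "Parser",
                                "ex:version": "1.2", "ex:licence": "Open"},
    "http://c.example/thing": {"ex:vendor": "Acme", "ex:name": "Parser"},
}

groups = group_by_identity(DESCRIPTIONS)
```

Here the first two URIs fall into one group, while the third cannot be identified because its description is incomplete; deciding which predicates are identifying is exactly the design issue raised above.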
Such unique identification schemes should be capable of supporting both the interlinking of broadly related ontologies into grander information corpora (thereby implying formal similarities and relationships between discrete ontologies and/or systems through their classifying metadata), and the transformation of design time component associations into useful runtime bindings. This will, therefore, realise metadata use across a broader spectrum of the Software Lifecycle. In so doing, this approach carries a number of obvious implications for systems employing such techniques:
In this section, we provide some examples to illustrate the ideas introduced in the previous section.
Application servers provide many functionalities commonly needed in the development of a complex distributed application. So far, the functionalities have mostly been developed and managed with the help of administration tools and corresponding configuration files, recently in XML. Though this constitutes a very flexible way of developing and administrating a distributed application, the disadvantage is that the conceptual model underlying the different configurations is only implicit. Hence, its constituent parts are difficult to retrieve, survey, check for validity and maintain.
To remedy such problems, the Ontology Driven Architecture (ODA) approach can support the development and administration of software components in an application server. The ODA approach is supplementary to MDA, where models abstract from low-level and often platform-specific implementation details. While MDA allows the separation of conceptual concerns from implementation-specific concerns, it has not so far been applied to the run-time relevant characteristics of component management, such as which version of an application interface requires which versions of libraries. MDA requires a compilation step, which prevents the changes at runtime that are characteristic of component management. Besides, an MDA model itself cannot be queried or reasoned about. Hence, there is no way to ask the system whether some configuration is valid or whether further elements are needed. This is explained by the lack of formal semantics of the notoriously ambiguous UML. In ODA, however, an ontology captures properties of, relationships between and behaviors of the components that are required for development and administration purposes. As the ontology is an explicit conceptual model with formal logic-based semantics, its descriptions of components may be queried, may be used to foresee required actions, e.g. preloading of indirectly required components, or may be checked to avoid inconsistent system configurations, during development as well as during run-time. Thus, the ODA approach retains the original flexibility in configuring and running the application server, but it adds new capabilities for the developer and user of the system.
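The kinds of question an ODA-style application server might answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a real system would use an ontology and an inference engine, whereas here the 'requires' relationship is a plain dictionary, and the component names, the `name-version` naming convention and all function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical component metadata: each component lists what it directly requires.
REQUIRES: Dict[str, List[str]] = {
    "OrderService-2.0": ["PersistenceAPI-1.1", "Logging-1.0"],
    "PersistenceAPI-1.1": ["JdbcDriver-3.2"],
    "Logging-1.0": [],
    "JdbcDriver-3.2": [],
}


def required_closure(component: str, requires: Dict[str, List[str]]) -> List[str]:
    """Everything directly or indirectly required, dependencies first."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:  # also guards against cyclic declarations
            return
        seen.add(name)
        for dependency in requires.get(name, []):
            visit(dependency)
        order.append(name)

    visit(component)
    return order[:-1]  # drop the component itself, which is always last


def missing_components(component: str, requires: Dict[str, List[str]],
                       deployed: Set[str]) -> List[str]:
    """Required components that are not yet deployed (candidates for preloading)."""
    return [c for c in required_closure(component, requires) if c not in deployed]


def version_conflicts(component: str,
                      requires: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Names that are required in more than one version (an inconsistent setup)."""
    versions: Dict[str, Set[str]] = {}
    for c in required_closure(component, requires):
        name, _, version = c.rpartition("-")
        versions.setdefault(name, set()).add(version)
    return {n: v for n, v in versions.items() if len(v) > 1}


load_order = required_closure("OrderService-2.0", REQUIRES)
to_preload = missing_components("OrderService-2.0", REQUIRES, {"Logging-1.0"})
conflicts = version_conflicts("OrderService-2.0", REQUIRES)
```

The transitive closure corresponds to the preloading of indirectly required components mentioned above, and the conflict check to the detection of inconsistent configurations; with an ontology, both would follow from declared semantics rather than hand-written traversal code.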
Figure 1 shows how an ontology-based Application Server could be designed. The left side outlines potential sources, which provide input for the framework. These include web and application server configuration files, annotated source code, or metadata files. This information is parsed and converted into semantic metadata, i.e. metadata expressed in terms of a corresponding ontology. Thus, this data is now available in a form that conforms to a harmonizing conceptual model, weaving together hitherto separate aspects like security, component dependencies, version or deployment information. The semantic metadata and the ontology are fed into the inference engine which is embedded in the application server itself. The reasoning capability is used by an array of tools at development and at run time. The tools either expose a graphical user interface (e.g. security management) or provide core functionality (e.g. the dynamic component loader) [OESV 2004].
According to [Brown 2004], a formal underpinning for describing models facilitates meaningful integration and transformation among models, and is the basis for automation through tools. However, in existing MDA practice, semi-formal metamodels, rather than formal specification languages, are used as such underpinnings for describing models. An obvious reason is that, unlike UML, which is the industrial effort to standardise diagrammatic notations, no single dominant formal specification language exists. Furthermore, different specification languages are designed for different purposes; e.g., B/VDM/Z are designed for modelling data and states, while CSP/CCS/π-calculus are designed for modelling behaviors and interactions.
To solve the problem, we can use ontologies as formal metamodels to describe various formal specification languages; furthermore, the standard Semantic Web ontology language OWL can provide a unified syntax. Semantic links among different models are explicitly specified by ontologies and form the basis for automation through tools in a Semantic Web-based environment. Based on these semantic links, various existing proposals for integrating formal specification languages can be supported in such environments.
[Wang 2004] briefly describes such a Semantic Web-based environment for semantic links among models, using DAML+OIL (instead of OWL). Examples of semantic links include assertions that Object-Z classes are equivalent to CSP processes and that Object-Z operations are equivalent to CSP events.
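To make the notion of a semantic link concrete, the sketch below records equivalence assertions between metamodel terms and derives the terms that are equivalent to a given one. It is a simplification written in Python for illustration only: it is not the environment described in [Wang 2004], it uses neither DAML+OIL nor OWL, and the term identifiers (`objectz:Class`, `csp:Process` and so on) are hypothetical labels for the concepts named above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Equivalence assertions between terms of two specification languages.
LINKS: List[Tuple[str, str]] = [
    ("objectz:Class", "csp:Process"),
    ("objectz:Operation", "csp:Event"),
]


def equivalence_classes(links: List[Tuple[str, str]]) -> List[Set[str]]:
    """Partition terms into groups, treating equivalence as symmetric and transitive."""
    parent: Dict[str, str] = {}

    def find(term: str) -> str:
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for left, right in links:
        parent[find(left)] = find(right)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return list(groups.values())


def equivalents(term: str, links: List[Tuple[str, str]]) -> Set[str]:
    """Return every other term asserted, directly or indirectly, to be equivalent."""
    for group in equivalence_classes(links):
        if term in group:
            return group - {term}
    return set()


csp_view_of_class = equivalents("objectz:Class", LINKS)
```

A tool could use such derived equivalences to translate or cross-check models written in different languages; an ontology reasoner would supply the same symmetric and transitive behaviour from the semantics of the equivalence construct itself.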
Many, however, would argue that such approaches have been tried a number of times before with only limited success, holding up numerous grandiose project attempts at 'Corporate Enterprise Architecture' as classical examples of failure. This may indeed be true, but it is important to remember that past attempts have always been isolated to some degree. Standards-based formal semantic representation targeted at hugely open problem spaces, like the Web, is, however, a new concept and deliberately sets out to remove isolated problem solving from the equation. It not only offers a number of distinct technical advantages, but it is also available to a hitherto unprecedented global development community. Furthermore, this community is steeped in a tradition of free and open knowledge exchange and source distribution. And, if the history of the Web to date is anything to go by, this community will eventually produce a groundswell of support and enough impetus to kick-start a number of revolutionary changes in systems and software engineering as a direct result of the Semantic Web. To recognise this potential and provide early direction is hence considered to be a worthy initiative.
It is acknowledged that the Semantic Web still faces a number of well known issues when attempting to implement public mechanisms for component sharing via semantic metadata association: