Please refer to the errata for this document, which may include some normative corrections.
Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a public (WORKING DRAFT) Working Group Note produced by the W3C Semantic Web Best Practices Working Group, which is part of the W3C Semantic Web activity.
Discussion of this document is invited on the public mailing list public-swbp-wg@w3.org (public archives).
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Other documents may supersede this document.
Please send comments to either Phil Tetlow, IBM, <philip.tetlow@uk.ibm.com> or Jeff Pan, Manchester University, <pan@cs.man.ac.uk>.
Until recently, work on accepted practices in Systems and Software Engineering has appeared somewhat disconnected from the ground-breaking work on formal information representation on the World Wide Web (commonly referred to as the Semantic Web initiative). Yet overlaps between the two fields are apparent, and many now acknowledge merit in a hybrid approach to IT systems development and deployment, combining Semantic Web technologies and techniques with more established development formalisms and languages like the Unified Modeling Language (UML). This is not only for the betterment of IT systems in general, but also for the future good of the Web, as systems and Web Services containing rich Semantic Web content start to come online.
Throughout the history of computing the concepts of component construction and reuse have undergone a quite remarkable evolution. Over time many different types of software and systems building blocks have been advocated, demonstrating ever increasing levels of abstraction and encapsulation. In the earliest computer systems, 'functions' were the predominant unit of composition, returning the same results for a given set of inputs every time. However, it was soon realised that this approach was somewhat clumsy, giving way to more substantial aggregation mechanisms such as 'subroutines' and 'libraries' in the 1960s and 1970s. All the same, even these approaches could not effectively manage shared data and concurrency, resulting in systems of unnecessary complexity. Consequently, the mid-1980s saw a number of advances in abstract systems thinking, culminating in the introduction of classified components commonly known as 'Objects'.
The construction of objects effectively encapsulates both data and functionality into usable bundles for processing to take place, and today Object Oriented (OO) theory and techniques are accepted as mature concepts in most areas of Information Technology. As such they provide a number of acknowledged engineering advantages, including component reuse.
By collecting objects together into meaningful 'artifacts' or 'assets', and artifacts into 'systems' or 'applications', reuse can be achieved at higher levels. Therefore, the terms 'object', 'artifact' and 'asset' are often considered to be interchangeable and also incorporate the idea of pure data as a valid component type. The terms 'system' and 'application', however, generally imply larger computational units, composed of both usable data and functionality.
Even with such advances in representation and composition, engineering systems of any significance is still difficult. For this reason, in all well-established engineering disciplines, modeling a common understanding of domains through a variety of formal and semi-formal notations has proven itself essential to advancing the practice in each such line of work. This has led to large sections of the Software Engineering profession evolving from the concept of constructing models of one form or another as a means to develop, communicate and verify abstract designs in accordance with original requirements. Computer Aided Software Engineering (CASE) and, more recently, Model Driven Architecture (MDA) provide the most prominent examples of this approach. Here models are not only used for design purposes, but associated tools and techniques can be utilised further to generate executable artifacts for later use in the Software Lifecycle. Nevertheless, there has always been a frustrating paradox present with tooling use in Software Engineering. This arises from the range of modeling techniques available and the breadth of systems requiring design: engineering nontrivial systems demands rigour and unambiguous statement of concept, yet the more formal the modelling approach chosen, the more abstract the tools needed, often making methods difficult to implement, limiting the freedom of expression available to the engineer and proving a barrier to communication amongst practitioners with lesser experience. For these reasons less formal approaches have seen mainstream commercial acceptance in recent years, with the Unified Modeling Language (UML) currently being the most favoured amongst professionals.
Nevertheless, approaches like the UML are by no means perfect. Although they are capable of capturing highly complex conceptualisations, current versions are far from semantically rich. Furthermore, they can be notoriously ambiguous: a standard isolated model from such a language, no matter how perfect, can still be open to gross misinterpretation by those who are not familiar with its source problem space. It is true that supporting annotation and documentation can help alleviate such issues, but traditionally this has involved a separate, literal and verbose activity, often disjointed from the production of the actual model itself. Furthermore, MDA does not currently support automated consistency checking. What is needed in addition is a way to incorporate unambiguous, rich semantics into the various semi-formal notations underlying methods like the UML.
Fortunately, with the advent of Semantic Web technologies, semantically rich formal languages are now available which are much less syntactically abstract and imposing than those previously adopted for high-end specification purposes. Therefore, it is now possible to construct models with rich and highly rigorous semantics using relatively simple predicate constructs and naming vocabularies closely resembling natural language. This compelling combination suggests that highly formal semantic specifications, although still not easy to produce, could be amenable to a much wider range of IT professionals and could realistically start to increase the levels of formality prevalent in mainstream IT systems.
In many respects semantic models (often referred to as 'ontologies') can be simply considered as rigorous descriptive models in their own right, being akin to existing conceptual modeling techniques like UML class diagrams or Entity Relationship Models (ERM). As such, their purpose is to facilitate mutual understanding between agents, be they human or computerised, and they achieve this through explicit semantic representations using logic-based formalisms. Typically, these formalisms come with executable calculi that allow querying and reasoning support at runtime. This adds a number of advantages, specifically in the areas of:
Hence, given the semantically rich, unambiguous qualities of information embodiment on the Semantic Web, the amenable syntax of Semantic Web languages, and the universality of the Semantic Web's XML ancestry, there appears a compelling argument to combine the semi-formal, model driven techniques of Software Engineering with approaches common to Information Engineering on the Semantic Web. This may involve the implanting of descriptive ontologies directly into systems' design models themselves, the referencing of separate semantic metadata artifacts by such models or a mixture of both. What is important is that mechanisms are made available to enable cross-referencing and checking between design descriptions and related ontologies in a manner that can be easily engineered and maintained for the betterment of systems' quality and cost.
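As an illustration only, the kind of cross-checking between a design description and a related ontology described above might be sketched as follows. The sketch is not drawn from any existing tool: the `ex:` names, the `DESIGN` structure and the `check_design` function are all hypothetical, and treating `rdfs:domain` and `rdfs:range` as closed constraints is a deliberate simplification of RDFS semantics, in which they license inferences rather than validate data.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical ontology fragment, written as plain triples.
ONTOLOGY: Set[Triple] = {
    ("ex:Customer", "rdf:type", "owl:Class"),
    ("ex:Order", "rdf:type", "owl:Class"),
    ("ex:places", "rdfs:domain", "ex:Customer"),
    ("ex:places", "rdfs:range", "ex:Order"),
}

# Hypothetical design model: class name -> {association name: target class}.
DESIGN: Dict[str, Dict[str, str]] = {
    "ex:Customer": {"ex:places": "ex:Order"},
    "ex:Order": {"ex:shippedBy": "ex:Courier"},
}


def check_design(design: Dict[str, Dict[str, str]],
                 ontology: Set[Triple]) -> List[str]:
    """Report design elements that disagree with, or are absent from, the ontology."""
    classes = {s for s, p, o in ontology
               if p == "rdf:type" and o == "owl:Class"}
    domains = {s: o for s, p, o in ontology if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in ontology if p == "rdfs:range"}

    problems: List[str] = []
    for cls in sorted(design):
        if cls not in classes:
            problems.append(f"class {cls} is not declared in the ontology")
        for assoc, target in sorted(design[cls].items()):
            if assoc not in domains or assoc not in ranges:
                problems.append(
                    f"association {assoc} on {cls} is not described in the ontology")
                continue
            # Simplification: domain and range are used as hard constraints.
            if domains[assoc] != cls:
                problems.append(
                    f"association {assoc} has domain {domains[assoc]}, not {cls}")
            if ranges[assoc] != target:
                problems.append(
                    f"association {assoc} has range {ranges[assoc]}, not {target}")
    return problems


if __name__ == "__main__":
    for problem in check_design(DESIGN, ONTOLOGY):
        print(problem)
```

With the sample data, the only reported problem concerns `ex:shippedBy`, which the design uses but the ontology does not describe. A production mechanism would more plausibly rely on a description logic reasoner than on direct triple matching.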
Having raised the idea of using the Semantic Web in Software Engineering, a commonly asked question arises: how does one broadly characterise the Semantic Web in terms of Systems or Software Engineering use? In attempting to answer this question, consensus appears to be forming around two loose definitions:
Primarily such tools and techniques should be viewed as being formally descriptive in character, but there appears little reason to restrict this definition other than standards alignment. Therefore, it may also be relevant, at some appropriate point in the Semantic Web's future, to include prescriptive, invasive and/or other types of approach under this heading.
In such circumstances the Semantic Web could be viewed as a single formalised corpus of interrelated, reusable content, which can further be classified as being either:
Given that the Semantic Web uses triple-based data representation as its primary mechanism for information storage, and that this is merely a specialisation of the categorisation scheme employed for organising content in relational database technologies, the attraction of considering the Semantic Web as a specialised relational framework has been recognised for some time. So, by suggesting use of the Semantic Web as a system for runtime information and component sharing, there is an implicit need to provide means for clearly identifying participating artifacts based on composites of characterising semantic properties (metadata in the form of name-pair/predicate-object values). This differs from current Semantic Web schemes for unique identification, such as FOAF sha1. In such frameworks the Semantic Web can be seen as a truly global relational assembly of content and, as with every relational model, issues dealing with composite object identification have to be addressed.
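The difference between the two identification styles can be sketched as follows, purely as an illustration. The triples, the `ex:` vocabulary and the `find_by_composite_key` function are hypothetical and do not come from any specification; the second function assumes the FOAF convention of identifying an agent by the SHA-1 hash of a mailbox URI.

```python
import hashlib
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical component descriptions; every name here is invented.
TRIPLES = [
    ("ex:comp1", "ex:implements", "ex:PaymentService"),
    ("ex:comp1", "ex:version", "2.1"),
    ("ex:comp1", "ex:language", "Java"),
    ("ex:comp2", "ex:implements", "ex:PaymentService"),
    ("ex:comp2", "ex:version", "1.0"),
    ("ex:comp2", "ex:language", "Java"),
    ("ex:comp3", "ex:implements", "ex:AuditService"),
    ("ex:comp3", "ex:version", "2.1"),
    ("ex:comp3", "ex:language", "Java"),
]


def find_by_composite_key(triples: Iterable[Triple],
                          key: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the subjects that carry every (predicate, object) pair in key."""
    required = set(key)
    described: Dict[str, Set[Tuple[str, str]]] = {}
    for subject, predicate, obj in triples:
        described.setdefault(subject, set()).add((predicate, obj))
    return {s for s, pairs in described.items() if required <= pairs}


def mbox_sha1sum(mailbox_uri: str) -> str:
    """Single-property identification: SHA-1 hex digest of a mailbox URI."""
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # No single property separates comp1 from the others; the composite does.
    print(find_by_composite_key(
        TRIPLES,
        [("ex:implements", "ex:PaymentService"), ("ex:version", "2.1")]))
    # One opaque value identifies the agent (the address is a placeholder).
    print(mbox_sha1sum("mailto:someone@example.org"))
```

In the sample data, `ex:version` alone matches two components and `ex:implements` alone matches two others; only the combination of both pairs selects `ex:comp1`, which is the relational notion of a composite key applied to triples.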
Such unique identification schemes should be capable of supporting both the interlinking of broadly related ontologies into grander information corpora (thereby implying formal similarities and relationships between discrete ontologies and/or systems through their classifying metadata), and the transformation of design time component associations into useful runtime bindings. This will, therefore, realise metadata use across a broader spectrum of the Software Lifecycle. In so doing, this approach carries a number of obvious implications for systems employing such techniques:
Many, however, would argue that such approaches have been tried a number of times before with only limited success, holding up numerous grandiose project attempts at 'Corporate Enterprise Architecture' as classical examples of failure. This may indeed be true, but it is important to remember that past attempts have always been isolated to some degree. Standards-based formal semantic representation targeted at hugely open problem spaces, like the Web, is, however, a new concept and deliberately sets out to remove isolated problem solving from the equation. It not only offers a number of distinct technical advantages, but it is also available to a hitherto unprecedented global development community. Furthermore, this community is steeped in a tradition of free and open knowledge exchange and source distribution. And, if the history of the Web to date is anything to go by, this community will eventually produce a groundswell of support and enough impetus to kick-start a number of revolutionary changes in systems and software engineering as a direct result of the Semantic Web. To recognise this potential and provide early direction is hence considered to be a significantly worthy initiative.
It is acknowledged that the Semantic Web still faces a number of well known issues when attempting to implement public mechanisms for component sharing via semantic metadata association: