Face-to-Face: General Requirements Reading

Attached below is the final draft of the General Requirements document
for the face-to-face meeting next week. Please read this before the
meeting and determine if there are issues you would like to discuss
during the meeting. In particular, the General Requirements subgroup
welcomes feedback on the completeness of the requirements we have
identified. 

Jeff Heflin
WebOnt General Requirements


This document was prepared by the General Requirements Subgroup, which 
consists of the following individuals:

Jeff Heflin (co-chair)          heflin@cse.lehigh.edu
Deborah McGuinness (co-chair)   dlm@ksl.stanford.edu
Jeremy Carroll                  jeremy_carroll@hp.com
Dan Connolly                    connolly@w3.org
Jos De Roo                      jos.deroo.jd@belgium.agfa.com
Pat Hayes                       phayes@ai.uwf.edu
Ned Smith                       ned.smith@intel.com
Herman ter Horst                herman.ter.horst@philips.com


PURPOSE:
The purpose of this document is to identify requirements that are too 
general to result from any single use case area, cut across all use 
case areas, or are not directly related to the existing use cases but 
are nonetheless important.


REQUIREMENTS:
The following requirements are recommended by the group.


R1. Shared Ontologies

Ontologies are publicly available and different data sources can commit 
to the same ontology for shared meaning.

SUPPORTED TASKS:
Any use case in which distributed data sources use shared terminology.

JUSTIFICATION:
Interoperability requires agreements on the definitions of terms. 
Ontologies can provide standard sets of terms and formal definitions of 
those terms. Data sources that commit to the same ontology explicitly 
agree to use the same terms with the same meanings.

POSSIBLE APPROACH:
Although DTDs and XML Schemas can be used to define the syntax of a 
language, they cannot provide machine-readable semantic definitions for 
the terms of the language. A web ontology language needs:

1) syntax for defining ontologies
2) syntax for WebOnt documents to commit to one or more ontologies
3) syntax for disambiguating when two or more committed ontologies 
contain the same term (perhaps by specifying an order of preference for 
ontologies)

DAML SUPPORT:
DAML+OIL provides a daml:Ontology element and a number of primitives 
for defining classes and properties. It uses XML namespaces to identify 
which set of terms it is using, and a daml:imports statement can be used 
in data documents to effectively commit to the definitions of specific 
ontologies.



R2. Ontology Extension

Ontologies can be extended by other ontologies in order to provide 
additional definitions.

SUPPORTED TASKS:
Any use case in which the providers of data are decentralized.

JUSTIFICATION:
Often, shared ontologies are not sufficient. An organization may find 
that an existing ontology provides 90% of what they need, but the 
remaining 10% is critical. In such cases, the organization should not 
have to create a new ontology from scratch, but instead be able to 
create an ontology which extends an existing ontology and adds any 
desired terms and definitions.

POSSIBLE APPROACH:
Although RDF uses XML namespaces to include names from other schemas, 
there is no discussion of how this relates to the definitions of the 
names. For example, if definitions for a single name occur in three 
different schemas, but a document only includes the namespace for one, 
then it is unclear whether the definition intended by the document is 
the conjunction of the three schemas, or only the schema which the 
namespace includes. To explicitly express which definitions are 
intended, the web ontology language needs syntax for expressing 
ontology extension.

DAML SUPPORT:
daml:imports allows an ontology to include the definitions from another 
ontology.

OPEN ISSUES:
An important issue is determining the precise semantics of the 
extension mechanism. Is it equivalent to including the extended 
ontology in the new document? Does it allow definitions to be refined 
or restricted?  Also, what happens if the extensions generate 
incoherent term descriptions?
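
One possible reading of the first question above is that extension 
behaves like transitive inclusion: an ontology sees its own terms plus 
the terms of everything it extends. The sketch below illustrates only 
that reading. The ontology names, the dictionary layout, and the 
function visible_terms are hypothetical and are not part of DAML+OIL.

```python
# Hypothetical ontologies: each lists the ontologies it extends and its own terms.
ONTOLOGIES = {
    "base": {"imports": [], "terms": {"Person", "Organization"}},
    "univ": {"imports": ["base"], "terms": {"Student", "Professor"}},
    "lab": {"imports": ["univ"], "terms": {"Postdoc"}},
}


def visible_terms(name, ontologies):
    """Collect the terms of an ontology plus everything it extends, transitively."""
    seen = set()
    terms = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against circular extension chains
        seen.add(current)
        terms |= ontologies[current]["terms"]
        stack.extend(ontologies[current]["imports"])
    return terms
```

Under this reading, "lab" sees the terms of "univ" and "base" as well 
as its own. Whether an extension may also refine or restrict those 
terms is left open, as it is in the questions above.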



R3. Ontology Evolution
Ontologies can be changed over time and data sources can specify which 
version of the ontology they commit to.

SUPPORTED TASKS:
Any use case in which the ontology could potentially change, and in 
particular those in which the owner of the ontology is different from 
the data providers.

JUSTIFICATION:
Since the web is constantly growing and changing, we must expect 
ontologies to change as well. Ontologies may need to change because 
there were errors in prior versions, a new way of modeling the domain 
is preferred, or reality has changed (e.g., the addition of new 
technology). A web ontology language must be able to accommodate 
ontology revision. Note that ontology evolution is different from R2, 
because R2 does not change the original ontology. An important issue of 
revision is whether or not documents that commit to one version of an 
ontology are compatible with those that commit to another. Both 
compatible and incompatible revisions should be allowed, but it should 
be possible to distinguish between the two. Note that it is possible 
for a revision to change the intended meaning of a term without 
changing any axioms; determining backwards-compatibility therefore 
requires more than a simple comparison of axioms.

POSSIBLE APPROACH:
Allowing ontologies to change arbitrarily can have undesirable side 
effects in documents that committed to prior versions of the ontology. 
Since these documents are distributed and owned by different parties, 
it is impossible to coordinate an update to the ontology with updates 
to all documents that depend on it. One possible solution is:

1) Each revision of an ontology is a separate document that has a 
unique identity in web space (its own URL).

2) A construct for expressing that an ontology revises a prior version

3) A construct for expressing that a revision is backwards-compatible 
with one or more prior versions

4) A construct for expressing that certain terms are deprecated, and 
thus maintained in a revision simply for compatibility with earlier 
versions. This allows terms to be phased out while retaining 
backwards-compatibility.

5) A way of committing to a specific version number of an ontology or 
the latest version of that ontology.
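
The five items above can be sketched as a small data model. This is 
only an illustration under stated assumptions: the class 
OntologyVersion, its field names, the example.org URLs, and both helper 
functions are hypothetical. Compatibility is taken to hold only where 
it is explicitly declared, and revision history is assumed to be linear.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OntologyVersion:
    url: str                                      # item 1: one URL per revision
    prior_version: Optional[str] = None           # item 2: the version this one revises
    backward_compatible_with: frozenset = frozenset()  # item 3: declared explicitly
    deprecated_terms: frozenset = frozenset()     # item 4: kept only for compatibility


# Hypothetical URLs used purely for illustration.
V1 = "http://example.org/onto/v1"
V2 = "http://example.org/onto/v2"
V3 = "http://example.org/onto/v3"

REGISTRY = {
    V1: OntologyVersion(V1),
    V2: OntologyVersion(V2, prior_version=V1,
                        backward_compatible_with=frozenset({V1}),
                        deprecated_terms=frozenset({"oldTerm"})),
    V3: OntologyVersion(V3, prior_version=V2),  # an incompatible revision
}


def can_read(reader_url, committed_url, registry):
    """True if an agent using reader_url may safely interpret a document
    that commits to committed_url (same version or declared compatible)."""
    if reader_url == committed_url:
        return True
    return committed_url in registry[reader_url].backward_compatible_with


def latest_version(url, registry):
    """Item 5: follow 'revises' links forward to the newest known revision."""
    successor = {v.prior_version: v.url for v in registry.values() if v.prior_version}
    while url in successor:
        url = successor[url]
    return url
```

Here an agent using V2 may read documents that commit to V1, while an 
agent using V3 may not, because V3 declares no compatibility.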

DAML SUPPORT:
In DAML+OIL, each ontology has its own URL. Each ontology has a 
daml:versionInfo element that contains a string giving information 
about the version it represents. However, there is no specified 
structure for this string, and thus it is of little use for automated 
software that wishes to determine which ontologies are prior versions 
of other ontologies. DAML+OIL does not include features for specifying 
backwards-compatibility or deprecation.



R4. Ontology Interoperability

Different ontologies may model the same concepts in different ways. The 
language should provide primitives for relating different 
representations, thus allowing data to be converted to different 
ontologies, and enabling a "web of ontologies." However, this 
requirement must be balanced with the need for scalability (R6).

SUPPORTED TASKS:
Any use case in which data from different providers with different 
terminologies must be integrated.

JUSTIFICATION:
Although shared ontologies (R1) and ontology extension (R2) allow a 
certain degree of interoperability between different organizations and 
domains, there are often cases where there are multiple ways to model 
the same information. This may be due to differences in the 
perspectives of different organizations, different professions, 
different nationalities, etc. In order for machines to be able to 
integrate information that commits to heterogeneous ontologies, there 
need to be primitives that allow ontologies to map terms to equivalents 
in other ontologies.

POSSIBLE APPROACH:
There are many ways that different ontologies can model the same 
concepts, resulting in different types of heterogeneity. One approach 
is to have the expressivity of first order logic, which can be used to 
define articulation axioms and resolve most of the kinds of 
differences. However, this solution is not in line with the scalability 
requirement. Below is a list of language features that can be used to 
map heterogeneous ontologies; the web ontology language should include 
some (but probably not all) of these.

1) subclass/superclass relations
2) inverse relationships
3) equivalence of concepts (for classes, properties, and individuals)
4) logical constructs (implication, conjunction, disjunction)
5) arithmetic functions
6) aggregation (e.g., like SQL GROUP BY)
7) string manipulation
8) procedural attachments (executable code, possibly Java, that can be 
used to define arbitrarily complex mappings)

Another approach may be not to include the expressivity of first 
order logic but instead to reduce the expressive power of the language 
in which the articulation or mapping axioms are expressed.  For 
example, a limited language might be allowed to state that term A in 
Ontology 1 is equivalent to term B in Ontology 2 under certain 
additional restrictions (where those restrictions are expressed in a 
language that is less expressive than FOL).

DAML SUPPORT:
DAML+OIL contains the rdfs:subClassOf and rdfs:subPropertyOf relations 
for defining taxonomic relations of classes and properties, 
respectively. It also has the daml:equivalentTo family of properties 
(daml:sameClassAs, daml:samePropertyAs, and daml:sameIndividualAs) for 
expressing equivalence of classes, properties, and individuals. There 
is a daml:inverseOf property for defining inverse relationships. 
Finally, the description logic primitives allow for mappings similar to 
some of the logical constructs. However, DAML+OIL does not have 
features for expressing implication, arithmetic functions, aggregation, 
string manipulation, or procedural attachments.
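
As a rough illustration of how equivalence and inverse mappings could 
be used to convert data from one ontology to another, consider the 
sketch below. The prefixes "a:" and "b:", all term names, the three 
mapping tables, and the function translate are hypothetical. The tables 
merely stand in for statements such as daml:sameClassAs, 
daml:samePropertyAs, and daml:inverseOf.

```python
# Hypothetical mappings from ontology "a" to ontology "b".
SAME_CLASS_AS = {"a:Automobile": "b:Car"}
SAME_PROPERTY_AS = {"a:manufacturer": "b:madeBy"}
INVERSE_OF = {"a:owns": "b:ownedBy"}


def translate(triple):
    """Rewrite one (subject, property, value) triple from ontology a to ontology b."""
    subject, prop, value = triple
    if prop == "rdf:type":
        # Class membership: swap in the equivalent class if one is known.
        return (subject, prop, SAME_CLASS_AS.get(value, value))
    if prop in SAME_PROPERTY_AS:
        return (subject, SAME_PROPERTY_AS[prop], value)
    if prop in INVERSE_OF:
        # An inverse mapping swaps subject and value.
        return (value, INVERSE_OF[prop], subject)
    return triple  # no mapping known; leave the triple unchanged
```

Mappings that require arithmetic, aggregation, or string manipulation 
cannot be written as lookup tables of this kind, which is the gap noted 
above.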



R5. Detect Inconsistency
Different ontologies or data sources may be contradictory. It should be 
possible to detect inconsistencies.

SUPPORTED TASKS:
Any use case in which decentralization of data and lack of controlling 
authority can lead to conflicts in the data. Any ontology extension 
task that may result in incoherent descriptions (possibly by extending 
an ontology in a way that generates an overconstrained term).

JUSTIFICATION:
The Web is decentralized, allowing anyone to say anything. As a 
result, different viewpoints may be contradictory, or even false 
information may be provided. In order to prevent agents from combining 
incompatible data or from taking consistent data and evolving it into 
an inconsistent state, it is important that inconsistencies can be 
detected automatically.

POSSIBLE APPROACH:
First, the language must be able to express inconsistent situations. 
This could be done with a negation operator, disjointness of sets, 
cardinality restrictions, etc. Second, a reasoning component must be 
able to detect inconsistencies.  Third, some reporting mechanism must 
be made available to report inconsistencies, possibly along with an 
explanation of how the inconsistency was generated.

DAML SUPPORT:
DAML+OIL can express disjoint classes (using daml:disjointWith), 
cardinality restrictions (with daml:cardinality, daml:minCardinality, 
and daml:maxCardinality), and complements (with daml:complementOf). 
Using a description logic reasoner, inconsistencies within an ontology, 
between a set of ontologies, and between ontologies and data sources 
can be detected.
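
A minimal sketch of one kind of check, detecting an individual that 
falls into two disjoint classes, is given below. It is far simpler than 
a description logic reasoner and handles only subclass and disjointness 
statements. The class names, the individuals, and both functions are 
hypothetical.

```python
# Hypothetical ontology and data.
SUBCLASS_OF = {"Dog": {"Mammal"}, "Cat": {"Mammal"}, "Mammal": {"Animal"}}
DISJOINT = [("Animal", "Plant")]
TYPES = {"rex": {"Dog"}, "fern": {"Plant"}, "odd": {"Cat", "Plant"}}


def ancestors(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    result = {cls}
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def find_inconsistencies():
    """List (individual, class1, class2) where the individual belongs,
    directly or by inheritance, to two classes declared disjoint."""
    problems = []
    for individual, classes in sorted(TYPES.items()):
        inferred = set().union(*(ancestors(c) for c in classes))
        for left, right in DISJOINT:
            if left in inferred and right in inferred:
                problems.append((individual, left, right))
    return problems
```

With this data the only reported problem is the individual "odd", 
which is both a Cat (and hence an Animal) and a Plant.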



R6. Scalability
The language should be able to be used with large ontologies and large 
data sets. However, the language must balance this requirement with 
Expressiveness (R10).

SUPPORTED TASKS:
Any use case that uses large ontologies or large data sets.

JUSTIFICATION:
There are over one billion pages on the Web, and the potential 
application of the Semantic Web to embedded devices and agents implies 
even larger amounts of information that must be handled. The web 
ontology language must support systems that can scale to these sizes.

POSSIBLE APPROACH:
Many expressive languages are intractable and therefore do not scale. 
One solution is to restrict the language to features that have 
efficient algorithms for reasoning. Two candidates for limited 
reasoning are description logics and datalog.

DAML SUPPORT:
DAML+OIL is based on a description logic language for which tractable 
reasoners may be built.



R7. Ease of Use
The language should provide a low learning barrier and have clear 
concepts and meaning. The concepts should be independent from syntax.

SUPPORTED TASKS:
Markup and querying of semantic web documents by users, either directly 
or indirectly.

JUSTIFICATION:
Although ideally most users will be isolated from the language by front 
end tools, the basic philosophy of the language must be natural and 
easy to learn. Furthermore, early adopters, tool developers, and power 
users may work directly with the syntax, meaning human readable (and 
writable) syntax is desirable.

POSSIBLE APPROACH:
Where possible, use concepts and idioms that are familiar to ordinary 
software engineers and computer scientists. For example, relate ideas 
to object-oriented and/or relational databases.  One or more 
presentation syntaxes could be provided that are natural to users.

DAML SUPPORT:
DAML+OIL's classes are equivalent to classes in object-oriented 
terminology. DAML+OIL also incorporates features from frame-based 
systems and description logics. However, DAML+OIL's syntax has not 
retained features that have made some of its foundational components 
such as object-oriented systems and frame systems more usable.  As the 
language was generated to be compatible with the RDF framework, 
arguably some of the useful syntax features present in other languages 
were not preserved.



R8. XML syntax
The language should have an XML serialization.

SUPPORTED TASKS:
Exchange of ontologies and data in a standard format.

JUSTIFICATION:
XML has become widely accepted by industry and numerous tools for 
processing XML have been developed. If the web ontology language has an 
XML syntax, then these tools can be extended and reused.

POSSIBLE APPROACH:
Provide an XML syntax for the language.

DAML SUPPORT:
DAML+OIL extends RDF and RDF-Schema, which in turn have XML 
serializations.

OPEN ISSUES:
There is a lack of consensus as to whether the language should also build 
on RDF and RDF Schema. The arguments for building on RDF are that it is 
a W3C Recommendation and there exists software for parsing it. 
Arguments against RDF include that it does not have the widespread 
acceptance of XML, and trying to fit DAML+OIL into it has occasionally 
resulted in awkward language constructs.



R9. Ontology-based search
Conceptual search or semantic search -- search exploiting the meaning 
of terms instead of just the syntax of the search terms.

SUPPORTED TASKS:
Any 'find' capability that attempts to exploit term meaning including 
synonyms, subclass/superclass relationships, relationships between 
terms, or parametric search.

JUSTIFICATION:
It is widely recognized that simple statistical information retrieval 
techniques for search have limitations.  If, for example, a user is 
looking for a car when a web site contains only the word automobile, 
then no matches will be found even though information exists that would 
be relevant to the user's query.  Market research shows that search is 
one of the most important functionalities on web sites ([1]) and also 
that sophisticated search is imperative for e-commerce sites ([2]).  
Also, as objects become complicated and compositional, it becomes more 
important to be able to search for relationships between terms (such as 
the "capital of France" rather than all documents about France) or for 
descriptions such as "red cashmere sweater with a price under 150 
dollars." The latter is typically called parametric search, where 
parameters on the class sweater are restricted.  Parametric search, 
relational search, and standard conceptual search all rely on the 
meaning and composition of the search terms and also on the meaning of 
the terms in the document to be retrieved.

POSSIBLE APPROACH:
Use background ontologies to provide:
1) query expansion using synonyms and subclasses (and possibly more) 
from the background ontology
2) understanding of term relationships, e.g., capitals of countries are 
cities that are located geographically inside the country.
3) identify parameters appropriate to classes and their value 
restrictions
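
Item 1 above, query expansion, can be sketched as follows, using the 
car/automobile example from the justification. The background ontology 
is reduced to two hypothetical lookup tables, and the functions expand 
and search are illustrative names only. Matching is by whole lowercase 
words, which is a deliberate simplification.

```python
# Hypothetical background ontology.
SYNONYMS = {"car": {"automobile"}}
SUBCLASSES = {"car": {"sedan", "convertible"}}


def expand(term):
    """Return the term plus its synonyms and all direct and indirect subclasses."""
    result = {term} | SYNONYMS.get(term, set())
    stack = [term]
    while stack:
        for child in SUBCLASSES.get(stack.pop(), ()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def search(term, documents):
    """Return the documents containing any word from the expanded query."""
    wanted = expand(term)
    return [doc for doc in documents if wanted & set(doc.lower().split())]
```

A query for "car" then matches a document that mentions only 
"automobile" or only "convertible", which a purely syntactic match 
would miss.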

DAML SUPPORT:
Background ontologies can be expressed in DAML+OIL.  Also, queries can 
be formed from DAML expressions.



R10. Expressiveness
The language should be as expressive as possible, given a balance with 
requirement R6, Scalability. Expressivity determines what can be said 
in the language, and thus determines its inferential power and what 
reasoning capabilities should be expected in systems that fully 
implement it.

SUPPORTED TASKS:
Any use case that requires the representation of diverse knowledge.

JUSTIFICATION:
The degree of semantics that can be expressed by a language depends on 
the primitives that it provides. An expressive language contains a rich 
set of primitives that allow a wide variety of knowledge to be 
formalized. A language with too little expressivity will provide too 
few reasoning opportunities to be of much use and may not provide any 
contribution over existing languages.

POSSIBLE APPROACH:
One possible approach is to base the language on first order logic, but 
this conflicts with R6, Scalability. Therefore, more restricted, yet 
more scalable alternatives should be considered. Two such alternatives 
are description logics and datalog.

DAML SUPPORT:
DAML+OIL is based on a description logic that has had its constructors 
chosen to maintain tractability of reasoning.



APPENDIX: OTHER DESIRABLE FEATURES

The following items were discussed as candidate requirements but we 
were unable to reach consensus as to whether they were actual 
requirements for WebOnt.


C1. Explainability
 An ontology language and its environment may provide information that 
is believed to be true. A system should be able to justify its (true 
and false) beliefs.
 
SUPPORTED TASKS:
A justification should be available for any statement.  A statement 
such as A is a subclass of B may be justified by a simple reference to 
where the statement was asserted (such as told in ontology I) or a 
more complex statement (such as "A is stated to be a subclass of C in 
Ontology I and C is stated to be a subclass of B in Ontology I (and the 
subclass relationship is transitive)"). While explanations of all 
beliefs/deductions are important, some of the most important 
explanations are those that explain contradictions.  Additional support 
may be provided for this.  For example, if ontology I contains:
   A is disjoint from B
   C is the intersection of A and B
Then it is deducible that C contains no instances and is incoherent.
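
The subclass justification described above can be sketched as a search 
over told assertions that records where each step was stated. This is 
a minimal illustration, not the explanation facility of any existing 
system. The table TOLD and both function names are hypothetical.

```python
# Told subclass assertions: (subclass, superclass) -> where it was asserted.
TOLD = {("A", "C"): "Ontology I", ("C", "B"): "Ontology I"}


def _chain(sub, sup, told, visited):
    """Find a list of told (subclass, superclass) links leading from sub to sup."""
    if (sub, sup) in told:
        return [(sub, sup)]
    for left, middle in told:
        if left == sub and middle not in visited:
            rest = _chain(middle, sup, told, visited | {middle})
            if rest is not None:
                return [(left, middle)] + rest
    return None


def explain_subclass(sub, sup, told):
    """Return justification steps for 'sub is a subclass of sup', or None."""
    chain = _chain(sub, sup, told, {sub})
    if chain is None:
        return None
    steps = [f"{a} is stated to be a subclass of {b} in {told[(a, b)]}"
             for a, b in chain]
    if len(chain) > 1:
        steps.append("the subclass relationship is transitive")
    return steps
```

For a directly told statement the result is a single step naming the 
source. For the A, C, B example it is the two told statements followed 
by the appeal to transitivity.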
 
JUSTIFICATION:
Explanation has been shown to be critical in user acceptance of 
systems [3].  If a user is expected to accept an answer from a system, 
the system should be able to justify its conclusions when asked.  
Similarly, if a knowledge base is expected to evolve, it is 
imperative for the person modifying the system to be able to query the 
system about consequences of anticipated and recent updates.  Many 
applications like customer relationship management, help desk, 
configuration, and expert systems require explanation support in order 
to be used.
 
POSSIBLE APPROACH:
An extensive explanation facility has been designed and implemented for 
some description logic-based systems utilizing inference rule encodings 
and pruned, incremental presentations of inference rule applications 
[4,5]. Simplified versions of this can be done that explain common 
types of inference (but ignore more atypical and more complicated 
inferences) in order to get a quickly implementable and simpler 
explanation capability into a system. At a minimum, a system should be 
able to point to the source(s) of told information.
 
DAML SUPPORT:
A listing of the inference rules for DAML+OIL is available in the 
axiomatic semantics document [6].
 
CONCERN:
Explainability may be more applicable to the "proof layer" of the 
Semantic Web than it is to WebOnt. Furthermore, it may have little 
impact on language design (although it is likely to have significant 
impact on application design).


C2. Internationalization
The language should support ontologies in multiple languages.

CONCERNS:
This seems to be covered by the interoperability requirement. 
Furthermore, character set issues are already handled by XML.  We 
should embrace whatever internationalization solutions are available 
from our building blocks of XML and RDF.
 

C3. Ontology Querying
There are a few possible definitions:
    -  Ability to ask questions about the logical structure of the 
ontology.
    -  Ability to ask questions about the information that follows from 
an ontology.
    -  A full fledged query language

CONCERNS:
This topic needs further clarification.  There is some overlap with 
ontology-based search that supports parametric search.  There is also 
some overlap with explainability. A full description of this topic 
would include all of the work on query languages for ontologies as well 
as including distinctions from ontology-based search and explanation.
 

C4. Tagging
It is sometimes useful to attach additional information to a piece of 
data, for example the source of the data, a time stamp, or a 
confidence level. Tagging is the ability to associate such information.

CONCERNS:
This requirement should probably be motivated by specific examples of 
needs for tagging.


C5. Proof Checking
Proofs can be described in a language and will be checkable. This might 
be considered a further step beyond C1 (Explainability), which only 
requires a system to explain its beliefs and does not require any kind 
of automatic proof checking.

CONCERNS:
As with Explainability, this may be more appropriate for the "proof 
layer."
 

C6. Security 
Ability to specify who can view and modify information.  Ontologies 
should be able to specify access control information.

CONCERNS:
The Web typically doesn't allow update (except via file update) and 
viewing web pages is typically all or nothing.  Filtered viewing is not 
common.


C7. Trust
How to determine which information is reliable and/or believable. Must 
be able to know the sources of information and to express what 
supporting information is needed to believe something.

CONCERNS:
This is a larger issue which probably belongs in the "proof layer." We 
can probably reach consensus on the requirement for annotation of 
authoritative sources vs. other sources of information.
 

C8. Data Persistence
The Web is constantly changing, so it would be useful to know the 
lifetime of information. This will be useful for agents to know when 
they must refresh their knowledge bases.
 
CONCERNS:
If applied at the document level, this may be handled by the HTTP 
expires header. However, we may consider specifying this per fact in a 
data source, or for certain properties in an ontology.
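
A per-fact lifetime, as opposed to a document-level HTTP header, might 
look like the sketch below. The class Fact, its fields, and the 
function facts_to_refresh are hypothetical. Nothing here is defined by 
DAML+OIL or RDF.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Fact:
    statement: tuple      # (subject, property, value)
    retrieved: datetime   # when the agent obtained the fact
    lifetime: timedelta   # how long the source says it remains valid

    def is_stale(self, now):
        """True once the fact has outlived its stated lifetime."""
        return now >= self.retrieved + self.lifetime


def facts_to_refresh(facts, now):
    """Return the facts an agent should fetch again."""
    return [fact for fact in facts if fact.is_stale(now)]
```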
 


BIBLIOGRAPHY:
[1] Laura Koetzle, Paul Hagen, Hillary Drohan, and Moira Dorsey.  
"Smarter Sales of Complex Goods".   The Forrester Report. Cambridge, 
Mass. September 2001.

[2] Paul Hagen, David Weisman, Harley Manning, and Randy Souza.  
"Guided Search for eCommerce".  The Forrester Report.  Cambridge, 
Mass. January, 1999.

[3] Deborah L. McGuinness and Jon Wright. "An Industrial Strength 
Description Logic-based Configurator Platform". IEEE Intelligent 
Systems, Vol. 13, No. 4, July/August 1998, pp. 69-77. 
 
[4] Deborah L. McGuinness. "Explaining Reasoning in Description 
Logics". Ph.D. Thesis, Rutgers University, 1996. Technical Report 
LCSR-TR-277. Available from the Rutgers Department of Computer 
Science Technical Report Series.
 
[5] Alex Borgida, Enrico Franconi, Ian Horrocks, Deborah L. McGuinness, 
and Peter F. Patel-Schneider. "Explaining ALC subsumption". 
Proceedings of the International Workshop on Description Logics - 
DL-99, pp. 33-36, Linköping, Sweden, July 1999. 
 
[6] http://www.ksl.stanford.edu/people/dlm/daml-semantics/abstract-axiomatic-semantics.html 
 
[7] http://www.research.att.com/~dlm/papers/fois98-abstract.html 

Received on Monday, 7 January 2002 13:25:35 UTC