OWL Use Cases: Collection Management

Status: document for face-to-face meeting on Jan 14-15

Version: January 7, 2002

Editor

Guus Schreiber, IBROW / University of Amsterdam, schreiber@swi.psy.uva.nl

Members

Stephen Buswell, Stilo Technology, StephenB@stilo.com
Nicholas Gibbins, University of Southampton, nmg@ecs.soton.ac.uk
Guus Schreiber, IBROW / University of Amsterdam, schreiber@swi.psy.uva.nl
Michael K. Smith, Electronic Data Systems (EDS), michael.smith@eds.com

Document Purpose and Overview

This document characterizes the collection-management area and describes five possible use cases for it. The use cases are described in a standard format. Emphasis is placed on concrete examples of knowledge and information that would need to be represented in OWL (the Web Ontology Language).

This document should serve as input for the face-to-face meeting of the W3C Web Ontology Working Group on 14-15 January 2002 at Bell Labs in Murray Hill, NJ.

For readability purposes the document starts with a summary of the requirements arising from the use cases.

Summary of Resulting OWL Requirements

The numbers refer to the use cases below.

Expressivity requirements

Class hierarchy with properties/attributes and relations (all)
Explicit distinction between attributes (pointing to datatype values) and general relations
Default knowledge (1, 2, 4)
- living things don't fly
- Late Georgian chests of drawers are typically made of mahogany wood
Part-whole relation (2, 3, 4)
- a wing spar is part of a wing assembly
- chests of drawers have feet with their own style
Constraints (2, 3, 4)
- wing-spar.length < wing.length
- furniture.style = Late-Georgian <=> furniture.culture = British AND furniture.date-created in 1760-1811
- Note also the comments about the DAML+OIL solution to the latter example in use case 4.
Instance specification / instances as classes (1, 3, 5)
- Mammal is an instance of Species but also a class in its own right.
- A380 is an instance of Aircraft, but also denotes a set of A380 instances.
- Cf. Protege-2000's metaclass notion. Note: this is a feature of RDF.
- Anonymous instance specification through property values.
Mechanism for relation typing (5)
Also an RDF feature (you can define an rdf:type for a property).
Abstract classes (4)
Classes without instances: see the color hierarchy example in use case 4.
Synonyms / lexical term -> concept (1, 5)
Need to distinguish lexical term from concept it denotes. [This may not be an OWL issue.]

Other requirements

Provenance (1, 5)
Typical examples: distinguishing annotations by experts and non-experts, authorship of hyperlinks.
Version management (1)
Ability to extend/revise ontology.
Query support (1)
Ability to reason with falsehood, e.g. "whales as fish".
Support for content standards (1, 3)
General thesauri (WordNet, TGN) as well as domain-specific thesauri (AAT, ICONCLASS) are often used to standardize annotations.

Use-Case Area Scope and Definition

This use-case area typically has the following characteristics:

Large data/text/image/multimedia/website sets with a common theme/context/focus
Relatively fixed set of items in archive/collection
The collection can be a very large set, so scalability issues typically come into play.
Collection management is typically domain specific, therefore linked to (traditional) work on domain standards.
The focus is on metadata, so there is a natural link to traditional metadata efforts.

Collection-management typically has the following subtasks:

item indexing/annotation/classification
collection updates
collection search
often involves default reasoning

Links to Other Areas/Issues/Tasks

Virtual catalogs

Examples:

virtual museum (several projects)
product search/comparison sites (e.g., Lynn Stein's book identification, Mike Dean's hotels)

There is a clear link here to the "interoperability" area. Virtual catalogs typically require ontology mapping. They also change the collection-management task, since fewer assumptions can be made about the collection (e.g., its size).

Service catalogs

These are mentioned in a number of use cases. With respect to the declarative aspects of service description and search, there is a clear link between "web services" and this area.

Presentation generation

Semantically annotated catalogs are an ideal substrate for (context-specific) generation of presentations such as web pages. Example: dynamic configuration of a web page for browsers of an art catalog, showing related texts and images.

Conceptual search

In conceptual search we would like to view the whole web as one indexed catalog. This seems to be a bridge too far at the moment, given the problems we still have with domain-specific catalogs. A realistic short-term scenario for conceptual search is a two-step process:

Use an Open-Directory like mechanism to constrain your search to an area which hopefully provides some archives/catalogs.
Use the semantic search engines of the catalogs to find an answer to your query.
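The two-step process can be sketched as follows. This is only a minimal illustration under stated assumptions: the directory entries, the catalog contents, and all function names are hypothetical and do not describe an existing system.

```python
# Hypothetical data: a directory mapping topic areas to catalogs,
# and each catalog mapping concepts to the items indexed under them.
DIRECTORY = {
    "art": ["furniture-catalog"],
    "biology": ["species-catalog"],
}

CATALOGS = {
    "furniture-catalog": {"chest of drawers": ["img-001", "img-007"]},
    "species-catalog": {"penguin": ["rec-042"]},
}


def find_catalogs(area):
    """Step 1: use the directory to narrow the search to an area."""
    return DIRECTORY.get(area, [])


def search_catalogs(catalog_names, concept):
    """Step 2: query each selected catalog's own concept index."""
    hits = []
    for name in catalog_names:
        hits.extend(CATALOGS[name].get(concept, []))
    return hits


def conceptual_search(area, concept):
    """Run both steps: constrain by area, then search the catalogs found."""
    return search_catalogs(find_catalogs(area), concept)
```

The point of the split is that the second step can rely on each catalog's domain-specific index, while the first step needs only coarse topic information.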

Content standards

Due to the domain specificity of catalogs, many of them require a clear link with domain standards/vocabularies (existing or under development). These domain standards were typically developed to support manual indexing. Also, more general resources such as WordNet are being used.

Use Case 1: "Arkive: catalog of endangered species descriptions"

Contributor: Jeremy Carroll

Context

The Arkive project is creating a multimedia database consisting of a record for each endangered species. The database aims at completeness, with enough appropriate information for each species. The database is accessed through a web site and targeted at users at all levels of expertise, ranging from school children to domain experts. The key functions of ontological knowledge are:

to allow consistent organization of each species record
to provide a means for ensuring that each species record is sufficiently detailed, and includes examples of each important behavior.
to help with query across the database

Other functions where ontological knowledge may be useful include organizing annotations and provenance of knowledge. We note that:

despite the relevant science having had about two centuries of debate there is no universal agreement about appropriate ontologies for full and adequate species descriptions.
the number of species suggests that globally a federated solution is needed. The British participants have funding to make records of all British species, and the top N globally endangered species. The long-term plan would be to have people world-wide contributing records for their local species. This is likely to exacerbate the lack of agreement about the underlying ontologies.

Task

Organizing, commissioning and querying a database of multimedia records.

Example domain

Multimedia records of endangered species.

Typical users

scientist making a specific record.
manager commissioning new records.
scientist querying DB through web-site
school child querying DB through web-site

Ontology samples

Currently they use about ten master record-templates for the different top-level categories. For example, there is typically no "locomotion" field for plants, but it is of interest for animals.

These top-level categories are necessarily insufficient in that they cover (only) the general types of behavior. Any unique or rare behavior of a species is:

important to include in the record
not in the top-level category

Also, such behaviors are subject to scientific debate. A concrete example concerns birds that pick up poisonous insects in their beaks and rub them against their feathers. It is contentious whether they do this:

to get high, or
to kill off parasites in their feathers

The name you use for the behavior depends on your judgment on its motivation; which may well depend on your political persuasion.

There are also some behaviors which have multiple different names that are synonymous. Default inheritance is important. The well known penguins issue:

   living things don't fly
   birds         do    fly
   penguins      don't fly

This can be addressed when first creating a record, when default values can be filled in, to be changed if necessary, or more dynamically.
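Both options can be illustrated with a small sketch. It assumes a single-inheritance category tree in which a more specific category overrides the defaults of its ancestors; the data structure and the function names are hypothetical, and "sparrow" is added purely as an example of a category without an override.

```python
# Hypothetical category tree: each category names its parent and may
# override default property values inherited from it.
CATEGORIES = {
    "living thing": {"parent": None, "defaults": {"flies": False}},
    "bird": {"parent": "living thing", "defaults": {"flies": True}},
    "penguin": {"parent": "bird", "defaults": {"flies": False}},
    "sparrow": {"parent": "bird", "defaults": {}},
}


def default_value(category, prop):
    """Dynamic lookup: return the most specific default for prop."""
    while category is not None:
        node = CATEGORIES[category]
        if prop in node["defaults"]:
            return node["defaults"][prop]
        category = node["parent"]
    return None  # no default known at any level


def new_record(category, props, **explicit):
    """Fill in defaults at creation time; explicit values take precedence."""
    record = {p: default_value(category, p) for p in props}
    record.update(explicit)
    return record
```

Filling in defaults at creation time freezes them into the record, whereas the dynamic lookup picks up later changes to the category tree; which behavior is wanted is a design decision the use case leaves open.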

It is important to relate the category information back to multiple (partially inconsistent) taxonomies in the field.

OWL requirements

Hard to say: there is a range of knowledge-base requirements, and it is unclear which of them actually belong to the ontological subsystem.

Hierarchical classes with inheritance of properties, default values, etc. Probably single inheritance would suffice.
Provenance: to distinguish facts that are in the specific record from later annotations by experts or non-experts, from inherited facts, etc.
Query support. Query may be guided by category information, and possibly by falsehoods (e.g. "whales are fish" may be useful to help small children search, who might otherwise conclude there are no whales in the DB). Mixed mode query - both free text and category information.
Multiple synonymous labels for properties and values.
Thesaural support.
Ability to extend ontology on the fly, in a distributed fashion. (Experts adding framework to describe the special behavior of their species).
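The query-support requirement can be made concrete with a small sketch. It assumes that known misconceptions are stored separately from the category index, so that they guide search without being asserted as facts; the data and the function name are hypothetical.

```python
# Hypothetical search aids. A misconception maps a (wrong) category to
# the species a naive user is likely to expect there.
MISCONCEPTIONS = {"fish": ["whale", "dolphin"]}

# Hypothetical category index: category -> species actually in it.
INDEX = {
    "fish": ["shark", "sturgeon"],
    "mammal": ["whale", "dolphin", "tiger"],
}


def search(category):
    """Return (correct hits, hits reached only via a known misconception)."""
    correct = list(INDEX.get(category, []))
    also = [s for s in MISCONCEPTIONS.get(category, []) if s not in correct]
    return correct, also
```

A child searching under "fish" would then still be shown whales, ideally with a note that they are mammals.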

Use Case 2: "EDS web page landfill"

Contributor: Mike Smith

Context

Support for corporate communication and corporate memory.

Task

Organizing a massive web page land-fill into hierarchical categories

Example domain

External press releases, product offerings and case studies, corporate procedures, internal product briefings and comparisons, white papers, and offering process descriptions.

Typical users

Salesperson looking for sales collateral relevant to a client's expressed interest.
Technical person looking for pockets of specific technical expertise and detailed past experience.

Ontology samples

Document type hierarchy:

Press release
  press release covering financial details
    press release detailing SEC filings
.....

Solution descriptions that include part-whole relations and constraints covering software, hardware, and communication compatibility.

OWL requirements

Defaults and constraints.
Part-whole relations.
Language neutral representation.
Instances distinct from classes.
We need a clean interface between Web Ontologies and more mainstream business and manufacturing XML standards.

Use Case 3: "Aerospace Engineering Data Modelling"

Contributor: Stephen Buswell

Context

Support for corporate communication and corporate memory in aerospace engineering.

Task

Organizing a large body of technical documentation into cross-linked hierarchical categories

Example domain

Aircraft design documentation; manufacturing process documentation; testing process documentation; maintenance documentation; illustrations

Typical users

Maintenance engineer looking for all information relating to a particular part (e.g. 'wing-spar').
Design engineer looking at constraints on re-use of a particular sub-assembly.

Ontology samples

Document type hierarchy:

Document
  Design Document
    Sub-assembly design doc 
....

Component type hierarchy:
```
ManufacturingComponent
  wing-spar
```
Part-whole relations:
```
[wing-spar ispartof wing-assembly]
```
Inter-part constraints:
```
[wing-spar.length < wing.length]
```
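A minimal sketch of how a part-whole structure and such a constraint might be checked together is given below. It generalizes the single constraint to "every part is shorter than its whole", which is an assumption made for illustration; the class, the function, the unit, and the lengths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    length: float  # metres; the unit is an assumption for illustration
    parts: list = field(default_factory=list)

    def add_part(self, part):
        """Record that `part` is part of this component."""
        self.parts.append(part)
        return part


def violations(whole):
    """List (part, whole) pairs, at any depth, where part.length >= whole.length."""
    found = []
    for part in whole.parts:
        if not part.length < whole.length:
            found.append((part.name, whole.name))
        found.extend(violations(part))
    return found


# Hypothetical lengths, chosen only to exercise the check.
wing = Component("wing-assembly", length=36.0)
wing.add_part(Component("wing-spar", length=30.0))
```

Because the constraint is attached to the part-whole relation rather than to one pair of classes, it applies to every level of the assembly.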

General relations:

[this.document.this-picture illustrates wing-spar]

Instances:
```
[A380 isinstanceof Aircraft]
```
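The summary of requirements notes that A380 is an instance of Aircraft but also denotes a set of A380 instances. One way to picture this dual role is a Python metaclass. This is only an analogy, not a proposal for OWL semantics, and the serial number is hypothetical.

```python
class Aircraft(type):
    """Metaclass: its instances are aircraft types such as A380."""


class A380(metaclass=Aircraft):
    """An aircraft type, and also the class of individual A380 airframes."""

    def __init__(self, serial_number):
        self.serial_number = serial_number


# A hypothetical individual airframe: an instance of A380, not of Aircraft.
airframe = A380("hypothetical-serial-001")
```

Here `A380` is at the same time an instance (of `Aircraft`) and a class (of airframes), which is the situation the requirement asks the language to support.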

OWL requirements

Class hierarchy
Defaults
Inter-class constraints.
Part-whole relations
General relations
Language neutral representation.
Representation of instances
We need a clean interface between Web Ontologies and more mainstream business and manufacturing XML standards.

Use Case 4: "Art-image collections"

Contributor: Guus Schreiber

Context

We are working on semantic annotations of images of art objects. The purpose is to support both indexing and search through ontologies. There are many knowledge sources for art. We focus here on two of these:

The VRA 3.0 standard for image descriptions, which is basically a refinement of Dublin Core for art-image annotation
The Art and Architecture Thesaurus (AAT) constructed by the Getty Foundation, which provides a highly structured hierarchy of some 120,000 terms to describe art objects (art categories, materials, styles, color, ...).

We want to use the WebOnt language to represent the image description template provided by VRA and to link every data element of VRA to the subtrees of the AAT hierarchy where the "fillers" of the data element can be found. For example, we want to link the VRA data element "style/period" to the AAT subtrees representing styles and periods.

In addition, we want to express in the ontology additional knowledge. For example, if an indexer selects the value "Late Georgian" for the style/period of (say) an antique chest of drawers, we want to be able to infer that the data element "date.created" should have a value between 1760 and 1811 A.D. and that the "culture" is British. Availability of this type of background knowledge increases significantly the support that can be given for indexing as well as for search.

Task

Indexing and searching a digital image collection

Example domain

Museum collection of images of antique furniture

Typical users

Museum personnel involved in indexing images.
Lay person with some basic knowledge of the domain, looking for some piece of antique furniture.

Ontology samples

Representing the AAT color hierarchy

In our ontology we want to express what AAT terms can act as values for the data element "color". AAT has an elaborate hierarchy for colors, which is structured more or less like this:

<color>
  <chromatic color>
    pink
      vivid pink 
      strong pink
      ....
      <intermediate pink>
        purplish pink
          brilliant purplish pink
          ....
        yellowish pink
          ....
        brownish pink
    (etc.)
  <neutral color>
    white
    gray
      light gray
      ....
    black

The terms of type "<label>" are what AAT calls 'guide terms'. Their purpose is to provide structure to the hierarchy. When we specify a value restriction for the slot "color" of an image description template we ideally just want to say that any subclass of the <color> hierarchy can be used as slot filler, but we probably want to exclude the guide terms from the value set. The difference between the guide terms and the actual color values is close to what is being called abstract vs. concrete classes in UML (abstract classes cannot be instantiated, concrete classes can). Such a notion is however absent in RDFS and in DAML+OIL.
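The intended value restriction can be sketched as follows, assuming each term carries a flag saying whether it is a guide term. The tuple encoding and the function name are hypothetical, and only a fragment of the hierarchy is reproduced.

```python
# A fragment of the color hierarchy as (term, is_guide_term, children).
COLOR = ("color", True, [
    ("chromatic color", True, [
        ("pink", False, [
            ("vivid pink", False, []),
            ("intermediate pink", True, [
                ("purplish pink", False, []),
            ]),
        ]),
    ]),
    ("neutral color", True, [
        ("white", False, []),
        ("gray", False, [("light gray", False, [])]),
        ("black", False, []),
    ]),
])


def allowed_fillers(node):
    """Collect every term in the subtree that is not a guide term."""
    term, is_guide, children = node
    fillers = [] if is_guide else [term]
    for child in children:
        fillers.extend(allowed_fillers(child))
    return fillers
```

The guide terms still shape the traversal, but they never appear in the value set, which mirrors the abstract/concrete distinction described above.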

BTW: An assumption of our work is that in order to be successful we need to build on the (semi-)ontologies already available (like AAT), and that we will have to map these onto a representation in a WebOnt language. It is unrealistic to assume we can redo large-scale efforts like AAT.

Representing an aggregate structure

When we want to index an object such as an antique chest of drawers, there is almost always a need to represent the part-of structure of the object. For example, we want to assign a style value to the feet of a chest, e.g. "bun feet". In our view the WebOnt committee should seriously consider introducing some (limited) form of aggregation into the WebOnt language. If you just represent this as another slot/relation, you lose much of the semantics. This is likely to be a requirement from UML people as well (aggregation has a prominent place in UML class models).
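As a rough illustration of what an explicit part-of structure buys during search, the sketch below lets each part carry its own description and answers the question "which objects have a part in a given style?". The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArtObject:
    name: str
    style: str = ""
    parts: list = field(default_factory=list)


def has_part_with_style(obj, style):
    """True if some part of obj, at any depth, has the given style."""
    return any(
        part.style == style or has_part_with_style(part, style)
        for part in obj.parts
    )


chest = ArtObject(
    "chest of drawers",
    style="Late Georgian",
    parts=[ArtObject("feet", style="bun feet")],
)
```

The style of the whole and the style of a part are kept apart, so a search for "bun feet" finds the chest without claiming that the chest itself has that style.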

Definitional knowledge

Let's for the moment assume we can represent AAT and VRA in WebOnt. For effective search support we need to add domain knowledge to this ontology. This knowledge typically takes the form of inter-slot constraints within the image description template. One example: style/period = "Late Georgian" => culture = "British" AND date.created = between 1760 and 1811 [Style/period, culture and date.created are all VRA data elements defined as slots for our art-object description template.]
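Read procedurally, such a constraint lets a tool flag descriptions that contradict the chosen style/period. The sketch below is one possible reading, assuming the date range is inclusive; the rule table and the function name are hypothetical.

```python
# Hypothetical rule table: a style/period value implies values for
# other slots. The Late Georgian entry is the example from the text.
STYLE_RULES = {
    "Late Georgian": {"culture": "British", "date.created": (1760, 1811)},
}


def check(description):
    """Return the slots whose values contradict the style/period rule."""
    implied = STYLE_RULES.get(description.get("style/period"), {})
    conflicts = []
    for slot, required in implied.items():
        if slot not in description:
            continue  # nothing to contradict; the value can be inferred
        value = description[slot]
        if isinstance(required, tuple):
            low, high = required  # inclusive range (an assumption)
            ok = low <= value <= high
        else:
            ok = value == required
        if not ok:
            conflicts.append(slot)
    return conflicts
```

An indexing tool could use the same table in the other direction, filling in the culture and restricting the date once the style/period has been chosen.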

We could not define this constraint in RDFS. Sean Bechhofer (Univ. of Manchester) provided a DAML+OIL solution (details of data-type representation and URIs left out):

<!-- Things that have some style of class Late Georgian ... -->
<daml:Restriction>
 <daml:onProperty rdf:resource="some-URL#style"/>
 <daml:hasClass>
  <daml:Class rdf:about="some-URL#Late Georgian"/>
 </daml:hasClass>
 <!-- ... are both created in 1760-1811 and British. -->
 <rdfs:subClassOf>
  <daml:Class>
   <daml:intersectionOf rdf:parseType="daml:collection">
    <daml:Restriction>
     <daml:onProperty rdf:resource="some-URL#date"/>
     <daml:hasClass>
      <daml:Class rdf:about="some-URL#1760-1811"/>
     </daml:hasClass>
    </daml:Restriction>
    <daml:Restriction>
     <daml:onProperty rdf:resource="some-URL#culture"/>
     <daml:hasClass>
      <daml:Class rdf:about="some-URL#British"/>
     </daml:hasClass>
    </daml:Restriction>
   </daml:intersectionOf>
  </daml:Class>
 </rdfs:subClassOf>
</daml:Restriction>

So the class of Late-Georgian things is a subclass of both the class of British things and the class of things created between 1760 and 1811. This is similar to what is called multiple specialization in data modelling.

Two issues arise here:

The syntax is really awful.
For most users this will not be an intuitive way of defining definitional knowledge. Can OWL provide a more natural way?

[Special thanks to Sean Bechhofer and Frank van Harmelen for their input.]

Default knowledge

This is in fact the most common form of domain knowledge in the example domain. One sample of default knowledge:

IF type = "chest of drawers" AND
     style/period = Late-Georgian
THEN (this typically suggests)
     material.main = mahogany

The structure of the knowledge is similar to definitional knowledge, but a solution is probably more difficult. One could call this "inter-slot preferences".
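One way to read "inter-slot preferences" is as suggestions that fill empty slots but never override a value the indexer has entered. The sketch below takes that reading as an assumption; the rule encoding and the function name are hypothetical.

```python
# Hypothetical preference rules: (conditions, suggested slot values).
PREFERENCES = [
    (
        {"type": "chest of drawers", "style/period": "Late-Georgian"},
        {"material.main": "mahogany"},
    ),
]


def suggest(description):
    """Suggest values for empty slots; never override what was entered."""
    suggestions = {}
    for conditions, defaults in PREFERENCES:
        if all(description.get(k) == v for k, v in conditions.items()):
            for slot, value in defaults.items():
                if slot not in description:
                    suggestions[slot] = value
    return suggestions
```

Unlike a definitional constraint, a description that deviates from the suggestion (say, an oak chest) is not an error; it is simply an exception to the default.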

OWL requirements

distinction between abstract and concrete classes
part-whole relations
(easy way to express) definitional knowledge
default knowledge

Use Case 5: "Conceptual Open Hypermedia"

Contributor: Nick Gibbins

Context

Improving navigation while browsing through a corpus such as a large website.

Task

Creating an overlay of hypertext links onto a corpus (a linkbase)

Example domain

Organizational and research documents generated by an academic institution.

Typical users

novice user who needs further explanation of terms in documents (e.g. information on people mentioned in documents)
experienced user who knows rough location of desired information and is prepared to browse to find it
experienced user annotating documents (associating terms in documents with ontology entities), so allowing new links to be created

Ontology samples

The ontology is based in part on Dublin Core (describing bibliographic metadata), but also requires some representation of the content of the documents (departmental board minutes, grant applications, etc) in order to describe their contents (or rather, those entities which are referred to in their contents).

OWL requirements

referring to instances (e.g. people) by means of their properties
composition of relations: required to specify certain types of links (e.g. a link to the home page of the author of a document)
ability to define lexical terms which commonly denote entities: for example, the lexical term "Nick Gibbins" is commonly used to refer to the person with email address nmg@ecs.soton.ac.uk. The denotation of these terms is not necessarily static: for example, the lexical term "head of department" refers to different individuals depending on the context in which it is used (the publication date of the document in which the term appears)
provenance: No explicit author of links, but provenance of links is that of the facts from which they are constructed
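The first two requirements can be illustrated with a small sketch over subject-relation-object triples. The identifiers `doc-1` and `person-1` and the home page URL are hypothetical; only the email address comes from the example above.

```python
# Hypothetical facts as (subject, relation, object) triples.
FACTS = [
    ("doc-1", "author", "person-1"),
    ("person-1", "homepage", "http://example.org/~person-1/"),
    ("person-1", "email", "nmg@ecs.soton.ac.uk"),
]


def objects(subject, relation):
    """All objects related to subject by relation."""
    return [o for s, r, o in FACTS if s == subject and r == relation]


def compose(subject, *relations):
    """Follow a chain of relations from subject, e.g. author then homepage."""
    current = [subject]
    for relation in relations:
        current = [o for s in current for o in objects(s, relation)]
    return current


def find_by_property(relation, value):
    """Refer to an instance by one of its property values."""
    return [s for s, r, o in FACTS if r == relation and o == value]
```

A link from a document to its author's home page is then `compose("doc-1", "author", "homepage")`, and the person behind an email address is found with `find_by_property("email", ...)` without ever naming the instance directly.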

References

The Art and Architecture Thesaurus http://shiva.pub.getty.edu.
Visual Resources Association Standards Committee. VRA core categories, version 3.0. Technical report, Visual Resources Association, July 2000. http://www.gsd.harvard.edu/~staffaw3/vra/vracore3.htm.