OWL Use Cases: Collection Management

Status: document for face-to-face meeting on Jan 14-15

Version: January 7, 2002

Editor

Guus Schreiber, IBROW / University of Amsterdam, schreiber@swi.psy.uva.nl

Members

Document Purpose and Overview

This document characterizes the collection-management area and describes five possible use cases for it. The use cases are described in a standard format. Emphasis is placed on concrete examples of knowledge and information that would need to be represented in OWL (the Web Ontology Language).

This document should serve as input for the face-to-face meeting of the W3C Web Ontology Working Group on 14-15 January 2002 at Bell Labs in Murray Hill, NJ.

For readability purposes the document starts with a summary of the requirements arising from the use cases.


Summary of Resulting OWL Requirements

The numbers refer to the use cases below.

Expressivity requirements

Other requirements


Use-Case Area Scope and Definition

This use-case area typically has the following characteristics:

Collection management typically has the following subtasks:

Links to Other Areas/Issues/Tasks

Virtual catalogs

There is a clear link here to the "interoperability" area. Virtual catalogs typically require ontology mapping. They also change the collection-management task, because fewer assumptions can be made about the collection (e.g., its size).

Service catalogs

These are mentioned in a number of use cases. With respect to the declarative aspects of service description and search, there is a clear link between "web services" and this area.

Presentation generation

Semantically annotated catalogs are an ideal substrate for (context-specific) generation of presentations, such as web pages. Example: dynamic configuration of a web page for browsers of an art catalog, showing related texts and images.

Conceptual search

In conceptual search we would like to view the whole web as one indexed catalog. This seems to be a bridge too far at the moment, given the problems we still have with domain-specific catalogs. A realistic short-term scenario for conceptual search is a two-step process:
  1. Use an Open Directory-like mechanism to constrain your search to an area which hopefully provides some archives/catalogs.
  2. Use the semantic search engines of the catalogs to find an answer to your query.
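The two-step process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the directory areas, catalog names and record identifiers are hypothetical, and a real catalog would answer step 2 with its own semantic search engine rather than a dictionary lookup.

```python
# Hypothetical directory: topic areas mapped to the catalogs they list.
DIRECTORY = {
    "art/furniture": ["museum_catalog"],
    "biology/endangered-species": ["species_catalog"],
}

# Hypothetical catalogs: each maps a concept to matching record identifiers.
CATALOGS = {
    "museum_catalog": {"chest of drawers": ["img-001", "img-007"]},
    "species_catalog": {"penguin": ["rec-042"]},
}


def conceptual_search(area, concept):
    """Two-step search: narrow to an area's catalogs, then query each one."""
    hits = []
    # Step 1: the directory constrains the search to catalogs in one area.
    for catalog_name in DIRECTORY.get(area, []):
        # Step 2: the catalog's own index answers the concept query.
        index = CATALOGS[catalog_name]
        for record in index.get(concept, []):
            hits.append((catalog_name, record))
    return hits
```

Calling `conceptual_search("art/furniture", "chest of drawers")` only ever consults the furniture catalog; the species catalog is never queried.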

Content standards

Due to the domain specificity of catalogs, many of them require a clear link with domain standards/vocabularies (existing or under development). These domain standards were typically developed to support manual indexing. Also, more general resources such as WordNet are being used.

Use Case 1: "Arkive: catalog of endangered species descriptions"

Contributor: Jeremy Carroll

Context

The Arkive project is creating a multimedia database consisting of a record for each endangered species. The database aims at completeness, with enough appropriate information for each species. The database is accessed through a web site and targeted at users at all levels of expertise, ranging from school children to domain experts.

The key functions of ontological knowledge are:

Other functions where ontological knowledge may be useful include organizing annotations and provenance of knowledge.

We note that:

Task

Organizing, commissioning and querying a database of multimedia records.

Example domain

Multimedia records of endangered species.

Typical users

Ontology samples

Currently they use about ten master record-templates for the different top-level categories. For example, there is typically no "locomotion" field for plants, but it is of interest for animals.

These top-level categories are necessarily insufficient in that they cover (only) the general types of behavior. Any unique or rare behavior of a species is:

Also, such behaviors are subject to scientific debate. A concrete example concerns birds that pick up poisonous insects in their beaks and rub them against their feathers. It is contentious whether they do this:

The name you use for the behavior depends on your judgment of its motivation, which may well depend on your political persuasion.

Some behaviors also have multiple synonymous names.

Default inheritance is important. The well-known penguin issue:

   living things don't fly
   birds         do    fly
   penguins      don't fly

This can be addressed when a record is first created, by filling in default values that can be changed if necessary, or more dynamically.
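A minimal Python sketch of the "fill in defaults at record creation" option follows. The taxonomy, the field name `flies` and the helper names are assumptions made for illustration; they are not taken from the Arkive system. The most specific category that states a default wins.

```python
# Hypothetical taxonomy: each category names its parent (None at the root).
PARENT = {
    "living thing": None,
    "bird": "living thing",
    "penguin": "bird",
    "sparrow": "bird",
    "oak": "living thing",
}

# Defaults are stated at the category where they first apply.
DEFAULTS = {
    "living thing": {"flies": False},
    "bird": {"flies": True},
    "penguin": {"flies": False},
}


def default_value(category, field):
    """Return the most specific default for a field, walking up the taxonomy."""
    while category is not None:
        if field in DEFAULTS.get(category, {}):
            return DEFAULTS[category][field]
        category = PARENT[category]
    return None  # no default is stated anywhere on the path to the root


def new_record(category, fields):
    """Pre-fill a new record with defaults; an editor may change them later."""
    return {field: default_value(category, field) for field in fields}
```

The more dynamic alternative mentioned above would call `default_value` at query time instead of copying the value into the record.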

It is important to relate the category information back to multiple (partially inconsistent) taxonomies in the field.

OWL requirements

Hard to say: there is a range of knowledge-base requirements, and deciding which ones actually belong to the ontological subsystem is problematic.

Use Case 2: "EDS web page landfill"

Contributor: Mike Smith

Context

Support for corporate communication and corporate memory.

Task

Organizing a massive web page landfill into hierarchical categories

Example domain

External press releases, product offerings and case studies, corporate procedures, internal product briefings and comparisons, white papers, and offering process descriptions.

Typical users

Ontology samples

Document type hierarchy:
Press release
  press release covering financial details
    press release detailing SEC filings
.....

Solution descriptions that include part-whole relations and constraints covering software, hardware, and communication compatibility.

OWL requirements

Use Case 3: "Aerospace Engineering Data Modelling"

Contributor: Stephen Buswell

Context

Support for corporate communication and corporate memory in aerospace engineering.

Task

Organizing a large body of technical documentation into cross-linked hierarchical categories

Example domain

Aircraft design documentation; manufacturing process documentation; testing process documentation; maintenance documentation; illustrations

Typical users

Ontology samples

OWL requirements

Use Case 4: "Art-image collections"

Contributor: Guus Schreiber

Context

We are working on semantic annotations of images of art objects. The purpose is to support both indexing and search through ontologies. There are many knowledge sources for art. We focus here on two of these:
  1. The VRA 3.0 standard for image descriptions, which is basically a refinement of Dublin Core for art-image annotation
  2. The Art and Architecture Thesaurus (AAT) constructed by the Getty Foundation, which provides a highly structured hierarchy of some 120,000 terms to describe art objects (art categories, materials, styles, color, ....).
We want to use the WebOnt language to represent the image description template provided by VRA and to link every data element of VRA to the subtrees of the AAT hierarchy where the "fillers" of the data element can be found. For example, we want to link the VRA data element "style/period" to the AAT subtrees representing styles and periods.

In addition, we want to express additional knowledge in the ontology. For example, if an indexer selects the value "Late Georgian" for the style/period of (say) an antique chest of drawers, we want to be able to infer that the data element "date.created" should have a value between 1760 and 1811 A.D. and that the "culture" is British. Availability of this type of background knowledge significantly increases the support that can be given for indexing as well as for search.

Task

Indexing and searching a digital image collection

Example domain

Museum collection of images of antique furniture

Typical users

Ontology samples

Representing the AAT color hierarchy

In our ontology we want to express which AAT terms can act as values for the data element "color". AAT has an elaborate hierarchy for colors, which is structured more or less like this:
<color>
  <chromatic color>
    pink
      vivid pink
      strong pink
      ....
      <intermediate pink>
        purplish pink
          brilliant purplish pink
          ....
        yellowish pink
          ....
        brownish pink
    (etc.)
  <neutral color>
    white
    gray
      light gray
      ....
    black
The terms of type "<label>" are what AAT calls 'guide terms'. Their purpose is to provide structure to the hierarchy. When we specify a value restriction for the slot "color" of an image description template we ideally just want to say that any subclass of the <color> hierarchy can be used as slot filler, but we probably want to exclude the guide terms from the value set. The difference between the guide terms and the actual color values is close to what is being called abstract vs. concrete classes in UML (abstract classes cannot be instantiated, concrete classes can). Such a notion is however absent in RDFS and in DAML+OIL.
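The intended value restriction can be illustrated with a small Python sketch. It is an assumption-laden illustration, not a proposal for the language: the `Term` class, the `guide` flag and the `allowed_values` helper are hypothetical names, and only a fragment of the color hierarchy is encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    guide: bool = False  # True for guide terms, written <label> in the listing
    children: list = field(default_factory=list)


def term(label, *children, guide=False):
    """Convenience constructor for a term with its narrower terms."""
    return Term(label, guide, list(children))


# A fragment of the color hierarchy; guide terms are flagged explicitly.
COLOR = term(
    "color",
    term(
        "chromatic color",
        term(
            "pink",
            term("vivid pink"),
            term("intermediate pink", term("purplish pink"), guide=True),
        ),
        guide=True,
    ),
    term(
        "neutral color",
        term("white"),
        term("gray", term("light gray")),
        term("black"),
        guide=True,
    ),
    guide=True,
)


def allowed_values(root):
    """Collect every non-guide term at or below root as a legal slot filler."""
    values = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.guide:
            values.add(node.label)
        stack.extend(node.children)
    return values
```

Here `allowed_values(COLOR)` contains "pink" and "purplish pink" but neither "color" nor "intermediate pink", which mirrors the abstract versus concrete distinction described above.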

Note: An assumption of our work is that in order to be successful we need to build on the (semi-)ontologies already available (like AAT), and that we will have to map these onto a representation in a WebOnt language. It is unrealistic to assume we can redo large-scale efforts like AAT.

Representing an aggregate structure

When we want to index an object such as an antique chest of drawers, there is almost always a need to represent the part-of structure of the object. For example, we may want to assign a style value to the feet of a chest, e.g. "bun feet". In our view the WebOnt committee should seriously consider introducing some (limited) form of aggregation into the WebOnt language. If you just represent this as another slot/relation, you lose much of the semantics. This is likely to be a requirement from UML people as well (aggregation has a prominent place in UML class models).
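As a rough illustration of what first-class aggregation buys, the Python sketch below keeps the part-of structure separate from ordinary slots, so that a search can descend into parts. The `ArtObject` class, the slot names and the `find_parts` helper are hypothetical; only the chest and its bun feet come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class ArtObject:
    name: str
    slots: dict = field(default_factory=dict)  # ordinary descriptive slots
    parts: list = field(default_factory=list)  # aggregation: the part-of structure


def find_parts(obj, slot, value):
    """Yield the name of obj, or of any transitive part, whose slot has the value."""
    if obj.slots.get(slot) == value:
        yield obj.name
    for part in obj.parts:
        yield from find_parts(part, slot, value)


# The chest is described as a whole; its feet carry their own style value.
chest = ArtObject(
    "chest of drawers",
    {"style/period": "Late Georgian"},
    parts=[
        ArtObject("feet", {"style": "bun feet"}),
        ArtObject("drawers"),
    ],
)
```

Because `parts` is distinguished from `slots`, `find_parts(chest, "style", "bun feet")` can reach the feet without knowing in advance which relation links them to the chest.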

Definitional knowledge

Let us assume for the moment that we can represent AAT and VRA in WebOnt. For effective search support we need to add domain knowledge to this ontology. This knowledge typically takes the form of inter-slot constraints within the image description template. One example:

style/period = "Late Georgian" => culture = "British" AND date.created = between 1760 and 1811

[Style/period, culture and date.created are all VRA data elements defined as slots for our art-object description template.]

We could not define this constraint in RDFS. Sean Bechhofer (Univ. of Manchester) provided a DAML+OIL solution (details of data-type representation and URIs left out):

<!-- The class of all things whose style is Late Georgian ... -->
<daml:Restriction>
  <daml:onProperty rdf:resource="some-URL#style"/>
  <daml:hasClass>
    <daml:Class rdf:about="some-URL#Late Georgian"/>
  </daml:hasClass>
  <!-- ... is a subclass of the intersection of two further restrictions -->
  <rdfs:subClassOf>
    <daml:Class>
      <daml:intersectionOf rdf:parseType="daml:collection">
        <daml:Restriction>
          <daml:onProperty rdf:resource="some-URL#date"/>
          <daml:hasClass>
            <daml:Class rdf:about="some-URL#1760-1811"/>
          </daml:hasClass>
        </daml:Restriction>
        <daml:Restriction>
          <daml:onProperty rdf:resource="some-URL#culture"/>
          <daml:hasClass>
            <daml:Class rdf:about="some-URL#British"/>
          </daml:hasClass>
        </daml:Restriction>
      </daml:intersectionOf>
    </daml:Class>
  </rdfs:subClassOf>
</daml:Restriction>

So the class of Late-Georgian things is a subclass of both the British things and the things created between 1760 and 1811. This is similar to what is called multiple specialization in data modelling.
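Operationally, an indexing tool could use such a definition to fill in or check slot values. The Python sketch below shows one way to do this under stated assumptions: the `IMPLICATIONS` table and the `complete` and `consistent` helpers are hypothetical, dates are simplified to years, and this procedural check is a stand-in for, not an implementation of, DAML+OIL reasoning.

```python
# Hypothetical rule table: a style/period value implies other slot values.
# The single entry restates the Late Georgian example from the text.
IMPLICATIONS = {
    "Late Georgian": {"culture": "British", "date.created": (1760, 1811)},
}


def consistent(actual, required):
    """Check a slot value against an implied value or (earliest, latest) range."""
    if isinstance(required, tuple):
        low, high = required
        if isinstance(actual, tuple):
            return low <= actual[0] and actual[1] <= high
        return low <= actual <= high
    return actual == required


def complete(description):
    """Return a copy with implied slot values filled in.

    Because the knowledge is definitional, a contradicting value is an error.
    """
    result = dict(description)
    implied = IMPLICATIONS.get(description.get("style/period"), {})
    for slot, required in implied.items():
        if slot not in result:
            result[slot] = required
        elif not consistent(result[slot], required):
            raise ValueError(f"{slot}={result[slot]!r} contradicts {required!r}")
    return result
```

For instance, `complete({"style/period": "Late Georgian"})` adds the culture and the date range, while a description that also states `"culture": "French"` is rejected.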

Two issues arise here:

[Special thanks to Sean Bechhofer and Frank van Harmelen for their input.]

Default knowledge

This is in fact the most common form of domain knowledge in the example domain. One sample of default knowledge:
IF   type = "chest of drawers" AND
     style/period = "Late Georgian"
THEN (this typically suggests)
     material.main = "mahogany"

The structure of the knowledge is similar to definitional knowledge, but a solution is probably more difficult. One could call this "inter-slot preferences".
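The difference from definitional knowledge can be made concrete with a short Python sketch: a preference only proposes a value for an empty slot and never overrides or rejects what the indexer entered. The `PREFERENCES` table and the `suggest` helper are hypothetical names introduced for illustration.

```python
# Hypothetical preference rules: (conditions, slot, suggested value).
# The single rule restates the mahogany example from the text.
PREFERENCES = [
    (
        {"type": "chest of drawers", "style/period": "Late Georgian"},
        "material.main",
        "mahogany",
    ),
]


def suggest(description):
    """Return suggested values for slots the indexer has left empty."""
    suggestions = {}
    for conditions, slot, value in PREFERENCES:
        matches = all(description.get(k) == v for k, v in conditions.items())
        # A preference never overrides a value the indexer already supplied.
        if matches and slot not in description:
            suggestions.setdefault(slot, value)
    return suggestions
```

A Late Georgian chest of drawers recorded as oak simply yields no suggestion, whereas the same conflict with a definitional constraint would be an inconsistency.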

OWL requirements

Use Case 5: "Conceptual Open Hypermedia"

Contributor: Nick Gibbins

Context

Improving navigation while browsing through a corpus such as a large website.

Task

Creating an overlay of hypertext links onto a corpus (a linkbase)

Example domain

Organizational and research documents generated by an academic institution.

Typical users

  1. novice user who needs further explanation of terms in documents (e.g. information on people mentioned in documents)
  2. experienced user who knows rough location of desired information and is prepared to browse to find it
  3. experienced user annotating documents (associating terms in documents with ontology entities), so allowing new links to be created

Ontology samples

The ontology is based in part on Dublin Core (describing bibliographic metadata), but also requires some representation of the content of the documents (departmental board minutes, grant applications, etc.), or rather of those entities which are referred to in their contents.

OWL requirements

referring to instances (e.g. people) by means of their properties
composition of relations
  Required to specify certain types of links (e.g. a link to the home page of the author of a document).
ability to define lexical terms which commonly denote entities
  For example, the lexical term "Nick Gibbins" is commonly used to refer to the person with email address nmg@ecs.soton.ac.uk. Denotation of these terms is not necessarily static. For example, the lexical term "head of department" refers to different individuals based on the context in which it is used (publication date of the document in which the term appears).
provenance
  No explicit author of links, but provenance of links is that of the facts from which they are constructed.
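Two of these requirements, composition of relations and context-dependent denotation, can be sketched in Python as follows. All identifiers, URLs, dates and helper names are hypothetical and serve only to illustrate the shape of the requirement.

```python
from datetime import date

# Hypothetical facts as (subject, relation, object) triples.
FACTS = {
    ("doc-17", "author", "person-nmg"),
    ("person-nmg", "homepage", "http://example.org/people/nmg"),
}


def follow(subject, relation):
    """Return all objects related to subject by a single relation."""
    return {o for s, r, o in FACTS if s == subject and r == relation}


def compose(subject, *relations):
    """Follow a chain of relations, e.g. author and then homepage."""
    current = {subject}
    for relation in relations:
        current = {o for s in current for o in follow(s, relation)}
    return current


# Hypothetical time-dependent denotations: (term, valid from, valid to, entity).
DENOTATIONS = [
    ("head of department", date(1995, 1, 1), date(1999, 12, 31), "person-a"),
    ("head of department", date(2000, 1, 1), date(2004, 12, 31), "person-b"),
]


def denotes(term, published):
    """Return the entities a lexical term denotes at a document's publication date."""
    return [
        entity
        for t, start, end, entity in DENOTATIONS
        if t == term and start <= published <= end
    ]
```

A link to "the home page of the author of a document" is then `compose("doc-17", "author", "homepage")`, and the target of "head of department" depends on the date passed to `denotes`.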

References

  1. The Art and Architecture Thesaurus http://shiva.pub.getty.edu.
  2. Visual Resources Association Standards Committee. VRA core categories, version 3.0. Technical report, Visual Resources Association, July 2000. http://www.gsd.harvard.edu/~staffaw3/vra/vracore3.htm.