OWL Use Cases: Collection Management
Status: document for face-to-face meeting on Jan 14-15
Version: January 7, 2002
Editor
Guus Schreiber, IBROW / University of Amsterdam, schreiber@swi.psy.uva.nl
Members
- Stephen Buswell, Stilo Technology, StephenB@stilo.com
- Nicholas Gibbins, University of Southampton, nmg@ecs.soton.ac.uk
- Guus Schreiber, IBROW / University of Amsterdam,
schreiber@swi.psy.uva.nl
- Michael K. Smith, Electronic Data Systems (EDS), michael.smith@eds.com
Document Purpose and Overview
This document characterizes the collection-management area and
describes five possible use cases for it. The use cases are described
in a standard format. Emphasis is placed on concrete examples of
knowledge and information that would need to be represented in OWL
(the Web Ontology Language).
This document should serve as input for the face-to-face meeting of
the W3C Web Ontology Working Group on 14-15 January 2002 at Bell Labs in
Murray Hill, NJ.
For readability purposes the document starts with a summary of the
requirements arising from the use cases.
Summary of Resulting OWL Requirements
The numbers refer to the use cases below.
Expressivity requirements
- Class hierarchy with properties/attributes and relations
(all)
Explicit distinction between attributes (pointing to datatype values)
and general relations.
- Default knowledge (1, 2, 4)
  - living things don't fly
  - Late Georgian chests of drawers are typically made of
    mahogany wood
- Part-whole relation (2, 3, 4)
  - a wing spar is part of a wing assembly
  - chests of drawers have feet with their own style
- Constraints (2, 3, 4)
  - wing-spar.length < wing.length
  - furniture.style = Late-Georgian <=> furniture.culture =
    British AND furniture.date-created in 1760-1811
  - Note also the comments about the DAML+OIL solution to the latter
    example in use case 4.
- Instance specification / instances as classes (1, 3, 5)
  - Mammal is an instance of Species but also a class in its own right.
  - A380 is an instance of Aircraft, but also denotes a set of A380
    instances.
  - Cf. Protege-2000's metaclass notion. Note: this is a feature
    of RDF.
  - Anonymous instance specification through property values.
- Mechanism for relation typing (5)
Also an RDF feature (you can define an rdf:type for a property).
- Abstract classes (4)
Classes without instances: see the color hierarchy example
in use case 4.
- Synonyms / lexical term -> concept (1, 5)
Need to distinguish lexical term from concept it denotes.
[This may not be an OWL issue.]
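The "instances as classes" requirement listed above can be illustrated outside any ontology language. The following is only a sketch in Python, whose metaclasses behave roughly like the Protege-2000 notion mentioned above; the individual "Willy" is an invented illustration and not part of any use case.

```python
# A minimal sketch of "instances as classes", assuming Python metaclasses
# as a stand-in for the metaclass notion. All names are illustrative.

class Species(type):
    """Metaclass: every instance of Species is itself a class."""


class Mammal(metaclass=Species):
    """An instance of Species and, at the same time, a class of animals."""

    def __init__(self, name):
        self.name = name


# A hypothetical individual mammal.
willy = Mammal("Willy")

print(isinstance(Mammal, Species))  # Mammal is an instance of Species
print(isinstance(willy, Mammal))    # willy is an instance of Mammal
print(isinstance(willy, Species))   # willy is not itself a species
```

The point of the sketch is that Mammal appears on both sides of the instance-of relation, which is what the requirement asks OWL to permit.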
Other requirements
- Provenance (1, 5)
Typical examples: distinguishing annotations by experts and
non-experts, authorship of hyperlinks.
- Version management (1)
Ability to extend/revise ontology.
- Query support (1)
Ability to reason with falsehood, e.g. "whales are fish".
- Support for content standards (1, 3)
General thesauri (WordNet, TGN) as well as
domain-specific thesauri (AAT,
ICONCLASS) are often used to standardize annotations.
Use-Case Area Scope and Definition
This use-case area typically has the following characteristics:
- Large data/text/image/multimedia/website sets with a common
theme/context/focus
- Relatively fixed set of items in archive/collection
- The collection can be a very large set, so scalability issues
typically come into play.
- Collection management is typically domain specific, therefore
linked to (traditional) work on domain standards.
- The focus is on metadata, so there is a natural link to traditional
metadata efforts.
Collection-management typically has the following subtasks:
- item indexing/annotation/classification
- collection updates
- collection search
- often involves default reasoning
Links to Other Areas/Issues/Tasks
Virtual catalogs
Examples:
- virtual museum (several projects)
- product search/comparison sites (e.g., Lynn Stein's book
identification, Mike Dean's hotels)
There is a clear link here to the "interoperability" area. Virtual
catalogs typically require ontology mapping. They also change the
collection-management task, because fewer assumptions can be made
about the collection (e.g., its size).
Service catalogs
These are mentioned in a number of use cases. With respect to the
declarative aspects of service description and search, there is a
clear link between "web services" and this area.
Presentation generation
Semantically annotated catalogs are an ideal substrate for
(context-specific) generation of presentations such as web pages. Example: dynamic
configuration of a web page for browsers of an art catalog, showing
related texts and images.
Conceptual search
In conceptual search we would like to view the whole web as one indexed
catalog. This seems to be a bridge too far at the moment, given the
problems we still have with domain-specific catalogs. A realistic
short-term scenario for conceptual search is a two-step process:
- Use an Open-Directory-like mechanism to constrain your search to an
area which hopefully provides some archives/catalogs.
- Use the semantic search engines of the catalogs to find an answer to
your query.
Content standards
Due to the domain specificity of catalogs, many of them require a
clear link with domain standards/vocabularies (existing or under
development). These domain standards were typically developed to
support manual indexing.
Also, more general resources such as WordNet are being used.
Use Case 1: "Arkive: catalog of endangered species descriptions"
Contributor: Jeremy Carroll
Context
The Arkive project is creating a multimedia database
consisting of a record for each endangered species. The database
aims at completeness, with enough appropriate information for each
species. The database is accessed through a web site and targeted at
users at all levels of expertise, ranging from school children through
to domain experts.
The key functions of ontological knowledge are:
- to allow consistent organization of each species record
- to provide a means for ensuring that each species record is
sufficiently detailed, and includes examples of each important
behavior.
- to help with query across the database
Other functions where ontological knowledge may be useful include
organizing annotations and provenance of knowledge.
We note that:
- Despite the relevant science having had about two centuries of
debate, there is no universal agreement about appropriate
ontologies for full and adequate species descriptions.
- The number of species suggests that globally a federated solution is
needed. The British participants have funding to make records of all
British species, and the top N globally endangered species. The
long-term plan would be to have people world-wide contributing records
for their local species. This is likely to exacerbate the lack of
agreement about the underlying ontologies.
Task
Organizing, commissioning and querying a database of multimedia records.
Example domain
Multimedia records of endangered species.
Typical users
- scientist making a specific record.
- manager commissioning new records.
- scientist querying DB through web-site
- school child querying DB through web-site
Ontology samples
Currently they use about ten master record-templates for the
different top-level categories.
For example, there is typically no "locomotion" field for
plants, but it is of interest for animals.
These top-level categories are necessarily insufficient in
that they cover (only) the general types of behavior.
Any unique or rare behavior of a species is:
- important to include in the record
- not in the top-level category
Also, such behaviors are subject to scientific debate.
A concrete example concerns birds that pick up
poisonous insects in their beaks and rub them against their
feathers. It is contentious whether they do this:
- to get high, or
- to kill off parasites in their feathers
The name you use for the behavior depends on your judgment
on its motivation; which may well depend on your political persuasion.
There are also some behaviors which have multiple different
names that are synonymous.
Default inheritance is important. The well known penguins issue:
living things don't fly
birds do fly
penguins don't fly
This can be addressed when first creating a record, when default
values can be filled in, to be changed if necessary, or more
dynamically.
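The penguin example can be sketched as a lookup that walks up the category hierarchy and takes the most specific default it finds. This is only an illustration of the "fill in defaults when creating a record" option; the category names beyond those in the text (for instance "sparrow") and the function names are assumptions.

```python
# A minimal sketch of default inheritance with overriding.
# The taxonomy and the helper names are illustrative assumptions.

# Child category -> parent category.
PARENT = {"bird": "living thing", "penguin": "bird", "sparrow": "bird"}

# Defaults stated locally at one level of the hierarchy.
DEFAULTS = {
    "living thing": {"flies": False},
    "bird": {"flies": True},
    "penguin": {"flies": False},
}


def default_value(category, prop):
    """Return the most specific default for prop, walking up the hierarchy."""
    while category is not None:
        if prop in DEFAULTS.get(category, {}):
            return DEFAULTS[category][prop]
        category = PARENT.get(category)
    return None  # no default known at any level


def new_record(category, **stated):
    """Create a record pre-filled with defaults; stated facts take precedence."""
    props = {p for values in DEFAULTS.values() for p in values}
    record = {p: default_value(category, p) for p in props}
    record.update(stated)
    return record


print(default_value("sparrow", "flies"))   # inherited from "bird"
print(default_value("penguin", "flies"))   # overridden at "penguin"
print(new_record("sparrow", flies=False))  # an explicit fact beats the default
```

The "more dynamic" alternative mentioned above would call default_value at query time instead of copying the value into the record.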
It is important to relate the category information back to
multiple (partially inconsistent) taxonomies in the field.
OWL requirements
Hard to say: there is a range of knowledge-base requirements, and it is
unclear which of them actually belong to the ontological subsystem.
- Hierarchical classes with inheritance of properties,
default values, etc. Probably single inheritance would
suffice.
- Provenance: to distinguish facts that are in the
specific record from later annotations by experts or
non-experts, from inherited facts, etc.
- Query support. Query may be guided by category information,
and possibly by falsehoods (e.g. "whales are fish" may be
useful to help small children search, who might otherwise
conclude there are no whales in the DB).
Mixed-mode query: both free text and category information.
- Multiple synonymous labels for properties and values.
- Thesaural support.
- Ability to extend the ontology on the fly, in a distributed
fashion (experts adding a framework to describe the special
behavior of their species).
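The query-support and synonym requirements above can be made concrete with a small sketch. It keeps lexical terms separate from the concepts they denote, and keeps "useful falsehoods" in a separate table so that they guide search without being asserted as facts. The concept names, record titles, and helper names below are all assumptions made for illustration.

```python
# A minimal sketch of term-to-concept lookup with search-only hints.
# All data and names are illustrative assumptions.

# Concept -> parent concept.
PARENT = {"blue whale": "whale", "whale": "mammal", "cod": "fish"}

# Lexical terms (including synonyms) -> the concept they denote.
TERM_TO_CONCEPT = {
    "whale": "whale",
    "whales": "whale",
    "cetacean": "whale",
    "fish": "fish",
    "cod": "cod",
}

# Search-only hints: known to be false, never stored as facts.
# "whales are fish" is the example from the text.
SEARCH_HINTS = {"fish": {"whale"}}


def ancestors_or_self(concept):
    """Return the concept followed by all of its ancestors."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT.get(concept)
    return chain


def search(records, term):
    """Return titles of records filed under the term's concept or a hinted one."""
    concept = TERM_TO_CONCEPT.get(term.lower())
    if concept is None:
        return []
    targets = {concept} | SEARCH_HINTS.get(concept, set())
    return [title for title, filed_under in records
            if targets & set(ancestors_or_self(filed_under))]


records = [("Blue whale record", "blue whale"), ("Atlantic cod record", "cod")]
print(search(records, "fish"))      # finds the whale record via the hint
print(search(records, "cetacean"))  # synonym for "whale"
```

Keeping the hints in their own table is one way to meet the provenance concern: a reasoner can ignore them, while the search front end can use them.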
Use Case 2: "EDS web page landfill"
Contributor: Mike Smith
Context
Support for corporate communication and corporate memory.
Task
Organizing a massive web page land-fill into hierarchical categories
Example domain
External press releases, product offerings and case studies,
corporate procedures, internal product briefings and comparisons, white
papers, and offering process descriptions.
Typical users
- Salesperson looking for sales collateral relevant to a
client's expressed interest.
- Technical person looking for pockets of
specific technical expertise and detailed past experience.
Ontology samples
Document type hierarchy:
  Press release
    press release covering financial details
    press release detailing SEC filings
    .....
Solution descriptions that include part-whole relations and constraints
covering software, hardware, and communication compatibility.
OWL requirements
- Defaults and constraints.
- Part-whole relations.
- Language neutral representation.
- Instances distinct from classes.
- We need a clean interface between Web Ontologies and more mainstream
business and manufacturing XML standards.
Use Case 3: "Aerospace Engineering Data Modelling"
Contributor: Stephen Buswell
Context
Support for corporate communication and corporate memory in aerospace
engineering.
Task
Organizing a large body of technical documentation
into cross-linked hierarchical categories
Example domain
Aircraft design documentation; manufacturing process
documentation; testing process documentation; maintenance documentation;
illustrations
Typical users
- Maintenance engineer looking for all information relating
to a particular part (e.g. 'wing-spar').
- Design engineer looking at
constraints on re-use of a particular sub-assembly.
Ontology samples
- Document type hierarchy:
    Document
      Design Document
        Sub-assembly design doc
        ....
- Component type hierarchy:
    ManufacturingComponent
      wing-spar
- Part-whole relations:
    [wing-spar ispartof wing-assembly]
- Inter-part constraints:
    [wing-spar.length < wing.length]
- General relations:
    [this.document.this-picture illustrates wing-spar]
- Instances:
    [A380 isinstanceof Aircraft]
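The part-whole relation and the inter-part constraint in the samples above can be sketched as follows. This is an illustration only: the lengths are invented, the unit is assumed, and applying the length constraint to every part of a whole (rather than only to the wing spar) is an assumption of the sketch.

```python
# A minimal sketch of a part-whole structure with a length constraint.
# Values, units, and helper names are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    length: float  # assumed unit: metres
    parts: list = field(default_factory=list)

    def add_part(self, part):
        """Record that part is a direct part of this component."""
        self.parts.append(part)
        return part


def all_parts(whole):
    """Yield direct and indirect parts (part-of treated as transitive)."""
    for part in whole.parts:
        yield part
        yield from all_parts(part)


def length_violations(whole):
    """Return names of parts that are not strictly shorter than the whole."""
    return [p.name for p in all_parts(whole) if not p.length < whole.length]


wing = Component("wing-assembly", 30.0)
spar = wing.add_part(Component("wing-spar", 28.5))

print([p.name for p in all_parts(wing)])
print(length_violations(wing))  # empty while the constraint holds

spar.length = 31.0
print(length_violations(wing))  # now reports the wing spar
```

Treating part-of as transitive is a design choice here; whether OWL should build that in is exactly the kind of question the part-whole requirement raises.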
OWL requirements
- Class hierarchy
- Defaults
- Inter-class constraints.
- Part-whole relations
- General relations
- Language neutral representation.
- Representation of instances
- We need a clean interface between Web Ontologies and more mainstream
business and manufacturing XML standards.
Use Case 4: "Art-image collections"
Contributor: Guus Schreiber
Context
We are working on semantic annotations of images of art objects. The
purpose is to support both indexing and search through ontologies.
There are many knowledge sources for art. We focus here on two of these:
- The VRA 3.0 standard for image descriptions,
which is basically a refinement of Dublin Core for art-image annotation
- The Art and Architecture Thesaurus (AAT)
constructed by the Getty Foundation, which provides a highly
structured hierarchy of some 120,000 terms to describe art objects
(art categories, materials, styles, color, ....).
We want to use the WebOnt language to represent the image description
template provided by VRA and to link every data element of VRA to the
subtrees of the AAT hierarchy where the "fillers" of the data element
can be found. For example, we want to link the VRA data element
"style/period" to the AAT subtrees representing styles and periods.
In addition, we want to express in the ontology additional
knowledge. For example, if an indexer selects the value "Late
Georgian" for the style/period of (say) an antique chest of drawers,
we want to be able to infer that the data element "date.created"
should have a value between 1760 and 1811 A.D. and that the "culture"
is British. Availability of this type of background knowledge
increases significantly the support that can be given for indexing as
well as for search.
Task
Indexing and searching a digital image collection
Example domain
Museum collection of images of antique furniture
Typical users
- Museum personnel involved in indexing images.
- Lay person with some basic knowledge of the domain,
looking for some piece of antique furniture.
Ontology samples
Representing the AAT color hierarchy
In our ontology we want to express what AAT terms can act as values
for the data element "color". AAT has an elaborate hierarchy for
colors, which is structured more or less like this:
<color>
  <chromatic color>
    pink
      vivid pink
      strong pink
      ....
      <intermediate pink>
        purplish pink
          brilliant purplish pink
          ....
        yellowish pink
          ....
        brownish pink
    (etc.)
  <neutral color>
    white
    gray
      light gray
      ....
    black
The terms of type "<label>" are what AAT calls 'guide terms'. Their
purpose is to provide structure to the hierarchy. When we specify a
value restriction for the slot "color" of an image description
template we ideally just want to say
that any subclass of the <color> hierarchy can be used as slot
filler, but we probably want to exclude the guide terms from the value
set. The difference between the guide terms and the actual color
values is close to what is being called abstract vs. concrete classes
in UML (abstract classes cannot be instantiated, concrete classes
can). Such a notion is however absent in RDFS and in DAML+OIL.
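The intended value restriction, "any term in the color hierarchy except the guide terms", can be sketched as a tree walk that skips nodes flagged as guide terms. The tree below is a small excerpt whose nesting is an assumption, and the flag and function names are invented for the illustration.

```python
# A minimal sketch of excluding guide terms from a slot's value set.
# The nesting of the excerpt and all names are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    guide: bool = False  # True for guide terms such as <color>
    children: list = field(default_factory=list)


def allowed_values(term):
    """Collect labels of all non-guide terms in the subtree rooted at term."""
    found = [] if term.guide else [term.label]
    for child in term.children:
        found.extend(allowed_values(child))
    return found


color = Term("color", guide=True, children=[
    Term("chromatic color", guide=True, children=[
        Term("pink", children=[
            Term("vivid pink"),
            Term("intermediate pink", guide=True, children=[
                Term("purplish pink"),
            ]),
        ]),
    ]),
    Term("neutral color", guide=True, children=[
        Term("white"),
        Term("gray", children=[Term("light gray")]),
        Term("black"),
    ]),
])

print(allowed_values(color))
```

The guide flag plays the role of "abstract" in the UML sense: guide terms structure the hierarchy but never appear as slot fillers.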
BTW: An assumption of our work is that in order to be successful we need to
build on the (semi-)ontologies already available (like AAT), and that we
will have to map these onto a representation in a WebOnt language. It
is unrealistic to assume we can redo large-scale efforts like AAT.
Representing an aggregate structure
When we want to index an object such as an antique chest of drawers,
there is almost always a need to represent the part-of structure of
the object. For example, we want to assign a style value to the
feet of a chest, e.g. "bun feet". In our view the WebOnt committee
should seriously consider introducing some (limited) form of
aggregation into the WebOnt language. If you just represent this as
another slot/relation, you lose much of the semantics. This is likely
to be a requirement from UML people as well (aggregation has a
prominent place in UML class models).
Definitional knowledge
Let's for the moment assume we can represent AAT and VRA in
WebOnt. For effective search support we need to add domain knowledge
to this ontology. This knowledge typically takes the form of
inter-slot constraints within the image description template. One
example:
style/period = "Late Georgian"
=>
culture = "British" AND
date.created = between 1760 and 1811
[Style/period, culture and date.created are all VRA data elements
defined as slots for our art-object description template.]
We could not define this constraint in RDFS.
Sean Bechhofer (Univ. of Manchester) provided a DAML+OIL solution
(details of data-type representation and URIs left out):
<!-- Things whose style is Late Georgian ... -->
<daml:Restriction>
  <daml:onProperty rdf:resource="some-URL#style"/>
  <daml:hasClass>
    <daml:Class rdf:about="some-URL#Late Georgian"/>
  </daml:hasClass>
  <!-- ... form a subclass of the intersection of the two restrictions below -->
  <rdfs:subClassOf>
    <daml:Class>
      <daml:intersectionOf rdf:parseType="daml:collection">
        <daml:Restriction>
          <daml:onProperty rdf:resource="some-URL#date"/>
          <daml:hasClass>
            <daml:Class rdf:about="some-URL#1760-1811"/>
          </daml:hasClass>
        </daml:Restriction>
        <daml:Restriction>
          <daml:onProperty rdf:resource="some-URL#culture"/>
          <daml:hasClass>
            <daml:Class rdf:about="some-URL#British"/>
          </daml:hasClass>
        </daml:Restriction>
      </daml:intersectionOf>
    </daml:Class>
  </rdfs:subClassOf>
</daml:Restriction>
So, the class of Late-Georgian things is a subclass of both the class
of British things and the class of things created between 1760 and
1811. This is similar to what is called multiple specialization in
data modelling.
Two issues arise here:
- The syntax is really awful.
- For most users this will not be an intuitive way of defining
definitional knowledge. Can OWL provide a more natural way?
[Special thanks to Sean Bechhofer and Frank van Harmelen for their input.]
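For comparison, the Late Georgian inter-slot constraint can also be read procedurally, as an indexing tool might apply it. The sketch below is not a proposal for OWL syntax; the rule table and function name are assumptions, and only the slot names and values come from the example above.

```python
# A minimal procedural sketch of the Late Georgian inter-slot constraint.
# The rule table and function name are illustrative assumptions.

# style/period value -> what it implies for other slots.
STYLE_IMPLIES = {
    "Late Georgian": {"culture": "British", "date_range": (1760, 1811)},
}


def apply_style_constraint(record):
    """Fill in the implied culture and report conflicts with implied values."""
    implied = STYLE_IMPLIES.get(record.get("style/period"))
    completed = dict(record)
    problems = []
    if implied is None:
        return completed, problems
    # Infer the culture if the indexer left it open.
    culture = completed.setdefault("culture", implied["culture"])
    if culture != implied["culture"]:
        problems.append("culture should be " + implied["culture"])
    # The date can only be checked, not inferred: the rule gives a range.
    low, high = implied["date_range"]
    year = completed.get("date.created")
    if year is not None and not (low <= year <= high):
        problems.append(f"date.created should be between {low} and {high}")
    return completed, problems


print(apply_style_constraint({"style/period": "Late Georgian"}))
print(apply_style_constraint(
    {"style/period": "Late Georgian", "date.created": 1850}))
```

A declarative statement of the same rule is what the use case asks for; the sketch only shows the two behaviors expected from it, inference of missing values and detection of conflicting ones.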
Default knowledge
This is in fact the most common form of domain knowledge in the
example domain. One sample of default knowledge:
IF type "chest of drawers" AND
style/period = Late-Georgian
THEN (this typically suggests)
material.main = mahogany
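The rule above differs from a hard constraint in that it only suggests a value and never overrides what the indexer has entered. A minimal sketch of that behavior follows; the rule-table layout and the function name are assumptions.

```python
# A minimal sketch of a default ("typically suggests") rule.
# The table layout and function name are illustrative assumptions.

# (conditions that must hold, slot to fill, suggested value)
PREFERENCES = [
    ({"type": "chest of drawers", "style/period": "Late Georgian"},
     "material.main", "mahogany"),
]


def suggestions(record):
    """Return suggested values for slots the indexer has not filled in."""
    suggested = {}
    for conditions, slot, value in PREFERENCES:
        matches = all(record.get(k) == v for k, v in conditions.items())
        if matches and slot not in record:
            suggested[slot] = value
    return suggested


chest = {"type": "chest of drawers", "style/period": "Late Georgian"}
print(suggestions(chest))                               # suggests mahogany
print(suggestions({**chest, "material.main": "oak"}))   # no suggestion
```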
The structure of the knowledge is similar to definitional knowledge,
but a solution is probably more difficult. One could call this
"inter-slot preferences".
OWL requirements
- distinction between abstract and concrete classes
- part-whole relations
- (easy way to express) definitional knowledge
- default knowledge
Use Case 5: "Conceptual Open Hypermedia"
Contributor: Nick Gibbins
Context
Improving navigation while browsing through a corpus such as a large
website.
Task
Creating an overlay of hypertext links onto a corpus (a linkbase)
Example domain
Organizational and research documents generated by an academic
institution.
Typical users
- novice user who needs further explanation of terms in documents
(e.g. information on people mentioned in documents)
- experienced user who knows rough location of desired
information and is prepared to browse to find it
- experienced user annotating documents (associating terms in
documents with ontology entities), so allowing new links to be
created
Ontology samples
The ontology is based in part on Dublin Core (describing bibliographic
metadata), but also requires some representation of the content of the
documents (departmental board minutes, grant applications, etc) in
order to describe their contents (or rather, those entities which are
referred to in their contents).
OWL requirements
- referring to instances (e.g. people) by means of their properties
- composition of relations
  - required to specify certain types of links (e.g. a link to the
    home page of the author of a document)
- ability to define lexical terms which commonly denote entities
  - For example, the lexical term "Nick Gibbins" is commonly used
    to refer to the person with email address nmg@ecs.soton.ac.uk
  - Denotation of these terms is not necessarily static.
    For example, the lexical term "head of department" refers to
    different individuals based on the context in which it is used
    (publication date of the document in which the term appears)
- provenance
  - No explicit author of links, but provenance of links is that of
    the facts from which they are constructed
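Two of the requirements above, composition of relations and context-dependent denotation, can be sketched briefly. Everything in the data below is hypothetical: the document name, the example.org URL, the person identifiers, and the dates are invented for illustration.

```python
# A minimal sketch of relation composition and date-dependent denotation.
# All facts, identifiers, and names are illustrative assumptions.
from datetime import date

# Two base relations: document -> author, author -> home page.
AUTHOR_OF = {"board-minutes.html": "person-a"}
HOME_PAGE = {"person-a": "http://example.org/people/person-a"}


def compose(first, second):
    """Compose two relations: x maps to second[first[x]] when both steps exist."""
    return {x: second[y] for x, y in first.items() if y in second}


# "home page of the author of a document" as a derived relation.
AUTHOR_HOME_PAGE = compose(AUTHOR_OF, HOME_PAGE)

# Lexical term -> list of (valid from, valid to, entity denoted).
DENOTES = {
    "head of department": [
        (date(1995, 1, 1), date(1999, 12, 31), "person-a"),
        (date(2000, 1, 1), date(2004, 12, 31), "person-b"),
    ],
}


def denotation(term, published):
    """Return the entity a term denotes in a document published on a given date."""
    for start, end, entity in DENOTES.get(term, []):
        if start <= published <= end:
            return entity
    return None


print(AUTHOR_HOME_PAGE["board-minutes.html"])
print(denotation("head of department", date(1998, 6, 1)))
print(denotation("head of department", date(2001, 6, 1)))
```

A link generated from AUTHOR_HOME_PAGE has no author of its own; its provenance is that of the two facts it was composed from, which matches the provenance remark above.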
References
- The Art and Architecture Thesaurus.
http://shiva.pub.getty.edu.
- Visual Resources Association Standards Committee.
VRA core categories, version 3.0.
Technical report, Visual Resources Association, July 2000.
http://www.gsd.harvard.edu/~staffaw3/vra/vracore3.htm.