COLLECT: updated summary Dec 14

Colleagues,

I have updated the collection-management area summary with the use-case
descriptions I got from Stephen and Nick (see end of message). Please
send me your feedback on this first cut. 

I think we already have quite a diverse and representative collection of
use cases. I have tried to generalize somewhat from each use case:

1. Endangered species:  
Structured scientific descriptions: there are many of these collections
on the web, describing animals, plants, diseases, etc. 

2. EDS website management
Knowledge management and organization for large companies, both for
internal use and for external presentation to customers
 
3. Aerospace data modelling
Engineering catalogs: many other examples, see also Ruediger Klein's
email, IMAT project on indexing technical manuals, etc. Is there a link
with the STEP world (e.g. the "application protocols")? We could learn
from their representation problems in Express.

4. Antique furniture image collection
art-image catalogs: every museum and art collection with some self
respect seems to be working on this. There are also musea cooperating in
"virtual museum" projects (e.g. Rijksmuseum Amsterdam).

5. Open hypermedia
Navigating document collections for (virtual) organizations; this use
case in some sense a similar flavor as (2). 

Did we miss out an important category?
If you can, please also provide as much as possible concrete "challenge
problems" arising from the use cases. 

Greetings, Guus

[I'll be teaching a Ph.D. course next week, so my email processing will
be a bit slow at times]

----------------------------------------------------------------------
USE-CASE AREA: Collection Management

Guus Schreiber (editor)
Version: 14 december, 2001

MEMBERS:
Stephen Buswell <StephenB@stilo.com>
Nicholas Gibbins <nmg@ecs.soton.ac.uk>,
Guus Schreiber <schreiber@swi.psy.uva.nl>, 
Michael K. Smith <michael.smith@eds.com>


General structure of the use-case area description
- definition/scope of the area: tasks, typical domains, users
- links to other areas
- resulting list of language requirements arising from use cases
- 3-5 detailed use-case descriptions

---------------------------
SCOPE AND DEFINITION
---------------------------

Characteristics:
* Large data/text/image/multimedia/website sets with a common
theme/context/focus 
* fixed set of items in archive/collection
* can be very large set => scalability issues
* typically domain specific, therefore linked to (traditional) work on
domain standards 
* focus is on metadata => link to traditional metadata

Collection-management subtasks:
* item indexing/annotation/classification
* collection updates
* collection search
* often involves default reasoning


---------------------------
LINKS TO OTHER AREAS/ISSUES/TASKS
---------------------------

1. Virtual catalogs

Examples: 
- virtual museum (several projects)
- product search/comparison sites (e.g., Lynn Stein's book
identification, Mike Dean's hotels)  
. 
There is a clear link jere to the "interoperability" area. Virtual
catalogs typically requires ontology-mapping stuff. Also, it makes the
collection management task different as less assumptions can be made
about the collection (e.g., its size).

2. Service catalogs

These are mentioned in a number of use cases. With respect to the
declarative aspects of service description and search, there is a
clear link between "web services" and this area. <to be worked out>

3. Presentation generation

Semantically annotated catalogs are an ideal substrate for
(context-specific) generation of presentations c.q. web pages. Example:
dynamic
configuration of a web page for browsers of an art catalog, showing
related texts and images. 

Example: 
- work Lynda Hardman (CWI, Amsterdam)
- open hypermedia?!: generation of links based on ontologies (Nick
Gibbins, Southampton)

4. Conceptual search

In conceptual search we would like to view the whole web as one indexed
catalog. This seems to be a bridge too far at the moment, given the
problems we still have a domain-specific catalogs. A realistic
scenario for the short-term conceptual search is a two-step process:
1. use an Open-Directory like mechanism to constrain your search to an
area which hopefully provides some archives/catalogs
2. use the semantic search engines of the catalogs to find an answer to
your query.

5. Content standards

Due to the domain specificity of catalogs, many of them require a
clear link with domain standards/vocabularies (existing or under
development). These domain standards were typically developed to
support manual indexing.

Also, more general resources such as WordNet are being used.


---------------------------
RESULTING LANGUAGE REQUIREMENTS
---------------------------

Some preliminary examples (numbers refer to use cases below):
- default knowledge / default reasoning (1, 2, 4) 
- constraints (2, 3, 4)
- consistency rules (3)
- some notion of aggregation (3, 4)
- statements at class and at instance level (2, 3, 4)
- (WebOnt representation of thesauri / domain standards: AAT, TGN,
WordNet) (1, 4) 


---------------------------
USE-CASE DESCRIPTIONS
---------------------------

1. Arkive: catalog of endangered species descriptions
Contributor: Jeremy Carrol, HP

<cut&paste of Jeremy's email>

The arkive project is creating a multimedial database consisiting of
a record for each endangered species.

The database aims at completeness, with enough appropriate information
for each species.

The database is accessed through a web site and targetted at users at 
all levels of expertise: ranging from school children through to
domain expert.

The key functions of ontological knowledge are:
+ to allow consistent organization of each species record
+ to provide a means for ensuring that each species record is
  sufficiently detailed, and includes examples of each important
  behaviour.
+ to help with query across the database

Other functions where ontological knowledge maybe useful include
organising annotations and providence of knowledge.

We note that:
- despite the relevant science having had about two centuries of
  debate there is no universal agreement about appropriate 
  ontologies for full and adequate species descriptions.
- the number of species suggests that globally a federated solution
  is needed. The British participants have funding to make records
  of all British species, and the top N globally endangered species.
  The long-term plan would be to have people world-wide contributing
  records for their local species. This is likely to exacerbate the 
  lack of agreement about the underlying ontologies.


  TASK:
Organising, and commisioning multimedia records of 
endangered species.

  EXAMPLE DOMAIN:
multimedia records of endangered species.

  TYPICAL USERS:
1: scientist making a specific record.
2: manager commissioning new records.
3: scientist querying DB through web-site
4: school child querying DB through web-site

  ONTOLOGY SAMPLES:
I will need to get back to my informant for better data.
I rapidly get out of my depth biologically in this point
in the presentations I have seen.

Currently they use about ten master record-templates for the
different top-level categories. 
For example, there is typically no "locomotion" field for
plants, but it is of interest for animals.

These top-level categories are necessarily insufficient in
that they cover (only) the general types of behaviour.
Any unique or rare behaviour of a species is:
 + important to include in the record
 + not in the top-level category
also such behaviours are subject to scientific debate.
(A concrete example was to do with birds that pick up
poisionous insects in their beaks and rub them against their
feathers. It is contentious whether they do this:
+ to get high
+ to kill off parasites in their feathers
The name you use for the behaviour depends on your judgement
on its motivation; which may well depend on your political persuasion.)

There are also some behaviours whcih have multiple different
names that are synonymous.

Default inheritance is important. The well known penguins issue:
   living things don't fly
   birds         do    fly
   penguins      don't fly

This can be addressed when first creating a record, when default 
values can be filled in, to be changed if necessary, or more 
dynamically.

It is important to relate the category information back to
multiple (partially inconsistent) taxonomies in the field.


  WEBONT REQUIREMENTS

Hard to say - there are a range of knowledge base requirements,
which ones actually belong to the ontological subsystem is
problematic.

- Hierarchical classes with inheritance of properties, 
  default values, etc. Probably single inheritance would 
  suffice.
- Providence: to distinguish facts that are in the
  specific record, from later annotations by experts or
  non-experts, from inherited facts etc.
- Query support. Query may be guided by category information,
  and possibly by falsehoods (e.g. "whales are fish" may be 
  useful to help small children search, who might otherwise
  conclude there are no whales in the DB)
  Mixed mode query - both free text and category information.
- Multiple synonymous labels for properties and values.
  Theasural support.
- Ability to extend ontology on the fly, in a distributed 
  fashion. (Experts adding framework to describe the special 
  behaviour of their species).


2. EDS web page landfill 
Contribitor: Miker Smith, EDS

TASK: Organizing a massive web page land-fill into hierarchical
categories
in support of corporate communication and corporate memory.  

EXAMPLE DOMAIN: External press releases, product offerings and case
studies,
corporate procedures, internal product briefings and comparisons, white
papers, and offering process descriptions.

TYPICAL USER: Salesperson looking for sales collateral relevant to a
client's expressed interest.  Technical person looking for pockets of
specific technical  expertise and detailed past experience.

ONTOLOGY SAMPLES: Document type hierarchy: Press release <- press
release
covering financial details <- press release detailing SEC filings ....
Solution descriptions that include part-whole relations and constraints
covering software, hardware, and communication compatibility.  

WEBONT REQUIREMENTS:  Defaults and constraints.  Language neutral
representation.  Instances distinct from classes.  

We need a clean interface between Web Ontologies and more mainstream
business and manufacturing XML standards.  


3. Aerospace Engineering Data Modelling
Contributor: Stephen Buswell, Stilo

TASK: Organizing a large body of aerospace engineering documentation
 into cross-linked hierarchical categories
in support of corporate communication and corporate memory.  

EXAMPLE DOMAIN: Aircraft design documentation; manufacturing process
documentation; testing process documentation; maintenance documentation;
illustrations

TYPICAL USER: Maintenance engineer looking for all information relating
to a particular part (eg. 'wing-spar').  Design engineer looking at
constraints on re-use of a particular sub-assembly.

ONTOLOGY SAMPLES: Document type hierarchy: 
Document <- Design Document <- Sub-assembly design doc ....
Component type heirarchy:
ManufacturingComponent <- wings-spar

Solution descriptions that include part-whole relations
[wing-spar ispartof wing-assembly]
 and constraints
[wing-spar.length < wing.length]
 and relations
[this.document.this-picture illustrates wing-spar]
 and instances
[A380 isinstance of Aircraft]

WEBONT REQUIREMENTS:  Defaults and constraints.  Language neutral
representation.  Instances distinct from classes.  

We need a clean interface between Web Ontologies and more mainstream
business and manufacturing XML standards.  


4. Art-image collections
Contributor: Guus Schreiber, Ibrow / University of Amsterdam

TASK: searching a digital image collection

EXAMPLE DOMAIN: museum collection of images of antique furniture 

TYPICAL USER: lay person with some basic knowledge of the domain,
looking for some piece of antique

ONTOLOGY SAMPLES:

The basis of our ontology is formed by the Art and Architecture
Thesaurus (AAT) [1] constructed by the Getty Foundation, which
provides a highly structured hierarchy of some 120.000 terms to
describe art objects (art categories, materials, styles, color,
....). We also have a description template for antique furniture based
on the VRA 3.0 standard [2], which is basically a refinement of Dublin
Core for art-image annotation

Let's for the moment assume we can represent AAT and VRA in
WebOnt. For effective search support we need to add domain knowledge
to this ontology.  This knowledge typically takes the form of
inter-slot constraints within the image description template. One
example:

  style/period = "Late Georgian"
  =>
  culture = "British" AND
  date.created = 1760, 1811

[Style/period, culture and date.created are all VRA data elements
defined as slots for our art-object description template.]

We could not define this constraint in RDFS and (a little to our
surprise)
we saw no way of expressing it in DAML+OIL either (we could have
misread the spec, we would be glad to be proven wrong). 

This type of semantical information is essential to show added value
of semantic annotations.

WEBONT REQUIREMENTS
possibility to define inter-slot constraints of a class



[1] The Art and Architecture Thesaurus
    http://shiva.pub.getty.edu.

[2] Visual Resources Association~Standards Committee.
    VRA core categories, version 3.0.
    Technical report, Visual Resources Association, July 2000.
    http://www.gsd.harvard.edu/~staffaw3/vra/vracore3.htm.


5. Conceptual Open Hypermedia
Contributor: Nick Gibbins, University of Southampton

TASK:
    Providing an overlay of hypertext links onto a corpus (a linkbase)
    in order to improve navigation by browsing through the corpus. 

EXAMPLE DOMAIN:
    Organisational and research documents generated by an academic
    institution.

TYPICAL USERS:

    1. novice user who needs further explanation of terms in documents
       (eg. information on people mentioned in documents)
    2. experienced user who knows rough location of desired
       information and is prepared to browse to find it
    3. experienced user annotating documents (associating terms in
       documents with ontology entities), so allowing new links to be
       created 

ONTOLOGY SAMPLES:

    The ontology is based in part on Dublin Core (describing
    bibliographic metadata), but also requires some representation of
    the content of the documents (departmental board minutes, grant
    applications, etc) in order to describe their contents (or rather,
    those entities which are referred to in their contents).

WEBONT REQUIREMENTS:

    referring to instances (eg. people) by means of their properties

    composition of relations
     - required to specify certain types of links (eg. a link to the
       homepage of the author of a document)

    ability to define lexical terms which commonly denote entities
     - for example, the lexical term "Nick Gibbins" is commonly used
       to refer to the person with email address nmg@ecs.soton.ac.uk
     - denotation of these terms is not necessarily static.
       for example, the lexical term "head of department" refers to
       different individuals based on the context in which it is used
       (publication date of the document in which the term appears) 

    provenance
     - no explicit author of links, but provenance of links is that of
       the facts from which they are constructed


-- 
A. Th. Schreiber, SWI, University of Amsterdam, Roetersstraat 15
NL-1018 WB Amsterdam, The Netherlands, Tel: +31 20 525 6793 
Fax: +31 20 525 6896; E-mail: schreiber@swi.psy.uva.nl
WWW: http://www.swi.psy.uva.nl/usr/Schreiber/home.html

Received on Friday, 14 December 2001 10:34:12 UTC