Modular RDF vocabs for thesaurus-like data

This is a document with some suggestions, intended to get the ball rolling
for a discussion on how to fit RDF representations of various flavours of
thesaurus into the world of ontologies, classification schemes etc.

Suggestions for modular vocabularies for representing thesaurus-like data.



The Problem

A thesaurus contains two quite different kinds of information.  First,
information about how different terms are used to indicate things in our
heads.  In other words, information about the labelling of concepts.  I will
call this terminology information.  Second, information about how the things
in our heads are related to each other.  In other words, a model of some
conceptual structure.  I will call this conceptual information.  A major
criticism of existing thesaurus standards is that these two types of
information are muddled, and hence the recommendations are confusing (see
the NKOS 2003 presentation by Dagobert Soergel).



The Solution

We model these two types of information independently.  This greatly
improves clarity.  It would also allow us to attach terminology information
to all different kinds of conceptual structures (indexing schemes,
classification schemes, ontologies, topic maps etc.).

In RDF, I suggest we create several modular vocabularies.  For example, we
could have the following modules:

Terminology Module

Language Module

Conceptual Indexing Module 



Terminology Module
  
A simple set of properties allowing you to attach a preferred label and set
of alternate labels to any resource.

#RDF Terminology Module
#
############################################################################

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix term: <???> .

# Preferred label for any resource
term:pref
	a		rdf:Property;
	rdfs:domain	rdfs:Resource;
	rdfs:range	term:Term.

# Alternate label for any resource
term:alt
	a		rdf:Property;
	rdfs:domain	rdfs:Resource;
	rdfs:range	term:Term.

term:Term
	a		rdfs:Class.

# The literal string carried by a term
term:value
	a		rdf:Property;
	rdfs:domain	term:Term;
	rdfs:range	rdfs:Literal.

############################################################################

So if your conceptual structure is best modelled as an ontology, for example
(lots of explicit isa, instanceof, partof and other specific types of
relation), then build an OWL ontology and add the terminology layer using the
above properties.  If you want a traditional style thesaurus, use the
Conceptual Indexing Module (see below) and attach the terminology layer.

#Some example use of terminology module
############################################################################
@prefix example: <example>.

example:aResource
	term:pref	[	a		term:Term;
				term:value	"Cats"		];
	term:alt	[	a		term:Term;
				term:value	"Felines"	].

############################################################################
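
As a rough sketch of the OWL case mentioned above, the same two properties
can be attached directly to an OWL class.  The zoo: namespace and the class
names are purely illustrative assumptions and are not part of any proposed
vocabulary; only the term: properties come from the module above.

#Some example use of terminology module on an OWL class (sketch)
############################################################################
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix zoo: <http://example.org/zoo#> .	# hypothetical namespace

zoo:Cat
	a			owl:Class;
	rdfs:subClassOf		zoo:Mammal;
	term:pref		[	a		term:Term;
					term:value	"Cats"		];
	term:alt		[	a		term:Term;
					term:value	"Felines"	].

############################################################################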



Language Module

I also suggest a generic module for expressing that any resource is in a
specific language (maybe this already exists ???).

#RDF Language Module
#
############################################################################
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lang: <???>.

# States that any resource is in a specific language
lang:lang
	a		rdf:Property;
	rdfs:domain	rdfs:Resource;
	rdfs:range	lang:Language.

lang:Language
	a		rdfs:Class.

lang:en
	a		lang:Language.

lang:fr
	a		lang:Language.

#.....etc.

############################################################################

This would give us a standard way to attach multilingual labels to an OWL
ontology, for example, or any conceptual structure which is sufficiently
independent of a linguistic context.

#Some example use of terminology and language module
############################################################################

example:aResource
	term:pref	[	a		term:Term;
				term:value	"Cats";
				lang:lang	lang:en		];
	term:alt	[	a		term:Term;
				term:value	"Felines";
				lang:lang	lang:en		];
	term:pref	[	a		term:Term;
				term:value	"Chats";
				lang:lang	lang:fr		].

############################################################################



Conceptual Indexing Module

This is the bit I haven't cracked.  Basically, this module is supposed to
allow you to build a conceptual structure of the kind usually described by a
thesaurus, that is, to identify concepts and the broader/narrower/related
associations between them.  But the emphasis is on its intended use.  This
is very important.  By calling it the conceptual indexing module, we make it
very clear that any concept described by this vocabulary is intended for
indexing web resources.  This solves the problem of the word "thesaurus"
being used to mean a bunch of different things with quite different
purposes (e.g. indexing thesauri, search aid thesauri, automated
classification thesauri etc.).

Here we must create a well-defined core, which can be extended by those who
wish to use custom concept associations.  I suggest the following for the
core:

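#Conceptual Indexing Module core
#####################################

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix cif: <???> .

cif:Concept
	a		rdfs:Class.

# Unique identifier of a concept
cif:id
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	rdfs:Literal.

# Indexes any resource against a concept
cif:about
	a		rdf:Property;
	rdfs:domain	rdfs:Resource;
	rdfs:range	cif:Concept.

#####################################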

That is, every concept has a unique ID, and to index any web resource
against a concept, declare that the resource is 'about' some concept.
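
A minimal sketch of what this might look like in data, assuming the example:
prefix used earlier; the concept name, the ID value and the document URL are
made up for illustration.

#Example data - indexing a web resource against a concept (sketch)
#####################################

example:c0042
	a		cif:Concept;
	cif:id		"0042".

# hypothetical web resource being indexed
<http://www.example.org/reports/cat-care.html>
	cif:about	example:c0042.

#####################################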


Beyond this, things start to get a bit hazy.  We may also want to do
something like this, although I'm not sure:

#CIF descriptor property
#####################################

cif:descriptor
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	term:Term.

#####################################

That is, every concept must be linked to a term which is a noun or noun
phrase that uniquely identifies it, such as 'Banks (Financial
Institutions)'.  Having both a 'cif:descriptor' property and a 'term:pref'
(preferred-term) property simultaneously could be very confusing though.  

I think the following two properties are probably a good idea:
  
#CIF foundation properties
#####################################

cif:relation
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	cif:Concept.

cif:mapping
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	cif:Concept.

#####################################

The 'relation' property is the super-property of all properties linking
concepts within the same scheme.  Broader/narrower/related type properties
should be declared as sub-properties of this property.  The point is that,
however we or anyone else extends this property, the precise semantics (i.e.
meaning) of the extension must be fully defined.  We probably want to put
some standard extensions in here; this we should definitely discuss!  I
have had a heads-up on the new British Standards for thesauri which are
under development, and it looks like they use a 'broader' relation to
subsume the following relations: 'broader-generic' (isa),
'broader-instantive' (instanceof) and 'broader-partitive' (part-of).
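
One way such standard extensions could be declared is sketched below.  The
property names (cif:broader, cif:broaderGeneric and so on) are assumptions
made up for illustration; nothing has been agreed, and the definitions of
their semantics are still missing.

#Possible standard extensions of cif:relation (sketch, names not agreed)
#####################################

cif:broader
	a			rdf:Property;
	rdfs:subPropertyOf	cif:relation.

cif:broaderGeneric		# isa
	a			rdf:Property;
	rdfs:subPropertyOf	cif:broader.

cif:broaderInstantive		# instanceof
	a			rdf:Property;
	rdfs:subPropertyOf	cif:broader.

cif:broaderPartitive		# part-of
	a			rdf:Property;
	rdfs:subPropertyOf	cif:broader.

#####################################

With this layering, anything asserted with one of the three specific
properties would also count as a plain 'broader' statement, and in turn as
a 'relation', for any application that applies the RDFS sub-property rules.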

Then we might want to do something like this:

#Properties to construct a hierarchy of concepts with no semantic implications
#####################################

cif:parent
	a			rdf:Property;
	rdfs:subPropertyOf	cif:relation.

cif:child
	a			rdf:Property;
	rdfs:subPropertyOf	cif:relation.

cif:friend
	a			rdf:Property;
	rdfs:subPropertyOf	cif:relation.

#####################################

to ensure we can catch all types of data, in the case where semantics are
poorly defined or standards are not consistently adhered to.  

The 'mapping' property is the super-property of all properties linking
concepts from different schemes.  Here there is a basis for defining some
clearly understood properties.  If we take each concept from a scheme as
standing for the set of resources which are indexed against it, then we can
compare concepts from different schemes using set operations.  We could use
something like the following vocabulary to do this:

#CIF Properties and classes to define mappings between Concepts from different sources
######################################

cif:sameAs
	a			rdf:Property;
	rdfs:subPropertyOf	cif:mapping.

cif:overlapsWith
	a			rdf:Property;
	rdfs:subPropertyOf	cif:mapping.

cif:includes
	a			rdf:Property;
	rdfs:subPropertyOf	cif:mapping.

cif:includedBy
	a			rdf:Property;
	rdfs:subPropertyOf	cif:mapping.

cif:disjointWith
	a			rdf:Property;
	rdfs:subPropertyOf	cif:mapping.

cif:complementOf
	a			rdf:Property;
	rdfs:subPropertyOf	cif:mapping.

cif:ConceptCombination
	a			rdfs:Class;
	rdfs:subClassOf		cif:Concept.

cif:Union
	a			rdfs:Class;
	rdfs:subClassOf		cif:ConceptCombination.

cif:Intersection
	a			rdfs:Class;
	rdfs:subClassOf		cif:ConceptCombination.

cif:Exclusion
	a			rdfs:Class;
	rdfs:subClassOf		cif:ConceptCombination.

cif:ofConcept
	a		rdf:Property;
	rdfs:domain	cif:ConceptCombination;
	rdfs:range	cif:Concept.

cif:excluding
	a		rdf:Property;
	rdfs:domain	cif:Exclusion;
	rdfs:range	cif:Concept.

######################################

This goes beyond the "equivalence" relations of the multilingual ISO
standard, to clearly define the full set of possible set operations.

#Example data - concept mappings
######################################

@prefix example2: <example2> .

example:Politicians
	a			cif:Concept;
	cif:includes		[	a		cif:Union;
					cif:ofConcept	example2:MPs;
					cif:ofConcept	example2:MEPs;
					cif:ofConcept	example2:Councillors	];
	cif:includedBy		example2:PublicEmployees.

######################################

In the above example I have shown a mapping from a concept in a source
scheme to both broader and narrower sets in the target scheme.  This
practice allows you to guarantee recall both in the case where a user wants,
for example, documents about Politicians, and in the case where the user
requests documents NOT about politicians (see Doerr paper on concept
mapping).
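
The cif:Exclusion class and cif:excluding property are not exercised by the
Politicians example.  A rough sketch of how they might be used follows,
assuming that an Exclusion stands for the set of resources indexed by its
cif:ofConcept minus those indexed by its cif:excluding concept; that reading
is an assumption, and the concept names are invented for illustration.

#Example data - mapping using an exclusion (sketch)
######################################

example:NonFiction
	a			cif:Concept;
	cif:sameAs		[	a		cif:Exclusion;
					cif:ofConcept	example2:Books;
					cif:excluding	example2:Fiction	].

######################################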

There are also some other things which may be worth considering, mainly
inspired by the linguistics people: properties linking a concept to an
explanation, a definition, a context, an example, a picture, any other
multimedia object etc.  In other words, links from a concept to lots of
things which can help identify what that concept is referring to (thesaurus
scope-notes fit in here too).
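
Such properties might be declared along the following lines.  This is only
a sketch: all four property names are hypothetical, and the choice of
literal versus resource ranges is an assumption that would need discussion.

#CIF properties to help identify a concept (sketch, names hypothetical)
#####################################

cif:definition
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	rdfs:Literal.

cif:scopeNote
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	rdfs:Literal.

cif:example
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	rdfs:Literal.

cif:depiction			# a picture or other multimedia object
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	rdfs:Resource.

#####################################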

This document has not considered how to do things like expressing the
grouping of concepts into facets (such as 'objects', 'activities') or node
groups (such as 'paintings by period' and 'paintings by artist').

The last thing is managing change and evolution.  We probably want to
consider some way of expressing that a concept is deprecated and has been
replaced by a new one, or has been modified in some way, and other such
things.  
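
Purely as a starting point for discussion, deprecation and replacement could
perhaps be expressed as sketched below.  The class cif:DeprecatedConcept and
the property cif:replacedBy are hypothetical names, not part of the core
suggested above, and the example concepts are invented.

#CIF class and property to record change (sketch, names hypothetical)
#####################################

cif:DeprecatedConcept
	a			rdfs:Class;
	rdfs:subClassOf		cif:Concept.

cif:replacedBy
	a		rdf:Property;
	rdfs:domain	cif:Concept;
	rdfs:range	cif:Concept.

example:OldConcept
	a		cif:DeprecatedConcept;
	cif:replacedBy	example:NewConcept.

#####################################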






Alistair Miles

CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom

Email:        a.j.miles@rl.ac.uk
Telephone: +44 (0)1235 445440

Received on Friday, 19 September 2003 13:03:57 UTC