RE: SWAD-E work on Thesaurus semantics & mapping from Butler, Mark on 2003-09-24 (www-rdf-dspace@w3.org from September 2003)

From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
Date: Wed, 24 Sep 2003 15:18:01 +0100
To: www-rdf-dspace@w3.org
Message-ID: <5E13A1874524D411A876006008CD059F066A1F3F@0-mail-1.hpl.hp.com>
Hi Eric,

Thanks for this pointer, I wasn't aware of this work. At first I was
surprised at the approach they have taken, although it seems this is
because I had missed some of the complexity of thesauri.

Thesauri have a lot in common with ontology and schema languages, for
example in principle "broader term" maps onto "superclass", "narrower term"
maps onto "subclass" and "synonym" maps to "equivalentTo". Also, just like
ontologies and schema languages, we can potentially use thesauri to generate
inferences, e.g. if we search for instances where property P1 has a value
of A, and A is a synonym of B, then we can also return instances where
property P1 has a value of B. Therefore, just as we are considering having a
three-level architecture, e.g.

RDQL query layer
OWL / RDFS based inference layer
RDF store

if we start to use thesauri, it would be useful for the middle layer to do
inference over thesauri also. So my first suggestion would be to try to
convert thesauri into OWL. For a paper that takes this approach, although
admittedly it uses RDFS rather than OWL as it predates OWL work, see

"From thesaurus to ontology", B. J. Wielinga, A. Th. Schreiber,
J. Wielemaker and J. A. C. Sandberg,
http://www.swi.psy.uva.nl/usr/Schreiber/papers/Wielinga01a.pdf

Converting thesauri into OWL would let us leverage standard OWL tools in
order to use them. For example, Getty AAT users could then build their tools
on OWL rather than from scratch, which might be attractive.
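
To make the synonym inference described earlier concrete, here is a minimal
sketch in Python. It is not RDQL or an OWL reasoner, just a plain in-memory
stand-in for the three layers, and every name in it (TRIPLES, SYNONYMS,
synonym_closure, query, doc1 to doc3) is made up for illustration.

```python
from collections import defaultdict

# Hypothetical data: (instance, property, value) triples standing in
# for the RDF store.
TRIPLES = [
    ("doc1", "P1", "A"),
    ("doc2", "P1", "B"),
    ("doc3", "P1", "C"),
]

# Hypothetical thesaurus: pairs of terms declared to be synonyms.
SYNONYMS = [("A", "B")]


def synonym_closure(term, pairs):
    """Return every term reachable from `term` through synonym links."""
    neighbours = defaultdict(set)
    for left, right in pairs:
        # Assumption: synonymy is treated as symmetric.
        neighbours[left].add(right)
        neighbours[right].add(left)
    seen, todo = {term}, [term]
    while todo:
        current = todo.pop()
        for other in neighbours[current] - seen:
            seen.add(other)
            todo.append(other)
    return seen


def query(triples, prop, value, synonyms=()):
    """Return instances whose `prop` is `value` or any synonym of it."""
    accepted = synonym_closure(value, synonyms)
    return sorted(s for s, p, o in triples if p == prop and o in accepted)


print(query(TRIPLES, "P1", "A"))            # ['doc1']
print(query(TRIPLES, "P1", "A", SYNONYMS))  # ['doc1', 'doc2']
```

The point of putting the expansion in its own function is that the query
layer does not need to know about the thesaurus at all, which is the role the
middle layer would play.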

However, using thesauri in this way is difficult due to a problem that dogged
early AI, namely that people "overload" relationships like broader and
narrower term with multiple meanings (this is similar to the misuse of ISA
described in Drew McDermott's paper "Artificial Intelligence meets Natural
Stupidity"). Sometimes they correspond to subclass / superclass
relationships, but we have no guarantee of it.
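
A small sketch of why the overloading matters, assuming we naively read every
broader-term link as subclass-of. The thesaurus fragment and the function
names are hypothetical, not taken from any real thesaurus.

```python
# Hypothetical thesaurus fragment: term -> its broader term.
# The first link is a genuine kind-of link, the second is really part-of.
BROADER = {
    "sports cars": "cars",  # a sports car is a kind of car
    "car wheels": "cars",   # a car wheel is a part of a car, not a kind of car
}


def superclasses(term, broader):
    """Follow broader-term links upwards, reading each one as subclass-of."""
    found = []
    while term in broader:
        term = broader[term]
        if term in found:  # guard against cycles in the thesaurus
            break
        found.append(term)
    return found


def inferred_types(asserted_type, broader):
    """Types an instance would receive under the subclass reading."""
    return [asserted_type] + superclasses(asserted_type, broader)


print(inferred_types("sports cars", BROADER))  # sound: a sports car is a car
print(inferred_types("car wheels", BROADER))   # unsound: a wheel is not a car
```

Both calls return a list ending in "cars", and nothing in the data tells the
inference layer that only the first conclusion is valid.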

In addition, some of the relationships used in TIF, e.g. has-preferred-term,
has-non-preferred-term, has-related-concept, has-inexact-equivalence and
has-partial-equivalence, have no corresponding relationships in OWL.
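
As a rough sketch of what a converter would have to cope with, the following
Python maps the three correspondences suggested above and collects everything
else as unmapped. The labels "broader-term", "narrower-term" and "synonym" are
placeholders rather than actual TIF names, the choice of rdfs:subClassOf and
owl:equivalentClass as concrete targets is my assumption, and the sample
statements are invented.

```python
# Thesaurus relationship -> (target property, swap subject and object?).
# A statement (s, relation, o) is read as "s has <relation> o".
MAPPING = {
    "broader-term": ("rdfs:subClassOf", False),   # s is a subclass of o
    "narrower-term": ("rdfs:subClassOf", True),   # o is a subclass of s
    "synonym": ("owl:equivalentClass", False),
}


def convert(statements):
    """Split statements into converted triples and unmappable leftovers."""
    converted, unmapped = [], []
    for subject, relation, obj in statements:
        if relation not in MAPPING:
            unmapped.append((subject, relation, obj))
            continue
        prop, swap = MAPPING[relation]
        if swap:
            subject, obj = obj, subject
        converted.append((subject, prop, obj))
    return converted, unmapped


# Invented sample statements.
STATEMENTS = [
    ("cats", "broader-term", "animals"),
    ("animals", "narrower-term", "dogs"),
    ("cats", "has-related-concept", "pets"),
    ("cats", "has-preferred-term", "cat"),
]

converted, unmapped = convert(STATEMENTS)
print(converted)
print(unmapped)
```

The unmapped list is where relationships such as has-related-concept end up,
so its size gives a crude measure of how much of a thesaurus would be lost in
a straight conversion.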

So the questions I am trying to consider now are:

i) what kind of processing is envisaged when using the TIF standard?
Specifically, will the data be used for any kind of inference?

ii) what exactly are the problems of mapping between thesauri and
ontologies?

(I've been pointed at this paper - "Semantic Problems of Thesaurus Mapping",
Martin Doerr, http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Doerr/ - but have
not read it yet)

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

> -----Original Message-----
> From: Eric Miller [mailto:em@w3.org]
> Sent: 19 September 2003 18:04
> To: www-rdf-dspace@w3.org
> Subject: SWAD-E work on Thesaurus semantics & mapping
> 
> 
> 
> Related to our Simile discussions:
> 
> [[
> A proposed standard for a Thesaurus Interchange Format (TIF) for the
> semantic web is now online at:
> 
> http://www.w3c.rl.ac.uk/SWAD/thesaurus/tif/tif.html
> 
> We hope now for some feedback from users.
> ]]
> -- http://lists.w3.org/Archives/Public/public-esw/2003Aug/0000.html
> 
> the thread spanned months and can be picked up here
> -- http://lists.w3.org/Archives/Public/public-esw/2003Sep/
> 
> --
> eric miller                              http://www.w3.org/people/em/
> semantic web activity lead               http://www.w3.org/2001/sw/
> w3c world wide web consortium            http://www.w3.org/
>
Received on Wednesday, 24 September 2003 10:22:16 UTC