RE: [VM] Scoping Draft with questions to TF members $swbpd from Miles, AJ (Alistair) on 2004-09-13 (public-swbp-wg@w3.org from September 2004)

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Mon, 13 Sep 2004 18:36:29 +0100
To: SW Best Practices <public-swbp-wg@w3.org>
Message-ID: <350DC7048372D31197F200902773DF4C05E50BF4@exchange11.rl.ac.uk>
+1 on all Jim said.

Just to clarify, by ...

> (1) Generating an OWL ontology from a thesaurus.

... I meant taking a thesaurus as a starting point for building an OWL
ontology by hand (i.e. picking through the concepts and the hierarchical
relations, and asserting rdfs:subClassOf, rdf:type, or something else as
appropriate - essentially converting a thesaurus to an ontology), and not
...

> ... turn the thesaurus 
> terms into the "equivalent" RDF/OWL terms - i.e. instead of "NT" use 
> rdfs:subClassOf and thus the thesaurus automagically becomes an ontology --


... which I agree is a bad idea because of what Jim said.

Al.

--- 
Alistair Miles 
Research Associate 
CCLRC - Rutherford Appleton Laboratory 
Building R1 Room 1.60 
Fermi Avenue 
Chilton 
Didcot 
Oxfordshire OX11 0QX 
United Kingdom 
Email:        a.j.miles@rl.ac.uk 
Tel: +44 (0)1235 445440 
-----Original Message-----
From: Uschold, Michael F [mailto:michael.f.uschold@boeing.com]
Sent: 13 September 2004 17:21
To: Jim Hendler; Alistair Miles; Dan Brickley
Cc: Thomas Baker; Thomas Baker; SW Best Practices
Subject: RE: [VM] Scoping Draft with questions to TF members $swbpd


Jim

These are very helpful comments that clarify things.

Comments inline with [MFU]

-----Original Message-----
From: Jim Hendler [mailto:hendler@cs.umd.edu] 
Sent: Friday, September 10, 2004 9:11 AM
To: Alistair Miles; Dan Brickley
Cc: Thomas Baker; Uschold, Michael F; Thomas Baker; SW Best Practices
Subject: Re: [VM] Scoping Draft with questions to TF members $swbpd

At 16:04 +0100 9/10/04, Alistair Miles wrote:
>Just a comment, as I see it there are two options:
>
>(1) Generating an OWL ontology from a thesaurus.
>
>(2) Generating an RDF description of a thesaurus.
>
>(1) May often (although not always) be desirable.  However, class/instance
>semantics are almost always implicit in a thesaurus (and can be
>inappropriate) => (1) usually requires significant manual labour.
>
>(2) Means using an RDF vocabulary that accurately expresses the explicit
>(and potentially fuzzy) semantics of a thesaurus (and hence can be
>AUTOMATICALLY captured from an existing serialisation of a thesaurus).
>
>I think (2) is the first goal, enabling us to generate lots of RDF thesauri
>very cheaply.  Also N.B. there are many scenarios in which RDF based
>applications working on this type of data can provide important services and
>facilities to thesaurus users that are currently unavailable or expensive to
>produce.  (I.e. thesaurus user community will get a lot out of (2)).
>
>(1) Can be explored for various thesauri, weighing the cost of converting
>to OWL ontology against the potential utility.  I expect that the outcome
>will differ from case to case, and will depend largely on the precise
>intended use of the thesaurus.
>
>
>
>Al

Seems to me there is a confusion here; I see two separate things that work
together just fine. [note: I ignore SKOS for now, I will get back to it
later]

Let's assume I have a thesaurus built against one of the common thesaurus
specs - Z39.19 or such.  It specifies a lot of relationships between entries --
broader term, narrower term, related term, etc.  There are a number of ways I
could turn this thesaurus into RDF/OWL --

a) The most common misconception is that we should just turn the thesaurus
terms into the "equivalent" RDF/OWL terms - i.e. instead of "NT" use
rdfs:subClassOf and thus the thesaurus automagically becomes an ontology --
this is a BAD idea (in my opinion) because the semantics of class, subclass
etc. are NOT identical to the thesaurus terms, and the librarians would not
use it (and would not invite us back).
[MFU] This is a fairly good argument against this approach. Maybe we just
need to make these points in the note and advise against this approach.
b) the second approach is to turn the thesaurus into RDF data elements,
using a standard namespace and have each thesaurus term become something
like:

<thes:entry rdf:about="#Thesaurus">
   <thes:narrowerTerm rdf:resource="#RogetsThesaurus" />
   <thes:relatedTerm rdf:resource="#Dictionary" />
   <thes:relatedTerm rdf:resource="#AntonymList" />
   ...
</thes:entry>

This latter would give you a machine readable and processable online
thesaurus, and each element would be separately and unambiguously nameable
and HTTP GETable - IMO this is what digital librarians have been telling me
they want and need.
[MFU] OK this sounds fine too.  This sounds like very good news, because
conceptually such a translation ought to be fairly trivial.  Lots of details
to attend to, but few if any technical challenges. Correct me if I'm wrong.
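
As a rough illustration of how mechanical approach (b) could be, here is a minimal Python sketch that reads a flat thesaurus listing and emits N-Triples. Everything specific in it is an assumption made for illustration, not something from this thread: the two namespace URIs, the property names broaderTerm/narrowerTerm/relatedTerm, and the input layout (a term on an unindented line followed by indented "BT x", "NT x" or "RT x" lines). It does no URI escaping beyond removing whitespace from labels.

```python
# Hypothetical namespaces; neither URI comes from the thread.
THES = "http://example.org/thes#"
BASE = "http://example.org/mythesaurus#"

# Map the usual thesaurus abbreviations to illustrative property names.
RELATIONS = {"BT": "broaderTerm", "NT": "narrowerTerm", "RT": "relatedTerm"}


def term_uri(label):
    """Build a URI for a term by removing whitespace from its label."""
    return BASE + "".join(label.split())


def parse_flat(text):
    """Parse the assumed flat layout into (subject, property, object) triples."""
    triples = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if not line[0].isspace():
            current = line.strip()  # an unindented line starts a new entry
            continue
        code, _, target = line.strip().partition(" ")
        if current is None or code not in RELATIONS or not target.strip():
            continue  # skip anything this sketch does not understand
        triples.append(
            (term_uri(current), THES + RELATIONS[code], term_uri(target))
        )
    return triples


def to_ntriples(triples):
    """Serialise the triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


SAMPLE = """\
Thesaurus
  NT Rogets Thesaurus
  RT Dictionary
  RT Antonym List
"""

print(to_ntriples(parse_flat(SAMPLE)))
```

The relation codes are carried over as thesaurus properties rather than being reinterpreted as class relations, which is the difference between (b) and (a).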


c) Now consider the thesaurus namespace (thes:) above -- this document
needs to be an ontology about the thesaurus space - we could consider
keeping that just in RDFS - but then essentially all one could state is
domain and range restrictions.  There are a lot of problems with this - for
example, there have been proposed thesaurus standards where the thesaurus
"backbone" had to be a tree - in others (and most modern ones) it is a graph -
RDFS cannot distinguish these, but OWL can -- i.e. if a term is restricted
to have only one broader term, then it is a tree.  If it can have multiple
broader terms, then it is a graph - similarly, in some conceptions of
thesauri the RT links are invertible, in others they aren't - in OWL we can
state whether the relation is symmetric (or even transitive).
[MFU] Sounds like a very good idea.
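
To make the two distinctions in (c) concrete, here is a small Python sketch of the consequences such declarations would have. It is not OWL reasoning: it simply checks "at most one broader term per term" and adds the reverse of each related-term link. The property names and sample data are made up for illustration. Note also that under OWL's open-world semantics a reasoner given a single-valued constraint would conclude that two broader terms denote the same thing unless they are declared different; the check below takes the simpler closed-world reading.

```python
from collections import defaultdict


def broader_terms(triples):
    """Group the broader terms recorded for each term."""
    broader = defaultdict(set)
    for s, p, o in triples:
        if p == "broaderTerm":
            broader[s].add(o)
    return broader


def is_tree(triples):
    """True when no term has more than one broader term."""
    return all(len(b) <= 1 for b in broader_terms(triples).values())


def symmetric_closure(triples, prop="relatedTerm"):
    """Add the reverse of every `prop` link, as a symmetric property would entail."""
    closed = set(triples)
    for s, p, o in triples:
        if p == prop:
            closed.add((o, p, s))
    return closed


# Illustrative data only.
data = [
    ("RogetsThesaurus", "broaderTerm", "Thesaurus"),
    ("RogetsThesaurus", "broaderTerm", "ReferenceWork"),
    ("Thesaurus", "relatedTerm", "Dictionary"),
]

print(is_tree(data))  # False: RogetsThesaurus has two broader terms
print(("Dictionary", "relatedTerm", "Thesaurus") in symmetric_closure(data))  # True
```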

d) So having the thesaurus spec use OWL strikes me as a big win, since it is
more expressive, and since the RDFS version could be a proper subset if that
was desired.

So, it seems to me, the ideal situation is to do (b) (make the thesaurus a set
of RDF data) and (d) (have the thesaurus namespace document be in OWL).
[MFU] Sounds good.

Now, let's return to SKOS - SKOS gets (b) exactly right (and is elegant in
design - I think it is quite usable as is, and we have fooled around with
turning some thesauri into SKOS data using Perl without much trouble).
However, SKOS does not do (d), and I think it suffers therefore -- in fact,
the SKOS document, section 3.9 [1] talks about the semantics in ENGLISH in a
way that could trivially be mapped to OWL - i.e. they say things like:

This extension of the skos:broader/skos:narrower property pair may only be
used to specify a class subsumption relationship between two concepts.

which are easily expressed in OWL - they would then be machine readable, and
inferences could be made off of the thesaurus data that cannot be made from
the RDF Schema, etc.

So, I would strongly advocate that BP endorse the SKOS Core design and
help extend the SKOS Core RDF schema [2] to OWL.
 -JH
p.s. And, if someone wants to have a product the library community would
really eat up, build a tool that would take as input a PDF file with a
thesaurus in it using Z39.19, and output it as a machine readable thesaurus
in RDF...



[1] http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/#3.9
[2] http://www.w3.org/2004/02/skos/core.rdf
-- 
Professor James Hendler
http://www.cs.umd.edu/users/hendler 
Director, Semantic Web and Agent Technologies       301-405-2696
Maryland Information and Network Dynamics Lab.      301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Tuesday, 14 September 2004 11:02:36 UTC