RE: [VM] Scoping Draft with questions to TF members $swbpd

Jim

 

These are very helpful comments; they clarify things a lot.

 

My comments are inline, marked [MFU].

 

-----Original Message-----
From: Jim Hendler [mailto:hendler@cs.umd.edu] 
Sent: Friday, September 10, 2004 9:11 AM
To: Alistair Miles; Dan Brickley
Cc: Thomas Baker; Uschold, Michael F; Thomas Baker; SW Best Practices
Subject: Re: [VM] Scoping Draft with questions to TF members $swbpd

 

At 16:04 +0100 9/10/04, Alistair Miles wrote:

>Just a comment, as I see it there are two options:
>
>(1) Generating an OWL ontology from a thesaurus.
>
>(2) Generating an RDF description of a thesaurus.
>
>(1) May often (although not always) be desirable.  However,
>class/instance semantics are almost always implicit in a thesaurus (and
>can be inappropriate) => (1) usually requires significant manual labour.
>
>(2) Means using an RDF vocabulary that accurately expresses the
>explicit (and potentially fuzzy) semantics of a thesaurus (and hence can
>be AUTOMATICALLY captured from an existing serialisation of a
>thesaurus).
>
>I think (2) is the first goal, enabling us to generate lots of RDF
>thesauri very cheaply.  Also N.B. there are many scenarios in which
>RDF-based applications working on this type of data can provide
>important services and facilities to thesaurus users that are currently
>unavailable or expensive to produce.  (I.e. the thesaurus user community
>will get a lot out of (2).)
>
>(1) Can be explored for various thesauri, weighing the cost of
>converting to an OWL ontology against the potential utility.  I expect
>that the outcome will differ from case to case, and will depend largely
>on the precise intended use of the thesaurus.
>
>
>

>Al

 

It seems to me there is some confusion here; I see two separate things
that work together just fine. [Note: I am ignoring SKOS for now; I will
get back to it later.]

 

Let's assume I have a thesaurus built against one of the common
thesaurus specs, Z39.19 or such.  It specifies a lot of relationships
between entries: broader term, narrower term, related term, etc.  There
are a number of ways I could turn this thesaurus into RDF/OWL:

 

a) The most common misconception is that we should just turn the
thesaurus relationships into the "equivalent" RDF/OWL terms, i.e.
instead of "NT" use rdfs:subClassOf, and thus the thesaurus
automagically becomes an ontology.  This is a BAD idea (in my opinion)
because the semantics of class, subclass, etc. are NOT identical to those
of the thesaurus relationships, and the librarians would not use it (and
would not invite us back).

[MFU] This is a fairly good argument against this approach. Maybe we
just need to make these points in the note and advise against this
approach.
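To make the objection to (a) concrete, here is a minimal Python sketch. It assumes a hypothetical two-link thesaurus in which one NT link is whole-part (Brain as a part of the central nervous system), rewrites each "A NT B" as "B rdfs:subClassOf A", and then applies two RDFS entailment rules, rdfs9 (types propagate to superclasses) and rdfs11 (subClassOf is transitive). The terms and the specimen name are invented for illustration.

```python
# Hypothetical sketch: what goes wrong if every thesaurus NT link is
# blindly rewritten as rdfs:subClassOf. Terms and data are illustrative only.

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Thesaurus NT links as (broader, narrower) pairs. The second pair is a
# whole-part relationship, which thesauri also record as NT.
nt_links = [
    ("Nervous system", "Central nervous system"),
    ("Central nervous system", "Brain"),
]

def naive_mapping(links):
    """Approach (a): turn 'A NT B' into 'B rdfs:subClassOf A'."""
    return {(narrower, SUBCLASS, broader) for broader, narrower in links}

def rdfs_closure(triples):
    """Apply rdfs11 and rdfs9 repeatedly until nothing new is derived."""
    triples = set(triples)
    while True:
        sub = [(s, o) for s, p, o in triples if p == SUBCLASS]
        new = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))  # rdfs11: transitivity
        for s, p, o in triples:
            if p == RDF_TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, RDF_TYPE, b))  # rdfs9: type propagation
        if new <= triples:
            return triples
        triples |= new

data = naive_mapping(nt_links) | {("specimen42", RDF_TYPE, "Brain")}
closed = rdfs_closure(data)
print(sorted(o for s, p, o in closed if s == "specimen42" and p == RDF_TYPE))
# prints ['Brain', 'Central nervous system', 'Nervous system']
```

The last inferred type, that a single brain specimen is a nervous system, is a conclusion the thesaurus never asserted; it appears only because NT was given class semantics.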

b) The second approach is to turn the thesaurus into RDF data elements,
using a standard namespace, and have each thesaurus term become something
like:

 

<thes:entry rdf:about="#Thesaurus">
   <thes:narrowerTerm rdf:resource="#RogetsThesaurus" />
   <thes:relatedTerm rdf:resource="#Dictionary" />
   <thes:relatedTerm rdf:resource="#AntonymList" />
   ...
</thes:entry>

 

This latter approach would give you a machine-readable and processable
online thesaurus, and each element would be separately and unambiguously
nameable and HTTP GETable.  IMO this is what digital librarians have been
telling me they want and need.

[MFU] OK, this sounds fine too.  This sounds like very good news, because
conceptually such a translation ought to be fairly trivial.  Lots of
details to attend to, but few if any technical challenges.  Correct me if
I'm wrong.
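As a rough check on how trivial the translation in (b) is, here is a minimal Python sketch that turns thesaurus records into N-Triples. THES and BASE are hypothetical placeholder namespaces (the thread does not name the actual namespace), and the property names are only modeled on the thes: example above.

```python
# Minimal sketch of approach (b): thesaurus entries become RDF data.
# THES and BASE are hypothetical namespaces used only for illustration.

THES = "http://example.org/thes#"   # hypothetical vocabulary namespace
BASE = "http://example.org/terms#"  # hypothetical base for term URIs

# One record per term: {term: {relation: [target terms]}}
thesaurus = {
    "Thesaurus": {
        "narrowerTerm": ["RogetsThesaurus"],
        "relatedTerm": ["Dictionary", "AntonymList"],
    },
}

def to_ntriples(records):
    """Yield one N-Triples line per relationship in the thesaurus."""
    for term, relations in records.items():
        for relation, targets in relations.items():
            for target in targets:
                yield f"<{BASE}{term}> <{THES}{relation}> <{BASE}{target}> ."

for line in to_ntriples(thesaurus):
    print(line)
```

Because every term gets its own URI, each element is separately nameable; making it HTTP GETable would additionally require serving the BASE URIs from a web server, which the sketch does not attempt.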

 

 

c) Now consider the thesaurus namespace (thes:) above.  This document
needs to be an ontology about the thesaurus space.  We could consider
keeping it in RDFS alone, but then essentially all one could state is
domain and range restrictions.  There are a lot of problems with this.
For example, there have been proposed thesaurus standards where the
thesaurus "backbone" had to be a tree, while in others (and most modern
ones) it is a graph.  RDFS cannot distinguish these, but OWL can: if a
term is restricted to have only one broader term, then the backbone is a
tree; if it can have multiple broader terms, then it is a graph.
Similarly, in some conceptions of thesauri the RT links are invertible
and in others they aren't; in OWL we can state whether the relationship
is symmetric (or even transitive).

[MFU] Sounds like a very good idea.
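The tree-versus-graph distinction in (c) comes down to whether any term has more than one broader term. In OWL this could be stated with an owl:maxCardinality 1 restriction on the broader-term property, or by declaring that property an owl:FunctionalProperty. The Python sketch below only checks a set of hypothetical BT links against that condition, under closed-world and unique-name assumptions. An OWL reasoner behaves differently: given two broader terms not declared owl:differentFrom, it would conclude they denote the same resource rather than report a violation.

```python
from collections import defaultdict

def backbone_is_tree(bt_links):
    """Return True if no term has more than one broader term.

    bt_links holds (narrower, broader) pairs. This is a closed-world,
    unique-name check; it does not look for cycles."""
    parents = defaultdict(set)
    for narrower, broad in bt_links:
        parents[narrower].add(broad)
    return all(len(b) <= 1 for b in parents.values())

# Hypothetical data: RogetsThesaurus has two broader terms (a polyhierarchy).
graph_links = [
    ("RogetsThesaurus", "Thesaurus"),
    ("RogetsThesaurus", "ReferenceWork"),
    ("Thesaurus", "ReferenceWork"),
]
tree_links = [("RogetsThesaurus", "Thesaurus"), ("Thesaurus", "ReferenceWork")]

print(backbone_is_tree(graph_links))  # False
print(backbone_is_tree(tree_links))   # True
```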

 

d) So having the thesaurus spec use OWL strikes me as a big win, since
OWL is more expressive and the RDFS version could be a proper subset if
that were desired.

 

So, it seems to me, the ideal situation is to do (b) (make the thesaurus
a set of RDF data) and (d) (have the thesaurus namespace document be in
OWL).

[MFU] Sounds good.

 

Now, let's return to SKOS.  SKOS gets (b) exactly right (and is elegant
in design; I think it is quite usable as is, and we have fooled around
with turning some thesauri into SKOS data using Perl without much
trouble).

However, SKOS does not do (d), and I think it suffers for that.  In fact,
the SKOS document, section 3.9 [1], talks about the semantics in ENGLISH
in a way that could trivially be mapped to OWL, i.e. it says things
like:

 

This extension of the skos:broader/skos:narrower property pair may only
be used to specify a class subsumption relationship between two
concepts.

 

Statements like this are easily expressed in OWL, and would thus be
machine-readable: inferences could be drawn from the thesaurus data that
cannot be drawn from the RDF Schema, etc.
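RDFS alone can state domains and ranges, but it cannot say that a related-term property is symmetric or that narrower is the inverse of broader; OWL provides owl:SymmetricProperty, owl:inverseOf and owl:TransitiveProperty for this. The Python sketch below forward-chains the inferences such axioms would license. Which axioms SKOS Core should actually adopt is an assumption here, not something taken from the SKOS documents, and the ex: terms are hypothetical.

```python
# Hypothetical OWL-style axioms over SKOS Core property names; these are
# illustrative assumptions, not quoted from the SKOS Core schema.
INVERSE_OF = {"skos:broader": "skos:narrower", "skos:narrower": "skos:broader"}
SYMMETRIC = {"skos:related"}
TRANSITIVE = set()  # e.g. add "skos:broader" to experiment with transitivity

def infer(triples):
    """Forward-chain the three axiom kinds until no new triple appears."""
    triples = set(triples)
    while True:
        new = set()
        for s, p, o in triples:
            if p in INVERSE_OF:
                new.add((o, INVERSE_OF[p], s))  # owl:inverseOf
            if p in SYMMETRIC:
                new.add((o, p, s))  # owl:SymmetricProperty
            if p in TRANSITIVE:
                for s2, p2, o2 in triples:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))  # owl:TransitiveProperty
        if new <= triples:
            return triples
        triples |= new

data = {
    ("ex:RogetsThesaurus", "skos:broader", "ex:Thesaurus"),
    ("ex:Thesaurus", "skos:related", "ex:Dictionary"),
}
for t in sorted(infer(data) - data):
    print(t)
# ('ex:Dictionary', 'skos:related', 'ex:Thesaurus')
# ('ex:Thesaurus', 'skos:narrower', 'ex:RogetsThesaurus')
```

Neither printed triple follows from an RDFS schema that only declares domains and ranges; both follow once the properties carry OWL axioms.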

 

So, I would strongly advocate that BP endorse the SKOS Core design and
help extend the SKOS Core RDF schema [2] to OWL.

 -JH

p.s. And if someone wants a product the library community would really
eat up, build a tool that takes as input a PDF file containing a Z39.19
thesaurus and outputs it as a machine-readable thesaurus in RDF...
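The PDF extraction in such a tool is the hard part and is not attempted here. Assuming the text has already been extracted into a simple Z39.19-style layout (term on its own line, indented BT/NT/RT lines), a minimal Python sketch of the remaining conversion to SKOS-style triples might look like this. The input layout, the ex: naming rule and the sample entries are all hypothetical; other Z39.19 tags such as UF or SN are simply skipped.

```python
import re

# Hypothetical input: text already extracted from a PDF, in a simple
# Z39.19-style layout (term on its own line, indented BT/NT/RT lines).
SAMPLE = """\
Thesaurus
  NT Roget's Thesaurus
  RT Dictionary
Dictionary
  RT Thesaurus
"""

RELATION = {"BT": "skos:broader", "NT": "skos:narrower", "RT": "skos:related"}

def concept_id(label):
    """Hypothetical naming rule: keep letters and digits, prefix with ex:."""
    return "ex:" + re.sub(r"[^A-Za-z0-9]", "", label)

def parse(text):
    """Return (subject, property, object) triples; unknown tags are skipped."""
    triples = []
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():  # an unindented line starts a new entry
            current = raw.strip()
            triples.append((concept_id(current), "skos:prefLabel", current))
            continue
        tag, _, target = raw.strip().partition(" ")
        if current is not None and tag in RELATION:
            triples.append((concept_id(current), RELATION[tag], concept_id(target)))
    return triples

for triple in parse(SAMPLE):
    print(triple)
```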

 

 

 

[1] http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/#3.9

[2] http://www.w3.org/2004/02/skos/core.rdf

-- 

Professor James Hendler
http://www.cs.umd.edu/users/hendler 
Director, Semantic Web and Agent Technologies       301-405-2696
Maryland Information and Network Dynamics Lab.      301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742    
   

Received on Monday, 13 September 2004 16:21:55 UTC