Re: [VM] Scoping Draft with questions to TF members $swbpd from Jim Hendler on 2004-09-10 (public-swbp-wg@w3.org from September 2004)

From: Jim Hendler <hendler@cs.umd.edu>
Date: Fri, 10 Sep 2004 12:11:17 -0400
To: Alistair Miles <a.j.miles@rl.ac.uk>, Dan Brickley <danbri@w3.org>
Cc: Thomas Baker <thomas.baker@bi.fhg.de>, "Uschold, Michael F" <michael.f.uschold@boeing.com>, Thomas Baker <thomas.baker@izb.fraunhofer.de>, SW Best Practices <public-swbp-wg@w3.org>
Message-Id: <p0611040bbd677bb875ee@[10.0.1.4]>
At 16:04 +0100 9/10/04, Alistair Miles wrote:
>Just a comment, as I see it there are two options:
>
>(1) Generating an OWL ontology from a thesaurus.
>
>(2) Generating an RDF description of a thesaurus.
>
>(1) May often (although not always) be desirable.  However, 
>class/instance semantics are almost always implicit in a thesaurus 
>(and can be inappropriate) => (1) usually requires significant 
>manual labour.
>
>(2) Means using an RDF vocabulary that accurately expresses the 
>explicit (and potentially fuzzy) semantics of a thesaurus (and hence 
>can be AUTOMATICALLY captured from an existing serialisation of a 
>thesaurus).
>
>I think (2) is the first goal, enabling us to generate lots of RDF 
>thesauri very cheaply.  Also N.B. there are many scenarios in which 
>RDF based applications working on this type of data can provide 
>important services and facilities to thesaurus users that are 
>currently unavailable or expensive to produce.  (I.e. thesaurus user 
>community will get a lot out of (2)).
>
>(1) Can be explored for various thesauri, weighing the cost of 
>converting to OWL ontology against the potential utility.  I expect 
>that the outcome will differ from case to case, and will depend 
>largely on the precise intended use of the thesaurus.
>
>
>
>Al

Seems to me there is a confusion here: I see two separate things that 
work together just fine. [note: I ignore SKOS for now; I will get 
back to it later]

Let's assume I have a thesaurus built against one of the common 
thesaurus specs - Z39.19 or such.  It specifies a lot of relationships 
between entries -- broader term, narrower term, related term, etc.  
There are a number of ways I could turn this thesaurus into RDF/OWL --

a) The most common misconception is that we should just turn the 
thesaurus terms into the "equivalent" RDF/OWL terms - i.e. instead of 
"NT" use rdfs:subClassOf and thus the thesaurus automagically becomes 
an ontology -- this is a BAD idea (in my opinion) because the 
semantics of class, subclass etc. are NOT identical to the thesaurus 
terms, and the librarians would not use it (and would not invite us 
back).

b) the second approach is to turn the thesaurus into RDF data 
elements, using a standard namespace and have each thesaurus term 
become something like:

<thes:entry rdf:about="#Thesaurus">
    <thes:narrowerTerm rdf:resource="#RogetsThesaurus" />
    <thes:relatedTerm rdf:resource="#Dictionary" />
    <thes:relatedTerm rdf:resource="#AntonymList" />
    ...
</thes:entry>

This latter approach would give you a machine-readable and 
processable online thesaurus, and each element would be separately 
and unambiguously nameable and HTTP GETable - IMO this is what 
digital librarians have been telling me they want and need.
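
As a rough illustration of approach (b), here is a minimal Python 
sketch that serialises one thesaurus entry as RDF/XML. The thes: 
namespace URI, the property names (narrowerTerm, relatedTerm) and the 
helper entry_to_rdf are all assumptions made up for this example, not 
part of any published vocabulary. It puts rdf:about on the entry and 
rdf:resource on each link, which is how RDF/XML names a node and 
points at another one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for the thes: vocabulary (an assumption).
THES_NS = "http://example.org/thes#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("thes", THES_NS)


def entry_to_rdf(term, relations):
    """Build an rdf:RDF element describing one thesaurus entry.

    relations is a list of (property_name, target_term) pairs.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    entry = ET.SubElement(
        root, f"{{{THES_NS}}}entry", {f"{{{RDF_NS}}}about": f"#{term}"}
    )
    for prop, target in relations:
        # Each relationship becomes a property element pointing at the target.
        ET.SubElement(
            entry, f"{{{THES_NS}}}{prop}", {f"{{{RDF_NS}}}resource": f"#{target}"}
        )
    return root


doc = entry_to_rdf(
    "Thesaurus",
    [
        ("narrowerTerm", "RogetsThesaurus"),
        ("relatedTerm", "Dictionary"),
        ("relatedTerm", "AntonymList"),
    ],
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Because each entry is built from plain (property, target) pairs, a 
conversion of this kind could in principle be driven straight from an 
existing serialisation of the thesaurus.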

c) Now consider the thesaurus namespace (thes:) above -- this 
document needs to be an ontology about the thesaurus space. We could 
consider keeping that just in RDFS, but then essentially all one 
could state is domain and range restrictions.  There are a lot of 
problems with this. For example, there have been proposed thesaurus 
standards where the thesaurus "backbone" had to be a tree; in others 
(and most modern ones) it is a graph. RDFS cannot distinguish these, 
but OWL can -- i.e. if a term is restricted to have only one broader 
term, then the backbone is a tree; if it can have multiple broader 
terms, then it is a graph. Similarly, in some conceptions of thesauri 
the RT links are invertible, in others they aren't - in OWL we can 
state whether the property is symmetric (or even transitive).
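
To make the tree/graph and symmetric-RT distinctions concrete, here 
is a small Python sketch over hypothetical triples. The property 
names broaderTerm and relatedTerm and the sample terms are 
assumptions for illustration. Note this is a plain closed-world check 
written by hand, not an OWL reasoner: given a "one broader term only" 
declaration, a real reasoner would typically conclude that two 
broader terms denote the same thing unless told they differ, rather 
than report a violation.

```python
from collections import defaultdict

# Hypothetical (subject, property, object) triples in the style of (b).
triples = [
    ("RogetsThesaurus", "broaderTerm", "Thesaurus"),
    ("Thesaurus", "broaderTerm", "ReferenceWork"),
    ("Thesaurus", "relatedTerm", "Dictionary"),
]


def terms_with_multiple_broader(triples):
    """Return terms that have more than one broader term.

    An empty result means every term has at most one parent,
    i.e. the backbone is tree-shaped rather than a general graph.
    """
    broader = defaultdict(set)
    for s, p, o in triples:
        if p == "broaderTerm":
            broader[s].add(o)
    return sorted(t for t, parents in broader.items() if len(parents) > 1)


def symmetric_closure(triples, prop):
    """Add the reverse of every `prop` statement, as a symmetric property entails."""
    closed = set(triples)
    closed |= {(o, p, s) for s, p, o in triples if p == prop}
    return closed


print(terms_with_multiple_broader(triples))
print(("Dictionary", "relatedTerm", "Thesaurus") in symmetric_closure(triples, "relatedTerm"))
```

The point is that both rules are properties of the vocabulary, so 
stating them once in the namespace document would save every 
application from hard-coding them as above.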

d) So having the thesaurus spec use OWL strikes me as a big win, 
since it is more expressive and since the RDFS version could be a 
proper subset if that was desired.

So, it seems to me, the ideal situation is to do (b) (make the 
thesaurus a set of RDF data) and (d) (have the thesaurus namespace 
document be in OWL).

Now, let's return to SKOS - SKOS gets (b) exactly right (and is 
elegant in design - I think it is quite usable as is, and we have 
fooled around with turning some thesauri into SKOS data using Perl 
without much trouble).
However, SKOS does not do (d), and I think it suffers therefore -- in 
fact, the SKOS document, section 3.9 [1], talks about the semantics in 
ENGLISH in a way that could trivially be mapped to OWL - i.e. they 
say things like:

This extension of the skos:broader/skos:narrower property pair may 
only be used to specify a class subsumption relationship between two 
concepts.

Statements like this are easily expressed in OWL - and thus would be 
machine readable, so inferences could be made from the thesaurus data 
that cannot be made from the RDF Schema, etc.
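
One example of such an inference, sketched in Python: if the 
subsumption-style broader relation were declared transitive, the 
indirect broader terms would follow automatically. Treating the 
relation as transitive is my assumption for this illustration, and 
the function name and sample terms are hypothetical.

```python
def broader_closure(broader):
    """Return every (narrower, broader) pair entailed by transitivity."""
    closure = set(broader)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a -> b and b -> d together entail a -> d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical asserted (narrower, broader) pairs.
asserted = {
    ("RogetsThesaurus", "Thesaurus"),
    ("Thesaurus", "ReferenceWork"),
}
inferred = broader_closure(asserted) - asserted
print(inferred)
```

Here the pair linking RogetsThesaurus to ReferenceWork is never 
stated in the data; it only appears once the transitivity rule is 
available to the machine.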

So, I would strongly advocate that BP endorse the SKOS Core 
design and help extend the SKOS Core RDF schema [2] to OWL.
  -JH
p.s. And, if someone wants to have a product the library community 
would really eat up, build a tool that would take as input a PDF file 
with a thesaurus in it using Z39.19, and output it as a machine 
readable thesaurus in RDF...



[1] http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/#3.9
[2] http://www.w3.org/2004/02/skos/core.rdf
-- 
Professor James Hendler			  http://www.cs.umd.edu/users/hendler 
Director, Semantic Web and Agent Technologies	  301-405-2696
Maryland Information and Network Dynamics Lab.	  301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Friday, 10 September 2004 16:12:42 UTC