- From: Jim Hendler <hendler@cs.umd.edu>
- Date: Fri, 10 Sep 2004 12:11:17 -0400
- To: Alistair Miles <a.j.miles@rl.ac.uk>, Dan Brickley <danbri@w3.org>
- Cc: Thomas Baker <thomas.baker@bi.fhg.de>, "Uschold, Michael F" <michael.f.uschold@boeing.com>, Thomas Baker <thomas.baker@izb.fraunhofer.de>, SW Best Practices <public-swbp-wg@w3.org>
- Message-Id: <p0611040bbd677bb875ee@[10.0.1.4]>
At 16:04 +0100 9/10/04, Alistair Miles wrote:
>Just a comment, as I see it there are two options:
>
>(1) Generating an OWL ontology from a thesaurus.
>
>(2) Generating an RDF description of a thesaurus.
>
>(1) May often (although not always) be desirable. However,
>class/instance semantics are almost always implicit in a thesaurus
>(and can be inappropriate) => (1) usually requires significant
>manual labour.
>
>(2) Means using an RDF vocabulary that accurately expresses the
>explicit (and potentially fuzzy) semantics of a thesaurus (and hence
>can be AUTOMATICALLY captured from an existing serialisation of a
>thesaurus).
>
>I think (2) is the first goal, enabling us to generate lots of RDF
>thesauri very cheaply. Also N.B. there are many scenarios in which
>RDF based applications working on this type of data can provide
>important services and facilities to thesaurus users that are
>currently unavailable or expensive to produce. (I.e. thesaurus user
>community will get a lot out of (2)).
>
>(1) Can be explored for various thesauri, weighing the cost of
>converting to OWL ontology against the potential utility. I expect
>that the outcome will differ from case to case, and will depend
>largely on the precise intended use of the thesaurus.
>
>
>
>Al

Seems to me there is a confusion here: I see two separate things that
work together just fine. [Note: I ignore SKOS for now, I will get
back to it later.]

Let's assume I have a thesaurus built against one of the common thesaurus
specs, Z39.19 or such. It specifies a lot of relationships between
entries: broader term, narrower term, related term, etc. There are a
number of ways I could turn this thesaurus into RDF/OWL:

a) The most common misconception is that we should just turn the
thesaurus relationship terms into the "equivalent" RDF/OWL terms, i.e.
instead of "NT" use rdfs:subClassOf, and thus the thesaurus automagically
becomes an ontology. This is a BAD idea (in my opinion) because the
semantics of class, subclass etc. are NOT identical to those of the
thesaurus relationships, and the librarians would not use it (and would
not invite us back).

b) The second approach is to turn the thesaurus into RDF data
elements, using a standard namespace, and have each thesaurus term
become something like:

  <thes:entry rdf:about="#Thesaurus">
    <thes:NarrowerTerm rdf:resource="#RogetsThesaurus" />
    <thes:relatedTerm rdf:resource="#Dictionary" />
    <thes:relatedTerm rdf:resource="#AntonymList" />
    ...
  </thes:entry>

This latter would give you a machine-readable and processable online
thesaurus, and each element would be separately and unambiguously
nameable and HTTP GETable. IMO this is what digital librarians have
been telling me they want and need.

c) Now consider the thesaurus namespace (thes:) above. This
document needs to be an ontology about the thesaurus space. We could
consider keeping that just in RDFS, but then essentially all one
could state is domain and range restrictions. There are a lot of
problems with this. For example, there have been proposed thesaurus
standards where the thesaurus "backbone" had to be a tree; in others
(and most modern ones) it is a graph. RDF Schema cannot distinguish
these, but OWL can: if a term is restricted to have only one broader
term, then it is a tree; if it can have multiple broader terms, then
it is a graph. Similarly, in some conceptions of thesauri the RT
links are invertible, in others they aren't. In OWL we can state
whether the relationship is symmetric (or even transitive).
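
As a rough sketch of what such an OWL namespace document might contain (the example.org URI and the names entry, broaderTerm, NarrowerTerm and relatedTerm are illustrative assumptions here, not a published vocabulary), the tree-shaped variant with symmetric RT links could be declared like this:

```xml
<?xml version="1.0"?>
<!-- Hypothetical namespace document for thes:. The example.org base URI
     and every thes: name below are assumptions made for illustration. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/thes">

  <owl:Ontology rdf:about=""/>

  <owl:Class rdf:ID="entry"/>

  <!-- Tree-style backbone: an entry has at most one broader term. -->
  <owl:ObjectProperty rdf:ID="broaderTerm">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
    <rdfs:domain rdf:resource="#entry"/>
    <rdfs:range rdf:resource="#entry"/>
  </owl:ObjectProperty>

  <!-- NT is the same link read in the opposite direction. -->
  <owl:ObjectProperty rdf:ID="NarrowerTerm">
    <owl:inverseOf rdf:resource="#broaderTerm"/>
  </owl:ObjectProperty>

  <!-- RT links that hold in both directions. -->
  <owl:SymmetricProperty rdf:ID="relatedTerm">
    <rdfs:domain rdf:resource="#entry"/>
    <rdfs:range rdf:resource="#entry"/>
  </owl:SymmetricProperty>

</rdf:RDF>
```

For the graph-style variant one would simply drop the owl:FunctionalProperty typing, and a spec that treats RT as one-directional would declare relatedTerm as a plain owl:ObjectProperty. One caveat: OWL makes no unique-name assumption, so an entry given two broaderTerm values leads a reasoner to infer that the two values denote the same resource, not to report an error, unless they are also stated to be different (e.g. with owl:differentFrom).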

d) So having the thesaurus spec use OWL strikes me as a big win, since
it is more expressive, and since the RDFS version could be a proper
subset if that was desired.

So, it seems to me, the ideal situation is to do (b) (make the
thesaurus a set of RDF data) and (d) (have the thesaurus namespace
document be in OWL).

Now, let's return to SKOS. SKOS gets (b) exactly right (and is
elegant in design; I think it is quite usable as is, and we have
fooled around with turning some thesauri into SKOS data using Perl
without much trouble).

However, SKOS does not do (d), and I think it suffers for it. In
fact, section 3.9 of the SKOS document [1] talks about the semantics in
ENGLISH in a way that could trivially be mapped to OWL, i.e. it
says things like:

  "This extension of the skos:broader/skos:narrower property pair may
  only be used to specify a class subsumption relationship between two
  concepts."

Statements like this are easily expressed in OWL, and would then be
machine readable: inferences could be made off of the thesaurus data
that cannot be made from the RDF Schema, etc.

So, I would strongly advocate that BP endorse the SKOS Core
design and help extend the SKOS Core RDF schema [2] to OWL.
-JH
p.s. And, if someone wants to have a product the library community
would really eat up, build a tool that would take as input a PDF file
with a thesaurus in it using Z39.19, and output it as a machine
readable thesaurus in RDF...
[1] http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/#3.9
[2] http://www.w3.org/2004/02/skos/core.rdf
--
Professor James Hendler http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies 301-405-2696
Maryland Information and Network Dynamics Lab. 301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Friday, 10 September 2004 16:12:42 UTC