- From: Uschold, Michael F <michael.f.uschold@boeing.com>
- Date: Mon, 13 Sep 2004 09:21:10 -0700
- To: "Jim Hendler" <hendler@cs.umd.edu>, "Alistair Miles" <a.j.miles@rl.ac.uk>, "Dan Brickley" <danbri@w3.org>
- Cc: "Thomas Baker" <thomas.baker@bi.fhg.de>, "Thomas Baker" <thomas.baker@izb.fraunhofer.de>, "SW Best Practices" <public-swbp-wg@w3.org>
- Message-ID: <823043AB1B52784D97754D186877B6CF0426719B@xch-nw-12.nw.nos.boeing.com>
Jim

These are very helpful comments clarifying things.

Comments inline with [MFU]

-----Original Message-----
From: Jim Hendler [mailto:hendler@cs.umd.edu]
Sent: Friday, September 10, 2004 9:11 AM
To: Alistair Miles; Dan Brickley
Cc: Thomas Baker; Uschold, Michael F; Thomas Baker; SW Best Practices
Subject: Re: [VM] Scoping Draft with questions to TF members $swbpd

At 16:04 +0100 9/10/04, Alistair Miles wrote:
>Just a comment, as I see it there are two options:
>
>(1) Generating an OWL ontology from a thesaurus.
>
>(2) Generating an RDF description of a thesaurus.
>
>(1) May often (although not always) be desirable. However, class/instance semantics are almost always implicit in a thesaurus (and can be inappropriate) => (1) usually requires significant manual labour.
>
>(2) Means using an RDF vocabulary that accurately expresses the explicit (and potentially fuzzy) semantics of a thesaurus (and hence can be AUTOMATICALLY captured from an existing serialisation of a thesaurus).
>
>I think (2) is the first goal, enabling us to generate lots of RDF thesauri very cheaply. Also N.B. there are many scenarios in which RDF based applications working on this type of data can provide important services and facilities to thesaurus users that are currently unavailable or expensive to produce. (I.e. thesaurus user community will get a lot out of (2)).
>
>(1) Can be explored for various thesauri, weighing the cost of converting to OWL ontology against the potential utility. I expect that the outcome will differ from case to case, and will depend largely on the precise intended use of the thesaurus.
>
>
>
>Al

Seems to me there is a confusion here, I see two separate things that work together just fine. [note: I ignore SKOS for now, I will get back to it later]

Let's assume I have a thesaurus against one of the common thesaurus specs - Z39.19 or such. It specifies a lot of relationships between entries -- broader term, narrower term, related term, etc.
There are a number of ways I could turn this thesaurus into RDF/OWL --

a) The most common misconception is that we should just turn the thesaurus terms into the "equivalent" RDF/OWL terms - i.e. instead of "NT" use rdfs:subClassOf and thus the thesaurus automagically becomes an ontology -- this is a BAD idea (in my opinion) because the semantics of class, subclass etc are NOT identical to the thesaurus terms, and the librarians would not use it (and would not invite us back).

[MFU] This is a fairly good argument against this approach. Maybe we just need to make these points in the note and advise against this approach.

b) The second approach is to turn the thesaurus into RDF data elements, using a standard namespace, and have each thesaurus term become something like:

  <thes:entry rdf:about="#Thesaurus">
     <thes:narrowerTerm rdf:resource="#RogetsThesaurus" />
     <thes:relatedTerm rdf:resource="#Dictionary" />
     <thes:relatedTerm rdf:resource="#AntonymList" />
     ...
  </thes:entry>

This latter would give you a machine readable and processable online thesaurus, and each element would be separably and unambiguously nameable and HTTP GETable - IMO this is what digital librarians have been telling me they want and need.

[MFU] OK this sounds fine too. This sounds like very good news, because conceptually such a translation ought to be fairly trivial. Lots of details to attend to, but few if any technical challenges. Correct me if I'm wrong.

c) Now consider the thesaurus name space (thes:) above -- this document needs to be an ontology about the thesaurus space - we could consider keeping that just in RDFS - but then essentially all one could state is domain and range restrictions. There are a lot of problems with this - for example, there have been proposed thesaurus standards where the thesaurus "backbone" had to be a tree - in others (and most modern ones) it is a graph - RDF cannot distinguish these, but OWL can -- i.e. if a term is restricted to have only one Broader term, then it is a tree.
If it can have multiple broader terms, then it is a graph - similarly, in some conceptions of thesauri the RT links are invertible, in others they aren't - in OWL we can state whether the term is symmetric (or even transitive).

[MFU] Sounds like a very good idea.

d) So having the thesaurus spec use OWL strikes me as a big win, since it is more expressive, and since the RDFS version could be a proper subset if that was desired.

So, it seems to me, the ideal situation is to do (b) (make the thesaurus a set of RDF data) and (d) (have the thesaurus namespace document be in OWL).

[MFU] Sounds good.

Now, let's return to SKOS - SKOS gets (b) exactly right (and is elegant in design - I think it is quite usable as is, and we have fooled around with turning some thesauri into SKOS data using Perl without much trouble). However, SKOS does not do (d), and I think it suffers therefore -- in fact, the SKOS document, section 3.9 [1], talks about the semantics in ENGLISH in a way that could trivially be mapped to OWL - i.e. they say things like:

  This extension of the skos:broader/skos:narrower property pair may only be used to specify a class subsumption relationship between two concepts.

which are easily expressed in OWL - and thus would be machine readable, inferences could be made off of the thesaurus data that cannot be made from the RDF Schema, etc.

So, I would strongly advocate that BP could endorse the SKOS core design and help extend the SKOS core RDF schema [2] to OWL.

-JH

p.s. And, if someone wants to have a product the library community would really eat up, build a tool that would take as input a PDF file with a thesaurus in it using Z39.19, and output it as a machine readable thesaurus in RDF...

[1] http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/#3.9
[2] http://www.w3.org/2004/02/skos/core.rdf

--
Professor James Hendler                         http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies   301-405-2696
Maryland Information and Network Dynamics Lab.
301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
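
[Illustration of points (c) and (d) in the message above] A minimal sketch of what an OWL version of the thes: namespace document might look like. None of this is taken from SKOS or from any published thesaurus schema: the base URI http://example.org/thes, the class name Entry, and the property names broaderTerm, narrowerTerm and relatedTerm are all hypothetical, chosen only to mirror the thes: example in the message. It shows the two statements the message says RDFS cannot make: that each entry has at most one broader term (a tree-shaped backbone) and that related-term links are symmetric.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/thes">

  <!-- Hypothetical class for thesaurus entries -->
  <owl:Class rdf:ID="Entry"/>

  <!-- Tree-shaped backbone: declaring broaderTerm functional says an
       entry has at most one broader term. For a graph-shaped
       thesaurus, omit the FunctionalProperty typing. -->
  <owl:ObjectProperty rdf:ID="broaderTerm">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
    <rdfs:domain rdf:resource="#Entry"/>
    <rdfs:range rdf:resource="#Entry"/>
  </owl:ObjectProperty>

  <!-- narrowerTerm is the inverse of broaderTerm -->
  <owl:ObjectProperty rdf:ID="narrowerTerm">
    <owl:inverseOf rdf:resource="#broaderTerm"/>
  </owl:ObjectProperty>

  <!-- Invertible RT links: A relatedTerm B implies B relatedTerm A -->
  <owl:ObjectProperty rdf:ID="relatedTerm">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#SymmetricProperty"/>
    <rdfs:domain rdf:resource="#Entry"/>
    <rdfs:range rdf:resource="#Entry"/>
  </owl:ObjectProperty>

</rdf:RDF>
```

One caveat on the design choice: OWL does not make the unique name assumption, so a reasoner that finds two broader terms for one entry under a functional property will infer that the two terms denote the same resource rather than report an error, unless they are also declared different (for example with owl:differentFrom).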
Received on Monday, 13 September 2004 16:21:55 UTC