
[OPEN] and/or [PORT]: a practical question

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Fri, 19 Mar 2004 23:02:38 +0100
To: "SWBPD" <public-swbp-wg@w3.org>
Message-ID: <GOEIKOOAMJONEFCANOKCCEHIDJAA.bernard.vatant@mondeca.com>

This is a practical question we have often encountered at Mondeca. The message below comes
from a partner in a European project who is developing linguistic tools to generate queries
over a semantic knowledge base.

To sum up the issue: how do you express that the subject (dc:subject) of a document is a
concept used as a class in an ontology, e.g. "Phd_Theses"? My view is that if you don't
want to be in OWL-Full, the only way is to distinguish the concept used as a class from the
concept used as a document subject (defined as an instance in a thesaurus).
The argument against that is that the search engine could leverage the ontology's
subsumptions to expand queries, e.g. from "find documents about publications" to "find
documents about PhD Theses". More arguments follow below in Patrizia Paggio's message.
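A minimal sketch of the query expansion described above, in plain Python with no RDF library. The class hierarchy and the document index are hypothetical in-memory dictionaries standing in for the ontology and the dc:subject values, and the function names are illustrative only. A query on a class is expanded to that class plus all of its transitive subclasses before matching; the sketch deliberately says nothing about whether the subject values are OWL classes or thesaurus instances.

```python
# Hypothetical ontology fragment: each class maps to its direct superclass(es).
SUBCLASS_OF = {
    "PhD_Thesis": {"Dissertation"},
    "Dissertation": {"Publication"},
    "Publication": set(),
}

# Hypothetical document index: document id -> set of dc:subject values.
DOC_SUBJECTS = {
    "page-phd": {"PhD_Thesis"},
    "page-pubs": {"Publication"},
    "page-other": {"Course"},
}


def subclasses_of(cls, subclass_of):
    """Return cls plus every class that is (transitively) a subclass of it."""
    # Invert the direct relation once: superclass -> set of direct subclasses.
    children = {}
    for sub, supers in subclass_of.items():
        for sup in supers:
            children.setdefault(sup, set()).add(sub)
    # Depth-first walk downwards from the query class.
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, ()))
    return result


def find_documents_about(cls):
    """Expand the query class downwards, then match any document subject."""
    wanted = subclasses_of(cls, SUBCLASS_OF)
    return sorted(doc for doc, subjects in DOC_SUBJECTS.items() if subjects & wanted)


print(find_documents_about("Publication"))  # ['page-phd', 'page-pubs']
print(find_documents_about("PhD_Thesis"))   # ['page-phd']
```

Expansion happens at query time here, so the index only has to store the subject each document was actually tagged with.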

Best practice for that, folks?

Bernard Vatant
Senior Consultant
Knowledge Engineering
Mondeca - www.mondeca.com

-----Original Message-----
From: Patrizia Paggio [mailto:patrizia@cst.dk]
Sent: Friday, 19 March 2004 11:28
To: Bernard Vatant
Cc: Lina Henriksen; CST
Subject: Re: Federated questions

Dear Bernard,
since you asked for my opinion directly, here it is :-)

I think I'm sceptical about the so-called thesaurus solution, probably because I don't
fully understand why it is smart (alas, in spite of all these email exchanges!).
Let me try to explain the way I see things without getting into the details of OWL-Full.
To take the Webpage on PhD theses: I think we want to be able to express the fact that the
Webpage is also about dissertations, and about publications in general, as predicted by
the is-a structure (a PhD Thesis is a Dissertation, and a Dissertation is a Publication).
In my opinion this means that if the user asks for a Webpage on Publications, the page on
PhD Theses should be among the hits. In general, I think it is fair to say that if a
document is about a certain university-relevant concept in our ontology, it is at the same
time also about the concepts that subsume that concept.
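The rule above can also be applied at indexing time instead of query time. A minimal sketch, assuming a hypothetical in-memory dictionary of direct superclasses (the class names and function names are illustrative, not part of any project vocabulary): every class that subsumes an asserted subject is added to the document's subjects.

```python
# Hypothetical is-a chain: PhD_Thesis -> Dissertation -> Publication.
SUBCLASS_OF = {
    "PhD_Thesis": {"Dissertation"},
    "Dissertation": {"Publication"},
    "Publication": set(),
}


def superclasses_of(cls, subclass_of):
    """Return cls plus every class that (transitively) subsumes it."""
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(subclass_of.get(current, ()))
    return result


def implied_subjects(asserted):
    """A document about C is also about every concept subsuming C."""
    implied = set()
    for cls in asserted:
        implied |= superclasses_of(cls, SUBCLASS_OF)
    return implied


print(sorted(implied_subjects({"PhD_Thesis"})))
# ['Dissertation', 'PhD_Thesis', 'Publication']
```

With the implied subjects stored, a plain match on "Publication" finds the PhD-thesis page without any query expansion; the trade-off is a larger index that must be recomputed when the hierarchy changes.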
Now, if this is true, it seems to me that if we cannot (or do not want to) allow the
Subject class to subsume classes in the ontology directly, then we need to replicate the
whole ontology (excluding instances) and call it a thesaurus. If this is smart (and
possible), I suppose that's what we should do.
As far as the linguistic implementation is concerned, it makes no sense to me to have two
versions of the ontology, one of which is used to express subclasses of the Subject
concept. In fact, we couldn't even do it, because of name clashes. So we would ignore the
thesaurus if it is the same as the ontology (or fragments of it).
By the way, what is a good definition of a thesaurus?
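The thesaurus option discussed in this thread (replicating the ontology, without instances, as a separate vocabulary) can be sketched as a mechanical derivation. This is a minimal illustration under stated assumptions, not the project's design: the "subject:" prefix is a hypothetical naming convention chosen to avoid the name clashes mentioned above, and "broader" is used in the sense of a thesaurus broader-term link.

```python
# Hypothetical ontology classes and their direct superclasses (instances excluded).
SUBCLASS_OF = {
    "PhD_Thesis": {"Dissertation"},
    "Dissertation": {"Publication"},
    "Publication": set(),
}

# Assumed naming convention that keeps thesaurus concepts apart from class names.
SUBJECT_PREFIX = "subject:"


def derive_thesaurus(subclass_of, prefix=SUBJECT_PREFIX):
    """Mirror every class as a separately named concept linked by 'broader'.

    Returns (broader, concept_for_class), where broader maps each concept to
    its broader concepts and concept_for_class maps a class to its concept.
    """
    # Collect every class name, including ones that only appear as superclasses.
    names = set(subclass_of).union(*subclass_of.values())
    concept_for_class = {name: prefix + name for name in names}
    broader = {
        concept_for_class[cls]: {concept_for_class[sup] for sup in supers}
        for cls, supers in subclass_of.items()
    }
    return broader, concept_for_class


broader, concept_for_class = derive_thesaurus(SUBCLASS_OF)
print(concept_for_class["PhD_Thesis"])  # subject:PhD_Thesis
print(broader["subject:PhD_Thesis"])    # {'subject:Dissertation'}
```

A document would then carry dc:subject subject:PhD_Thesis, and query expansion would walk the broader links of the thesaurus rather than the subclass links of the ontology. Generating the mirror automatically keeps the two hierarchies in step, but it does not remove the duplication Patrizia objects to.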


Patrizia Paggio

Senior Researcher		phone: +45 3532 9072
Center for Sprogteknologi	fax:   +45 3532 9089
Njalsgade 80			email: patrizia@cst.dk
2300-DK CPH S			www.cst.dk/patrizia

LREC04 Workshop on Multimodal Corpora

LREC04 OntoLex 2004
Received on Friday, 19 March 2004 17:09:26 EST
