Re: WordNet Task Force - work outline from Aldo Gangemi on 2004-03-26 (public-swbp-wg@w3.org from March 2004)

From: Aldo Gangemi <a.gangemi@istc.cnr.it>
Date: Sat, 27 Mar 2004 00:59:24 +0100
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: public-swbp-wg@w3.org
Message-Id: <p06002007bc8a4074a60b@[212.34.219.175]>
At 12:37 +0000 26-03-2004, Jeremy Carroll wrote:
>Thanks Aldo for your reply, thanks particularly for pointing me at this:
>
>
>>  1) Datatype remapping of WordNet, in order to arrive at a preliminary
>>  namespace  (what are the classes, the individuals, the properties, and
>>  how many levels are currently collapsed in wordnets?).
>
>which does seem to address many (possibly all) of my concerns.
>
>If the WordNet TF has a first phase that completes this task, and 
>succeeds in making it useful (more useful than it being done outside 
>this WG), by:
>
>- successfully persuading say Princeton to use it to publish an OWL 
>version of WordNet
>- successfully getting consensus from the other WordNet mappers that 
>this is a good point to start, so that they use the same schema and 
>terminology at this meta-level
>- successfully persuade tool developers that this schema is worth 
>having some built-in support for
>
>then I would be very happy. For my part, I would do my best to 
>persuade my colleagues that having some WordNet support in Jena, 
>following the TF's suggestions, would be valuable.
>
>I think that first phase looks quite hard, you seem to see it as easy.

I see the logical part as easy, but persuasion and consensus are more 
a matter of social technique than of logical technique ...

>Technically you may well be right that it is addressed by:
>
>>  Namespace(rdf   = <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
>>  Namespace(xsd   = <http://www.w3.org/2001/XMLSchema#>)
>>  Namespace(rdfs  = <http://www.w3.org/2000/01/rdf-schema#>)
>>  Namespace(owl   = <http://www.w3.org/2002/07/owl#>)
>>  Namespace(a     = <http://www.w3.org/2004/03/WordNetSchema#>)
>>
>>  Ontology( <http://212.34.219.175/WN_Schema.owl.rdf>
>>
>>   ObjectProperty(a:Hyperonym)
>>   ObjectProperty(a:Hyponym
>>    inverseOf(a:Hyperonym)
>>    domain(a:Synset)
>>    range(a:Synset))
>>   ObjectProperty(a:Sense
>>    domain(a:Word)
>>    range(a:Synset))
>>   ObjectProperty(a:Synonym Symmetric
>>    domain(a:Word)
>>    range(a:Word))
>>
>>   Class(a:Synset partial
>>    restriction(a:Hyperonym someValuesFrom (a:Synset)))
>>   Class(a:Word partial
>>    restriction(a:Sense someValuesFrom (a:Synset)))
>>  )
>>
>
>however, the technical part is only the beginning of the task.

Agreed

>(I haven't reviewed this suggested ontology)
>
>I did note two related confusions between our postings:
>- WordNet or wordnets?
>   I see the main reason for having a wordnet task force is that 
>there is already a deployed and widely used resource that is 
>appropriate and useful for the Semantic Web; WordNet, in English, 
>from princeton. This has specific synsets that may or may not be 
>"best" depending on your application or linguistic needs; but it is 
>*deployed* and used.
>  You seem to see the purpose of the task force to be about 
>encouraging better wordnets, and alternative synsets etc. that might 
>be more useful as an upper ontology (I am not an ontologist, sorry 
>if I am misusing the term)

I think we can suggest best practices in various forms. The first one 
is providing a schema for wordnets, and the first obvious case is 
Princeton WordNet. Other best practices can suggest ways to use 
wordnets as ontologies, as linkages between ontologies and lexica, 
etc.

>- methodology or triples?
>  You seem to see the key deliverables of the TF as methodological 
>advice. (How to build better wordnets) I see the key deliverables of 
>the TF as triples (here is some deployed knowledge that people have 
>found useful, here is that knowledge expressed in RDF/OWL)

I see both

>
>Given that princeton are a key partner in the success of the first 
>phase, I would be nervous about us being too keen to denigrate what 
>they have done.

We do not denigrate anyone! Princeton WordNet is good and 
accomplishes the tasks for which it has been created. Being a 
resource for the Semantic Web was far beyond the imagination of 
George Miller when he conceived it ... anyway, Christiane Fellbaum 
has already expressed interest in our TF activities (communicated as 
a list of issues similar to the one that I have sent).

>Of course, any work on the scale of wordnet is not perfect, and will 
>not be ideally suited for any specific purpose, since it has to be a 
>trade off between different needs. But it exists. It is useful. An 
>'ontological' argument about best practices is that a necessary part 
>of being a best practice is being deployed and used :)

Of course. So I can witness that using good practices to understand 
WordNet's schema and transforming it into ontologies has been an 
invaluable help in the completion of the Fishery Ontology Service 
project with my Lab and the UN-FAO.

>
>Anyway, we seem to agree about what needs to be done first, and we 
>don't need to disagree about what comes next, until we get there, 
>when both of us will be older and wiser.
>I would suggest a timeline that just concentrated on phase 1 would 
>be more realistic, but I'm not very fussed about that. I would like 
>to see exit criteria for phase 1, before the next phases you 
>indicate get going.
>
>e.g.
>
>- agreement from princeton or someone else to make WordNet available 
>using the schema devised by the TF

OK, I include this question in the message I am going to send to 
wordnet maintainers

>- agreement from two or more software developers of Semantic Web 
>tools to add API support for the schema

Are you offering to investigate on this?

>- some simple examples (for the application and deployments TF?) of 
>how to use the schema for some particular tasks

Suggestions welcome from all

>
>A few more detailed points below, (I have snipped those bits of the 
>message to which I am not replying, maybe because my thoughts are 
>indicated above)
>
>
>>
>>Let me put some preliminary distinctions, then I will come to the 
>>individual issues.
>>
>>1) WordNet is ambiguous in nature:
>>- it is a network of words, but also a network of senses.
>>- it contains relations between individuals (e.g. part-of), 
>>relations between concepts (e.g. antonymy), and relations between 
>>words (e.g. synonymy, POS mapping)
>>- it contains both senses of concepts (classes), and senses of individuals
>>
>>2) There exists a schema of WordNet as it is (as a relational 
>>database), but WordNet senses can be considered as a giant schema 
>>in itself to be exploited in SW applications.
>>
>
>Of course these points are important as qualifications on any 
>implicit endorsement we give to WordNet. I would also add that it 
>has a further highly significant limitation in that it is in (US) 
>English. Thus distinctions that are important in other cultures, 
>such as that between prosciutto crudo and prosciutto cotto, are not 
>adequately reflected.
>

This is part of the reason why I'd rather focus on wordnets rather 
Princeton WordNet only. But let's stay with it for a first 
case/objective.

>
>>This refers to the use of WordNet as a database. In other words, 
>>since WordNet is a database of matadata, you are suggesting to 
>>define a metaontology (in the past, these were called metamodels 
>>...) of the primitives used for creating wordnets.
>>Perfect, I agree with you (this has been a theme in many papers I 
>>have written!). I concede that I have not defined such a 
>>metaontology in OWL or RDF, and this can be the first goal for the 
>>TF.
>
>Good, good, good - we agree on the key point.
>
>jjc:
>>>+ some illustrative examples of use
>>
>>
>>That's one point: use for what? if you want just triples encoding 
>>hyperonymy, synonymy, etc., the instances in your triples will 
>>actually represent concepts, individuals, or words time to time. 
>>Annotating web pages with that mess would be a nightmare ...
>>
>I'll send a simple example after lunch ... but we should have some 
>clear use cases for any work we do, otherwise why bother?

I certainly want use cases, I only wondered about what's the use of 
annotating documents without a clear idea of what there is in 
wordnets. But it seems we agree, considering the rest of the message.

>
>>I agree, but the differences are usually related to the synset 
>>network, rather than to the parts that are easily encodable in 
>>schemata like the one above.
>
>Ahh that's interesting - so the issue in using WordNet is that 
>anyone who makes it available decides to change the synsets while 
>they are about it. That does not seem the right way to do things to 
>me. A very basic best practice for doing anything in software I 
>think, is to  choose an appropriate tool, and use it in the way it 
>was intended as given, and deviate as little as possible. So if I 
>choose to code a program in Java  I will write it differently, with 
>different conventions and different algorithms than if I choose to 
>code in Prolog or C. Similarly if you don't find the WordNet synsets 
>fit your needs I would have thought a best practice is not to use 
>WordNet, but use something else.

Wait wait ... WordNet as an ontology (the synset network) is stable, 
and versioning is Princeton's work. When anyone takes it, should 
mention which version is using, what parts are used, etc.
The researches aimed at "changing" the synset network (in the best 
cases) do not modify WordNet arbitrarily, but "remap" some synsets in 
order to make it compliant to some other resource. WordNet's 
structure is still there, but it is  "aligned" to something else 
(check the projects I have mentioned). The benefits of these 
remappings are quite well known and described, and have applications 
(cf. the FOS project I have mentioned above: without alignment, we 
couldn't use WordNet for that task).

On the other hand, there is also progress and improvement acting 
there. For example, EuroWordNet took Princeton WordNet and 
added/changed part of it for the English part of it, while still 
maintaining explicit links. Links are also kept in the versioning of 
the original WN (typically, by someone outside Princeton).

>
>>
>>>When identifying the deliverables for this (or any) Task Force, we 
>>>should also identify the target audience, and possible use cases 
>>>in which that target audience may find the deliverables useful. It 
>>>would be good to have a clear idea from the target audience what 
>>>they want, so the work is based more on pull ("this is what you 
>>>are asking for") than push ("take this because the doctor says it 
>>>will be good for you").
>>
>>
>>Pull is different for someone that just wants some tag to put on a 
>>web page, and someone else that wants a clever coverage for her 
>>domain, or even for someone that wants to make automatic 
>>translation on the Web.
>
>Pull is whatever pull there is ... we should ask on the interest 
>group lists how people have been using wordnet and what they would 
>find helpful.
>This will give us our use case. If the only pull is
>"someone that just wants some tag to put on a web page" then so be 
>it, we can do tht quickly, achieve the result and move on to 
>soemthing more useful.

Yeah

>>
>>Push is something we can do wrt to wordnet and ontology developers 
>>rather than to basic implementors. I agree on this.
>
>The SWBPD group is not going to make friends by pushing anything at anyone.
>As individuals we can make comments. When invited to comment as a 
>group we can make consensus comments. But the SWBPD is not a forum 
>for reeducating the wordnet developers. (No push please - not to 
>anyone - we should only consider producing a note as to how to build 
>a better wordnet if the people who build wordnets want it)

Maybe I am incorrect here in guessing the possible nuances of 
push/pull in English usage, so let me define my ontology here :).
By push, I mean "suggesting a best practice or resource, even though 
this is not widely required, provided that we can demonstrate its 
usefulness with concrete examples".
By pull I mean "recognizing a best practice or resource that fulfils 
widely shared requirements".
So I do not intend to tease people or loose friends! I simply assume 
that SWBPD implies either recognizing a BP or creating/suggesting 
others that can even be less widely known.


>>
>>>We do not think that the target audience for the WordNet TF is 
>>>people working on WordNet mappings, we think the target audience 
>>>is any semantic web developer who might find a particular WordNet 
>>>mapping useful.
>>
>>
>>See previous comment, btw I know of many people out there trying to 
>>make mappings or to find a minimally good ontology
>
>
>Yes that's pull. Documenting what they are after would help scope phase 2.
>

OK. I want to be as practical as possible, that's why I started 
mentioning people interested and things that can be done. I would 
like to send out a message including a description of who we are and 
what we want to get in a reasonable time, then asking directly 
(besides comments) what are the plans/needs/availability of each 
interested party to arrive to a common WN schema for the SW, and to 
contribute to the definition of guidelines for 
improving/remapping/linking wordnets for the SW.

I hope to circulate a draft message before the next telecon.

Ciao
Aldo
-- 



*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*
Aldo Gangemi
Research Scientist
Laboratory for Applied Ontology, ISTC-CNR
Institute of Cognitive Sciences and Technologies
(Laboratorio di Ontologia Applicata,
Istituto di Scienze e Tecnologie della Cognizione,
Consiglio Nazionale delle Ricerche)
Viale Marx 15, 00137
Roma Italy
+3906.86090249
+3906.824737 (fax)
mailto://a.gangemi@istc.cnr.it
mailto://gangemi@acm.org
http://www.loa-cnr.it
Received on Friday, 26 March 2004 19:41:28 UTC