Re: WordNet Task Force - work outline from Jeremy Carroll on 2004-03-26 (public-swbp-wg@w3.org from March 2004)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Fri, 26 Mar 2004 12:37:09 +0000
To: Aldo Gangemi <a.gangemi@istc.cnr.it>
Cc: public-swbp-wg@w3.org
Message-ID: <406423F5.6070809@hplb.hpl.hp.com>

Thanks Aldo for your reply, and thanks particularly for pointing me at this:


 > 1) Datatype remapping of WordNet, in order to arrive at a preliminary
 > namespace  (what are the classes, the individuals, the properties, and
 > how many levels are currently collapsed in wordnets?).

which does seem to address many (possibly all) of my concerns.

If the WordNet TF has a first phase that completes this task, and succeeds 
in making it useful (more useful than it being done outside this WG), by:

- successfully persuading say Princeton to use it to publish an OWL version 
of WordNet
- successfully getting consensus from the other WordNet mappers that this 
is a good point to start, so that they use the same schema and terminology 
at this meta-level
- successfully persuading tool developers that this schema is worth having 
some built-in support for

then I would be very happy. For my part, I would do my best to persuade my 
colleagues that having some WordNet support in Jena, following the TF's 
suggestions, would be valuable.

I think that first phase looks quite hard; you seem to see it as easy. 
Technically you may well be right that it is addressed by:

 > Namespace(rdf   = <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
 > Namespace(xsd   = <http://www.w3.org/2001/XMLSchema#>)
 > Namespace(rdfs  = <http://www.w3.org/2000/01/rdf-schema#>)
 > Namespace(owl   = <http://www.w3.org/2002/07/owl#>)
 > Namespace(a     = <http://www.w3.org/2004/03/WordNetSchema#>)
 >
 > Ontology( <http://212.34.219.175/WN_Schema.owl.rdf>
 >
 >  ObjectProperty(a:Hyperonym)
 >  ObjectProperty(a:Hyponym
 >   inverseOf(a:Hyperonym)
 >   domain(a:Synset)
 >   range(a:Synset))
 >  ObjectProperty(a:Sense
 >   domain(a:Word)
 >   range(a:Synset))
 >  ObjectProperty(a:Synonym Symmetric
 >   domain(a:Word)
 >   range(a:Word))
 >
 >  Class(a:Synset partial
 >   restriction(a:Hyperonym someValuesFrom (a:Synset)))
 >  Class(a:Word partial
 >   restriction(a:Sense someValuesFrom (a:Synset)))
 > )
 >

however, the technical part is only the beginning of the task.
(I haven't reviewed this suggested ontology)
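
As a rough illustration only (not a review of the schema, and not an OWL reasoner), here is a minimal Python sketch of what the quoted domain, range, inverseOf and Symmetric axioms would let a tool conclude from a few triples. The property and class names come from the quoted schema; the `EX` namespace, the individual names and the `entail` helper are hypothetical, and the direction chosen for a:Hyperonym is my assumption.

```python
# Namespace taken from the quoted schema; EX and its individuals are hypothetical.
A = "http://www.w3.org/2004/03/WordNetSchema#"
EX = "http://example.org/wn#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Axioms transcribed from the quoted ontology (a:Hyperonym declares no domain/range).
DOMAIN = {A + "Hyponym": A + "Synset", A + "Sense": A + "Word", A + "Synonym": A + "Word"}
RANGE = {A + "Hyponym": A + "Synset", A + "Sense": A + "Synset", A + "Synonym": A + "Word"}
INVERSE = {A + "Hyponym": A + "Hyperonym", A + "Hyperonym": A + "Hyponym"}
SYMMETRIC = {A + "Synonym"}


def entail(triples):
    """Return the input triples plus inverse, symmetric and domain/range consequences."""
    result = set(triples)
    # Step 1: add the mirrored triple for inverse and symmetric properties.
    for s, p, o in list(result):
        if p in INVERSE:
            result.add((o, INVERSE[p], s))
        if p in SYMMETRIC:
            result.add((o, p, s))
    # Step 2: type subjects and objects from the declared domains and ranges.
    for s, p, o in list(result):
        if p in DOMAIN:
            result.add((s, RDF_TYPE, DOMAIN[p]))
        if p in RANGE:
            result.add((o, RDF_TYPE, RANGE[p]))
    return result


facts = {
    (EX + "dog", A + "Sense", EX + "dog-n-1"),
    (EX + "dog", A + "Synonym", EX + "hound"),
    # Assumed reading: dog-n-1 has canine-n-1 as its hyperonym.
    (EX + "dog-n-1", A + "Hyperonym", EX + "canine-n-1"),
}
closure = entail(facts)
for triple in sorted(closure - facts):
    print(triple)
```

Note that the typing of the two synsets as a:Synset only follows via the inverse a:Hyponym triple, since a:Hyperonym itself carries no domain or range in the quoted text.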

I did note two related confusions between our postings:
- WordNet or wordnets?
   I see the main reason for having a WordNet task force as being that there is 
already a deployed and widely used resource that is appropriate and useful 
for the Semantic Web: WordNet, in English, from Princeton. This has 
specific synsets that may or may not be "best" depending on your 
application or linguistic needs; but it is *deployed* and used.
  You seem to see the purpose of the task force to be about encouraging 
better wordnets, and alternative synsets etc. that might be more useful as 
an upper ontology (I am not an ontologist, sorry if I am misusing the term)
- methodology or triples?
  You seem to see the key deliverables of the TF as methodological advice. 
(How to build better wordnets) I see the key deliverables of the TF as 
triples (here is some deployed knowledge that people have found useful, 
here is that knowledge expressed in RDF/OWL)


Given that Princeton are a key partner in the success of the first phase, I 
would be nervous about us being too keen to denigrate what they have done.
Of course, any work on the scale of WordNet is not perfect, and will not be 
ideally suited for any specific purpose, since it has to be a trade off 
between different needs. But it exists. It is useful. An 'ontological' 
argument about best practices is that a necessary part of being a best 
practice is being deployed and used :)


Anyway, we seem to agree about what needs to be done first, and we don't 
need to disagree about what comes next, until we get there, when both of us 
will be older and wiser.
I would suggest a timeline that just concentrated on phase 1 would be more 
realistic, but I'm not very fussed about that. I would like to see exit 
criteria for phase 1, before the next phases you indicate get going.

e.g.

- agreement from Princeton or someone else to make WordNet available using 
the schema devised by the TF
- agreement from two or more software developers of Semantic Web tools to 
add API support for the schema
- some simple examples (for the application and deployments TF?) of how to 
use the schema for some particular tasks


A few more detailed points below. (I have snipped those bits of the message 
to which I am not replying, maybe because my thoughts are indicated above.)

Aldo Gangemi wrote:
> Granted that I agree on the need to use WordNet in a common format as 
> soon as possible for SW applications (it's my first step in the 
> guideline list), I'd rather discuss what are we committing to. Jeremy, 
> do you think the TF should  build the standard WordNet for the SW? Or a 
> "minor" or "preliminary" standard? I understand we have to provide best 
> practices to use or build resources, not to create them directly.
> 
Princeton have already built WordNet - it is not a standard nor is it 
perfect, but it has that key characteristic of perfection: it exists.
I see our task here as about encouraging deployment of this resource in the 
SW. I don't think this TF is about a '"minor" or "preliminary" standard' at 
all, simply about helping SW users who wish to use primarily the Princeton 
WordNet (but potentially other WordNets) with their SW data.



> On the other hand, if someone suggests that WordNet (*as is*) is a best 
> practice for annotating the Web, this is patently misconceived, 
I think that *currently* it could be argued that it is best practice: best 
does not mean best in some possible world but in the world as it actually 
is. To demonstrate that it is 'patently misconceived' you have to describe 
in detail something I could use now to annotate the Web that would be 
'patently' better practice.


> 
> Let me put some preliminary distinctions, then I will come to the 
> individual issues.
> 
> 1) WordNet is ambiguous in nature:
> - it is a network of words, but also a network of senses.
> - it contains relations between individuals (e.g. part-of), relations 
> between concepts (e.g. antonymy), and relations between words (e.g. 
> synonymy, POS mapping)
> - it contains both senses of concepts (classes), and senses of individuals
> 
> 2) There exists a schema of WordNet as it is (as a relational database), 
> but WordNet senses can be considered as a giant schema in itself to be 
> exploited in SW applications.
> 

Of course these points are important as qualifications on any implicit 
endorsement we give to WordNet. I would also add that it has a further 
highly significant limitation in that it is in (US) English. Thus 
distinctions that are important in other cultures, such as that between 
prosciutto crudo and prosciutto cotto, are not adequately reflected.



> This refers to the use of WordNet as a database. In other words, since 
> WordNet is a database of metadata, you are suggesting to define a 
> metaontology (in the past, these were called metamodels ...) of the 
> primitives used for creating wordnets.
> Perfect, I agree with you (this has been a theme in many papers I have 
> written!). I concede that I have not defined such a metaontology in OWL 
> or RDF, and this can be the first goal for the TF.

Good, good, good - we agree on the key point.



> There exists also some reusable work made by ISLE (International 
> Standards for Language Engineering) that you probably know.
> 

No, sorry ...

jjc:
>> + some illustrative examples of use
> 
> 
> That's one point: use for what? if you want just triples encoding 
> hyperonymy, synonymy, etc., the instances in your triples will actually 
> represent concepts, individuals, or words time to time. Annotating web 
> pages with that mess would be a nightmare ...
> 
I'll send a simple example after lunch ... but we should have some clear 
use cases for any work we do, otherwise why bother?


> I agree, but the differences are usually related to the synset network, 
> rather than to the parts that are easily encodable in schemata like the 
> one above.

Ahh that's interesting - so the issue in using WordNet is that anyone who 
makes it available decides to change the synsets while they are about it. 
That does not seem the right way to do things to me. A very basic best 
practice for doing anything in software, I think, is to choose an 
appropriate tool, and use it in the way it was intended as given, and 
deviate as little as possible. So if I choose to code a program in Java, I 
will write it differently, with different conventions and different 
algorithms than if I choose to code in Prolog or C. Similarly if you don't 
find the WordNet synsets fit your needs I would have thought a best 
practice is not to use WordNet, but use something else.


> 
>> When identifying the deliverables for this (or any) Task Force, we 
>> should also identify the target audience, and possible use cases in 
>> which that target audience may find the deliverables useful. It would 
>> be good to have a clear idea from the target audience what they want, 
>> so the work is based more on pull ("this is what you are asking for") 
>> than push ("take this because the doctor says it will be good for you").
> 
> 
> Pull is different for someone that just wants some tag to put on a web 
> page, and someone else that wants a clever coverage for her domain, or 
> even for someone that wants to make automatic translation on the Web.

Pull is whatever pull there is ... we should ask on the interest group 
lists how people have been using WordNet and what they would find helpful.
This will give us our use case. If the only pull is
"someone that just wants some tag to put on a web page" then so be it, we 
can do that quickly, achieve the result and move on to something more useful.

> 
> Push is something we can do wrt to wordnet and ontology developers 
> rather than to basic implementors. I agree on this.

The SWBPD group is not going to make friends by pushing anything at anyone.
As individuals we can make comments. When invited to comment as a group we 
can make consensus comments. But the SWBPD is not a forum for reeducating 
the wordnet developers. (No push please - not to anyone - we should only 
consider producing a note as to how to build a better wordnet if the people 
who build wordnets want it)

> 
>> We do not think that the target audience for the WordNet TF is people 
>> working on WordNet mappings, we think the target audience is any 
>> semantic web developer who might find a particular WordNet mapping useful.
> 
> 
> See previous comment, btw I know of many people out there trying to make 
> mappings or to find a minimally good ontology


Yes that's pull. Documenting what they are after would help scope phase 2.


> 
>> For example, anyone creating an OWL or RDFS class might wish to 
>> annotate it with its intended meaning using *the* URI for a specific 
>> sense of an English word, as classified by wordnet. The main 
>> requirement from this use case is agreement over what that URI is, 
>> including the beginning bit (the namespace) and the end bit (the 
>> mapping from Wordnet's representation of senses)
> 
> 
> What do you mean by "annotating an OWL or RDFS class"? Wordnet can be 
> used either as a source of reusable classes (sense network reengineered 
> as an ontology), or as a source of lexicalizations for classes, 
> individuals, or properties. Maybe you mean the second use ...

Example later (after lunch).
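
In the meantime, a rough sketch of the URI use case quoted above: one agreed namespace (the beginning bit) plus one agreed mapping from a WordNet sense to a local name (the end bit). Everything specific here is an assumption for illustration: the `BASE` namespace, the lemma-pos-number naming pattern and the `sense_uri` function are hypothetical, and nothing of the kind has been agreed by the TF or by Princeton.

```python
from urllib.parse import quote

# Hypothetical namespace; the real one would need to be agreed.
BASE = "http://example.org/wordnet/"


def sense_uri(lemma, pos, sense_number, base=BASE):
    """Build a URI for one word sense from (lemma, part of speech, sense number).

    The lemma-pos-number pattern is an assumed convention, not an agreed one.
    """
    # Normalise the lemma: lower case, spaces to underscores, then percent-encode.
    local_lemma = quote(lemma.lower().replace(" ", "_"), safe="")
    return f"{base}{local_lemma}-{pos}-{sense_number}"


print(sense_uri("dog", "n", 1))      # http://example.org/wordnet/dog-n-1
print(sense_uri("Hot Dog", "n", 2))  # http://example.org/wordnet/hot_dog-n-2
```

The point of the use case is only that everyone computes the same URI for the same sense; which pattern is chosen matters much less than that there is exactly one.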


> Therefore, let's define such a metaontology asap (see first point of my 
> guideline list),

because that will be a valuable resource for Semantic Web deployment.

> so that we can provide best practices for more 
> substantial things :)
> 
> Ciao
> Aldo
> 
A dopo

Jeremy
Received on Friday, 26 March 2004 08:40:58 UTC