Re: WordNet Task Force - work outline from Aldo Gangemi on 2004-03-25 (public-swbp-wg@w3.org from March 2004)

From: Aldo Gangemi <a.gangemi@istc.cnr.it>
Date: Thu, 25 Mar 2004 21:09:36 +0100
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: public-swbp-wg@w3.org
Message-Id: <p06002056bc88d36a4df2@[150.146.65.233]>
Thanks for the detailed input, Jeremy.

I expected that in this WG all the basic misunderstandings in the SW 
would emerge, and in fact this happened immediately, as in 
Bernard's thread (topics vs. concepts, ontology vs. lexicon), and in 
the subClass-metaClass dispute. Here Jeremy touches on other points, 
some related to the previous threads, some not.

Granted that I agree on the need to use WordNet in a common format as 
soon as possible for SW applications (it's my first step in the 
guideline list), I'd rather discuss what we are committing to. 
Jeremy, do you think the TF should build the standard WordNet for 
the SW? Or a "minor" or "preliminary" standard? My understanding is 
that we have to provide best practices for using or building 
resources, not create them directly.

On the other hand, if someone suggests that WordNet (*as is*) is a 
best practice for annotating the Web, this is patently misconceived; 
but even accepting that (Jim, don't start scribbling, please :)), 
some decision must be taken on what the data contained in WordNet 
are, how they can be represented in OWL ontologies or plain RDF 
models, etc. This is where the guidelines I have listed come in.

Guidelines are not all-or-none, but just a roadmap for an ideal, 
powerful WordNet-oriented ontology library. Some steps can be carried 
out now, and what you are asking for is one of them. Indeed, I 
see no conflict between what you write and the notes I have sent out.

Let me make some preliminary distinctions, then I will come to the 
individual issues.

1) WordNet is ambiguous in nature:
- it is a network of words, but also a network of senses.
- it contains relations between individuals (e.g. part-of), relations 
between concepts (e.g. antonymy), and relations between words (e.g. 
synonymy, POS mapping)
- it contains both senses of concepts (classes), and senses of individuals

2) There exists a schema of WordNet as it is (as a relational 
database), but the WordNet senses can themselves be considered a 
giant schema to be exploited in SW applications.
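
To make the "network of words" vs. "network of senses" distinction 
concrete, here is a minimal Python sketch. The word/sense pairs and 
the sense identifiers are invented for illustration and are not 
actual WordNet content; the point is only that the same data can be 
indexed from either side.

```python
from collections import defaultdict

# Toy (word, sense-id) pairs. The sense ids are made up for this sketch
# and are NOT real WordNet identifiers.
WORD_SENSE_PAIRS = [
    ("car", "s-motorcar"),
    ("automobile", "s-motorcar"),
    ("car", "s-railcar"),
]


def build_indexes(pairs):
    """Return (senses_of_word, words_of_sense) built from the same pairs."""
    senses_of_word = defaultdict(set)
    words_of_sense = defaultdict(set)
    for word, sense in pairs:
        senses_of_word[word].add(sense)   # word-centred view
        words_of_sense[sense].add(word)   # sense-centred view (a synset)
    return dict(senses_of_word), dict(words_of_sense)


senses_of_word, words_of_sense = build_indexes(WORD_SENSE_PAIRS)
```

Read from the word side, "car" is ambiguous between two senses; read 
from the sense side, "s-motorcar" groups two synonymous words.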


At 18:07 +0000 25-03-2004, Jeremy Carroll wrote:
>Some of the HP team were talking about the WordNet TF over lunch.
>We found the survey of current work that Aldo produced a useful 
>resource, but feel somewhat uncomfortable with it as scoping the TF.

A survey is hardly a scope; it is just a review of what has been done 
and of the people who have competence in it.

>In a sense, the survey showed what we think is the problem: too many 
>approaches, all of which have merits; rather than one mapping of 
>WordNet to RDF or OWL that is good enough for most users.

Most approaches are not alternatives but complementary, and are 
mostly independent of WordNet reengineering (I dare say mapping) 
to OWL or RDF (which are also different tasks, as I explain here).
OTOH, if you want one reengineered OWL or RDF WordNet right now, just 
pick one of the proposals. In fact, as a first result, we can look at 
what has been done, check it against the set of guidelines that we 
decide to adopt, and then publish the results, or even the resource 
itself, if it is compliant to the degree we are willing to accept.

>We felt that what would be most useful in the short term is to have 
>a standard representation of WordNet in RDF that people can use.  We 
>think this is what the community of implementors needs and will most 
>aid deployment in the short term.
>
>Some specific deliverables that we think would be very useful and 
>potentially could be achieved quickly are:
>
>+ an RDF schema or OWL ontology with which to talk about the main 
>wordnet relationships and concepts (e.g. words, senses, hyponyms, 
>synonyms etc.)

This refers to the use of WordNet as a database. In other words, 
since WordNet is a database of metadata, you are suggesting that we 
define a metaontology (in the past, these were called metamodels ...) 
of the primitives used for creating wordnets.
Perfect, I agree with you (this has been a theme in many papers I 
have written!). I concede that I have not defined such a metaontology 
in OWL or RDF, and this can be the first goal for the TF.
There is also some reusable work by ISLE (International 
Standards for Language Engineering) that you probably know.

>+ a namespace URI for this schema
>+ a version of WordNet converted into triples, using this schema and namespace

Right away, once we have agreed on the schema.

>+ some illustrative examples of use

That's one point: use for what? If you just want triples encoding 
hyperonymy, synonymy, etc., the instances in your triples will 
actually represent concepts, individuals, or words, depending on the 
case. Annotating web pages with that mess would be a nightmare ...
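
A small Python sketch of the problem, under stated assumptions: the 
node identifiers, the predicate names "hyponymOf" and "synonymOf", 
and the typing table are all hypothetical and do not come from 
WordNet or from any existing conversion. It only shows how a naive 
triple dump ends up with one predicate connecting different kinds of 
things.

```python
# Hypothetical typing of nodes; none of these identifiers come from WordNet.
NODE_KIND = {
    "syn:city": "concept",
    "syn:capital": "concept",
    "syn:Rome": "individual",
    "word:town": "word",
    "word:city": "word",
}

# Naive triples, as they might come out of a direct conversion.
TRIPLES = [
    ("syn:capital", "hyponymOf", "syn:city"),   # concept -> concept
    ("syn:Rome", "hyponymOf", "syn:capital"),   # individual -> concept
    ("word:town", "synonymOf", "word:city"),    # word -> word
]


def kinds_per_predicate(triples, node_kind):
    """Map each predicate to the set of (subject kind, object kind) pairs seen."""
    seen = {}
    for subj, pred, obj in triples:
        seen.setdefault(pred, set()).add((node_kind[subj], node_kind[obj]))
    return seen


mixed = kinds_per_predicate(TRIPLES, NODE_KIND)
# A predicate with more than one kind pair is being used ambiguously.
ambiguous = sorted(p for p, pairs in mixed.items() if len(pairs) > 1)
```

In this toy data "hyponymOf" is flagged, because it links a concept 
to a concept in one triple and an individual to a concept in another.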

>I believe that at least some of the approaches listed already 
>provide at least some of these. Hopefully an understanding of these, 
>and the expertise of the TF, will allow a best of breed proposal.

Point taken (see above).

>
>It would not matter if a first version only covered some core 
>concepts (maybe the four above), and a later version added more 
>sophistication.

To be precise, "word" and "sense" ("synset") should be encoded as 
classes, "hyponym" as a property ranging over senses, and 
"synonym" as a property ranging over words. BTW, this is what you are 
asking for (the address is one I used to generate the abstract syntax 
...):

Namespace(rdf   = <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
Namespace(xsd   = <http://www.w3.org/2001/XMLSchema#>)
Namespace(rdfs  = <http://www.w3.org/2000/01/rdf-schema#>)
Namespace(owl   = <http://www.w3.org/2002/07/owl#>)
Namespace(a     = <http://www.w3.org/2004/03/WordNetSchema#>)

Ontology( <http://212.34.219.175/WN_Schema.owl.rdf>

  ObjectProperty(a:Hyperonym)
  ObjectProperty(a:Hyponym
   inverseOf(a:Hyperonym)
   domain(a:Synset)
   range(a:Synset))
  ObjectProperty(a:Sense
   domain(a:Word)
   range(a:Synset))
  ObjectProperty(a:Synonym Symmetric
   domain(a:Word)
   range(a:Word))

  Class(a:Synset partial
   restriction(a:Hyperonym someValuesFrom (a:Synset)))
  Class(a:Word partial
   restriction(a:Sense someValuesFrom (a:Synset)))
)
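
As a rough illustration of how such a schema could be used, the 
Python sketch below checks toy triples against the domain and range 
declarations listed above. The property and class names and the "a" 
namespace are taken from the listing; the example.org instance URIs 
and the violations() helper are assumptions made for this sketch. 
Note that this is a simple closed-world check: under OWL semantics, 
domain and range axioms license type inferences rather than reject 
data.

```python
NS = "http://www.w3.org/2004/03/WordNetSchema#"   # namespace "a" in the listing
EX = "http://example.org/wn/"                     # hypothetical instance namespace

# property -> (domain, range), copied from the abstract syntax above.
# a:Hyperonym has no explicit domain/range there; Synset/Synset follows
# from its being declared the inverse of a:Hyponym.
SCHEMA = {
    NS + "Hyponym": (NS + "Synset", NS + "Synset"),
    NS + "Hyperonym": (NS + "Synset", NS + "Synset"),
    NS + "Sense": (NS + "Word", NS + "Synset"),
    NS + "Synonym": (NS + "Word", NS + "Word"),
}


def violations(triples, types):
    """Return the triples whose subject or object type differs from the
    declared domain or range of the property (closed-world check)."""
    bad = []
    for subj, prop, obj in triples:
        domain, range_ = SCHEMA[prop]
        if types.get(subj) != domain or types.get(obj) != range_:
            bad.append((subj, prop, obj))
    return bad


# Toy instance data; the URIs are invented for the example.
types = {
    EX + "word-dog": NS + "Word",
    EX + "synset-dog": NS + "Synset",
    EX + "synset-canine": NS + "Synset",
}
triples = [
    (EX + "word-dog", NS + "Sense", EX + "synset-dog"),
    (EX + "synset-dog", NS + "Hyponym", EX + "synset-canine"),
    (EX + "word-dog", NS + "Hyponym", EX + "synset-canine"),  # word used as synset
]
bad = violations(triples, types)
```

Only the last triple is reported, since it puts a Word where the 
schema expects a Synset.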


>The key problem facing a naive semantic web user, or a group 
>producing tools for semantic web developers is making the choice - 
>of which of the mappings to use, and hence which schema and which 
>namespace URI.

Hence, we'll indicate it.

>An approach that would emphasize consensus and avoid blessing any 
>one solution would be to provide a namespace URI and the elements 
>shared across all prior solutions, and a forum in which the 
>different wordnet mappers could agree amongst themselves how to 
>resolve differences (or enabling a clear articulation of the 
>differences, with their pros and cons)

I agree, but the differences are usually related to the synset 
network, rather than to the parts that are easily encodable in 
schemata like the one above.

>When identifying the deliverables for this (or any) Task Force, we 
>should also identify the target audience, and possible use cases in 
>which that target audience may find the deliverables useful. It 
>would be good to have a clear idea from the target audience what 
>they want, so the work is based more on pull ("this is what you are 
>asking for") than push ("take this because the doctor says it will 
>be good for you").

Pull is different for someone who just wants some tag to put on a 
web page, for someone else who wants good coverage of her 
domain, or for someone who wants to do automatic translation 
on the Web.

Push is something we can do wrt wordnet and ontology developers 
rather than basic implementors. I agree on this.

>We do not think that the target audience for the WordNet TF is 
>people working on WordNet mappings, we think the target audience is 
>any semantic web developer who might find a particular WordNet 
>mapping useful.

See previous comment. BTW, I know of many people out there trying to 
make mappings or to find a minimally good ontology.

>For example, anyone creating an OWL or RDFS class might wish to 
>annotate it with its intended meaning using *the* URI for a specific 
>sense of an English word, as classified by wordnet. The main 
>requirement from this use case is agreement over what that URI is, 
>including the beginning bit (the namespace) and the end bit (the 
>mapping from Wordnet's representation of senses)

What do you mean by "annotating an OWL or RDFS class"? Wordnet can be 
used either as a source of reusable classes (sense network 
reengineered as an ontology), or as a source of lexicalizations for 
classes, individuals, or properties. Maybe you mean the second use ...
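
The two uses can be contrasted with a minimal Python sketch. The 
example.org namespaces, the local names, and both helper functions 
are hypothetical; only the rdfs:subClassOf and rdfs:label property 
URIs are standard RDF Schema vocabulary. Using rdfs:label for the 
lexicalization is one possible choice among others, not a 
recommendation.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
WN = "http://example.org/wordnet/"    # hypothetical namespace for synset classes
APP = "http://example.org/myonto#"    # hypothetical application ontology


def as_reusable_class(app_class, synset_local):
    """Use 1: the synset is itself a class that the application class specializes."""
    return (APP + app_class, RDFS + "subClassOf", WN + synset_local)


def as_lexicalization(app_class, lemma):
    """Use 2: WordNet only supplies a word, attached to the class as a label."""
    return (APP + app_class, RDFS + "label", lemma)


class_triple = as_reusable_class("Hound", "synset-dog")
label_triple = as_lexicalization("Hound", "hound")
```

In the first triple the object is a resource from the sense network; 
in the second it is just a string, and no commitment is made to the 
sense network as an ontology.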

>
>Such basic agreement on the fundamentals will also help people doing 
>more advanced work on Wordnet mappings, by giving them a baseline 
>from which to start, and a shared vocabulary on which to build. When 
>any research they do is complete, the Wordnet TF (possibly 
>reincarnated), or whoever is maintaining the ontology recommended by 
>the Wordnet TF, could then consider how to integrate such completed 
>research into the best practice.
>

Therefore, let's define such a metaontology asap (see first point of 
my guideline list), so that we can provide best practices for more 
substantial things :)

Ciao
Aldo
-- 



*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*;*
Aldo Gangemi
Research Scientist
Laboratory for Applied Ontology, ISTC-CNR
Institute of Cognitive Sciences and Technologies
(Laboratorio di Ontologia Applicata,
Istituto di Scienze e Tecnologie della Cognizione,
Consiglio Nazionale delle Ricerche)
Viale Marx 15, 00137
Roma Italy
+3906.86090249
+3906.824737 (fax)
mailto://a.gangemi@istc.cnr.it
mailto://gangemi@acm.org
http://www.loa-cnr.it
Received on Thursday, 25 March 2004 15:09:29 UTC