RE: [WNET] RDFS for WordNet datamodel

Hi Brian, here is an answer (long overdue, excuse me). Note that 
some of the questions you pose were addressed in yesterday's 
telecon. Jeremy and Guus could use this message as another source of 
information/guidelines for their actions.

At 16:23 +0100 11-06-2004, McBride, Brian wrote:
>Hi Aldo,
>
>>  -----Original Message-----
>>  From: Aldo Gangemi [mailto:gangemi@loa-cnr.it]
>>  Sent: Thursday, June 10, 2004 8:56 AM
>>  To: McBride, Brian; 'Aldo Gangemi'; 'Dan Brickley'
>>  Cc: 'SWBPD list'
>>  Subject: [WNET] RDFS for WordNet datamodel
>>
>>  Hi wnetters,
>>
>>  please have a look at a revised version of the RDFS for Wordnet
>>  datamodel written by Brian last Saturday:
>>
>>  http://lists.w3.org/Archives/Public/www-archive/2004Jun/0019.html
>

<snip>

>>  I have added some missing properties, corrected various typos and
>>  syntax, and commented briefly on the decision to include "word senses" in
>>  the domain of WNET RDFS.
>>
>>  In fact, in principle we don't need such a thing as "word senses",
>>  because we already have words and synsets (the senses for sets of
>>  words). But being able to annotate documents with resources linked
>>  both to exactly one sense and to exactly one word seems an advantage.
>
>Ok, I'm new to WordNet so may have misunderstood.
>
>Can you explain how relations like antonym and seeAlso work - should they be
>between words?

(1) Antonym

Antonym is a symmetric relation between synset senses. I.e., WordNet 
assumes that a set of synonyms can all have a set of other synonyms 
as antonyms, e.g.:

hasAntonym(synset:{conspicuous,obvious}, synset:{inconspicuous,invisible})

However, the usual intuition of speakers would readily accept only:

hasAntonym(conspicuous, inconspicuous)

while

hasAntonym(obvious, invisible)

seems less natural.

This is one reason to keep both synset senses and word senses.
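
As a rough illustration of the point above, here is a minimal Python 
sketch. The synset labels S1/S2 and the WordSense class are made up 
for the example and are not part of WordNet or of the proposed RDFS. 
It expands the synset-level relation to every cross pair of words and 
compares the result with the single word-sense pair that speakers 
would readily accept.

```python
from dataclasses import dataclass

# Hypothetical in-memory model; S1/S2 are illustrative labels, not WordNet IDs.
@dataclass(frozen=True)
class WordSense:
    word: str
    synset: str  # label of the synset this sense belongs to

# Synset-level antonymy, as in the hasAntonym example above.
synset_members = {
    "S1": ("conspicuous", "obvious"),
    "S2": ("inconspicuous", "invisible"),
}
synset_antonyms = {("S1", "S2"), ("S2", "S1")}  # symmetric relation

# Word-sense-level antonymy: only the pair speakers would readily accept.
sense_antonyms = {
    (WordSense("conspicuous", "S1"), WordSense("inconspicuous", "S2")),
    (WordSense("inconspicuous", "S2"), WordSense("conspicuous", "S1")),
}

def implied_word_pairs(synset_pairs, members):
    """Expand a synset-level relation to every cross pair of member words."""
    return {(a, b)
            for s, t in synset_pairs
            for a in members[s]
            for b in members[t]}

def asserted_word_pairs(sense_pairs):
    """Project word-sense pairs down to plain word pairs."""
    return {(x.word, y.word) for x, y in sense_pairs}

implied = implied_word_pairs(synset_antonyms, synset_members)
asserted = asserted_word_pairs(sense_antonyms)

# Pairs that hold only because of the synset-level reading.
overgenerated = implied - asserted
print(sorted(overgenerated))
```

The pair (obvious, invisible) ends up in overgenerated, which is the 
information a word-sense level lets us keep apart.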

(2) seeAlso

seeAlso is a kind of "very similar to" relation between synset 
senses, and it is used symmetrically (with some exceptions, probably 
due to the informal construction of WordNet), e.g.:

seeAlso(bad, disobedient)
seeAlso(disobedient, bad)
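
One way to find the exceptions would be to list every seeAlso pair 
whose inverse is missing. The following Python sketch assumes the 
pairs have already been extracted into a list; the (word_a, word_b) 
entry is a made-up placeholder for a one-way link, not real WordNet 
data.

```python
def asymmetric_pairs(pairs):
    """Return the pairs (a, b) whose inverse (b, a) is not present."""
    pair_set = set(pairs)
    return sorted((a, b) for a, b in pair_set if (b, a) not in pair_set)

see_also = [
    ("bad", "disobedient"),
    ("disobedient", "bad"),
    ("word_a", "word_b"),  # hypothetical one-way entry, for illustration only
]

print(asymmetric_pairs(see_also))
```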

>I'll state my understanding of the terminology for folks to correct.
>
>Word - a symbol, usually a sequence of characters, e.g. "dog".
>
>WordSense - a word used in a particular sense, e.g. (dog meaning follow).
>Words may have multiple senses.
>
>SynSet - a collection of WordSenses with similar meaning.
>
>From WordNet
>
>[[
>  ant(synset_id,w_num,synset_id,w_num).
>
>     The ant operator specifies antonymous words. This is a lexical relation
>that holds for all syntactic categories. For each antonymous pair, both
>relations are listed (ie. each synset_id,w_num pair is both a source and
>target word.)
>]]
>
>I took that to mean that the antonym relation is between word senses, not
>between words.  If we make it between words, we lose information represented
>in WordNet.

It is between synset senses, not word senses, unless we make a 
further assumption (see above). Where did you get the idea of antonym 
as a relation between words? It is not in the RDFS datamodel that I 
have proposed.
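
To make the two readings concrete, here is a small Python sketch that 
parses facts in the ant(synset_id,w_num,synset_id,w_num) format quoted 
above. Keeping the w_num values gives pairs of (synset, word) 
positions; dropping them gives the synset-level relation. The numeric 
ids in the sample are invented for the example, and the regular 
expression only assumes the shape shown in the quoted documentation.

```python
import re

# Matches one fact of the quoted form: ant(synset_id,w_num,synset_id,w_num).
ANT_FACT = re.compile(r"ant\((\d+),(\d+),(\d+),(\d+)\)\.")

def parse_ant(lines):
    """Parse ant/4 facts into ((synset_id, w_num), (synset_id, w_num)) pairs."""
    facts = []
    for line in lines:
        match = ANT_FACT.fullmatch(line.strip())
        if match:
            s1, w1, s2, w2 = map(int, match.groups())
            facts.append(((s1, w1), (s2, w2)))
    return facts

def synset_level(facts):
    """Drop the w_num component to obtain the synset-to-synset relation."""
    return {(src[0], dst[0]) for src, dst in facts}

# Invented ids, for illustration only; both directions are listed,
# as the quoted documentation says.
sample = [
    "ant(100000001,1,100000002,1).",
    "ant(100000002,1,100000001,1).",
]

facts = parse_ant(sample)
print(facts)
print(synset_level(facts))
```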

>Another question is whether we need a resource node in the graph to
>represent a word, or whether we can just use literals.  If I recall
>correctly, my colleague inserted a resource so that he could model the
>probability that a particular word was used in a particular sense.  I think
>he did this by creating a ternary relation (word, wordsense, p) where p is
>the probability that the word is used in that sense.  There may be a
>simpler way to do that, e.g. just hanging a single property of a wordsense
>resource.

OK, but you still need a probability. It seems that the order of 
senses reflects such a probability estimate.
We should also investigate whether the most frequently used words *for a 
synset* correspond to the order of words given in the database.
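
The two options can be sketched in Python as follows. The sense 
labels and the probability values are invented for the example; only 
"dog meaning follow" comes from the discussion above. The first form 
keeps the (word, wordsense, p) relation, the second hangs p directly 
off the word sense, which works as long as a word sense already 
identifies its word.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SenseUse:
    """One row of the (word, wordsense, p) relation."""
    word: str
    sense: str
    p: float  # probability that the word is used in this sense

# Invented values, for illustration only.
uses = [
    SenseUse("dog", "dog-verb-follow", 0.1),
    SenseUse("dog", "dog-noun-animal", 0.9),
]

# Simpler alternative: a single property keyed by the word sense.
sense_probability = {u.sense: u.p for u in uses}

def senses_by_probability(word, rows):
    """Order the senses of a word from most to least probable."""
    ranked = sorted((u for u in rows if u.word == word),
                    key=lambda u: u.p, reverse=True)
    return [u.sense for u in ranked]

print(senses_by_probability("dog", uses))
```

If the order of senses in the database does reflect such an estimate, 
the output of senses_by_probability should match that order.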

>Another issue here is language.  Is the French word "chat" the same word as
>the English word "chat"?  We could still use literals to represent language,
>but, if we want to use XML literals, then we'd have to wrap the literal in
>an explicit tag, e.g. "<word xml:lang="en">chat</word>" rather than just
>"chat".  It might be simpler just to hang a language property off the Word
>resource.
>

a) If words are encoded in the English WordNet namespace, no word 
used for a French synset in a French WordNet can be confused with an 
English one.

b) If words are not encoded in a specific namespace, then a language 
property must be added.

What do we choose?

If we want to adhere as much as possible to the original WordNet 
datamodel, (a) is the winner.
On the other hand, (b) allows reuse of the same string or literal for 
different languages, so that we will only have the resource "chat", 
independently of any particular language.

In general, I think (a) is more elegant, and its cost is probably 
less than the expected benefits.
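
For comparison, the two options might look like this in a minimal 
Python sketch. The example.org namespaces, the "word-" prefix and the 
"language" property name are all assumptions made for the 
illustration; none of them is taken from the proposed RDFS.

```python
from urllib.parse import quote

# Hypothetical namespaces, for illustration only.
EN = "http://example.org/wordnet/en/"
FR = "http://example.org/wordnet/fr/"
SHARED = "http://example.org/words/"

def word_uri(namespace, lemma):
    """Build a resource URI for a word inside the given namespace."""
    return namespace + "word-" + quote(lemma)

# Option (a): one namespace per language, so the two "chat" never collide.
chat_en = word_uri(EN, "chat")
chat_fr = word_uri(FR, "chat")

# Option (b): one shared resource, with language stated as a property.
chat = word_uri(SHARED, "chat")
triples_b = {
    (chat, "language", "en"),
    (chat, "language", "fr"),
}

def languages_of(resource, triples):
    """List the languages attached to a shared word resource."""
    return sorted(o for s, p, o in triples if s == resource and p == "language")

print(chat_en, chat_fr)
print(languages_of(chat, triples_b))
```

With (a) the language is carried by the URI itself; with (b) every 
consumer has to look at the extra property to tell the two words 
apart.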

Best
Aldo

-- 
Aldo Gangemi
Research Scientist
Laboratory for Applied Ontology
Institute for Cognitive Sciences and Technology
National Research Council (ISTC-CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +3906824737
a.gangemi@istc.cnr.it

Received on Friday, 9 July 2004 07:22:11 UTC