RE: [WNET] RDFS for WordNet datamodel from McBride, Brian on 2004-07-09 (public-swbp-wg@w3.org from July 2004)

From: McBride, Brian <brian.mcbride@hp.com>
Date: Fri, 9 Jul 2004 14:52:43 +0100
To: Aldo Gangemi <a.gangemi@istc.cnr.it>, "McBride, Brian" <brian.mcbride@hp.com>, "'Dan Brickley'" <danbri@w3.org>
Cc: "'SWBPD list'" <public-swbp-wg@w3.org>, jjc@hplb.hpl.hp.com, schreiber@cs.vu.nl
Message-ID: <E864E95CB35C1C46B72FEA0626A2E80803984519@0-mail-br1.hpl.hp.com>
[...]
> 
> (1) Antonym
> 
> Antonym is a symmetric relation between synset senses. I.e., Wordnet 
> assumes that a set of synonyms can all have a set of other synonyms 
> as antonyms, e.g.:
> 
> hasAntonym(synset:{conspicuous,obvious}, 
> synset:{inconspicuous,invisible})

Hmm, I don't think Princeton's WordNet models antonyms that way, for the
following reasons:

1) From the Wordnet documentation:

[[
 ant(synset_id,w_num,synset_id,w_num).

    The ant operator specifies antonymous words. This is a lexical relation
that holds for all syntactic categories. For each antonymous pair, both
relations are listed (ie. each synset_id,w_num pair is both a source and
target word.) 
]]

This clearly states that the ant relation is between words (which I, perhaps
confusingly, have been calling word senses), not between synsets.  If it
were between synsets then the relation would be

  ant(synset_id, synset_id).
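
To make the arity argument concrete, here is a minimal Python sketch that reads
facts of the four-argument shape quoted above.  The synset ids and word numbers
are invented for illustration and are not taken from the WordNet database; the
helper name parse_ant is likewise hypothetical.

```python
import re

# Hypothetical facts in the ant(synset_id,w_num,synset_id,w_num) shape.
# The ids and word numbers are made up for illustration only.
FACTS = """
ant(300001,1,300002,1).
ant(300002,1,300001,1).
"""

ANT = re.compile(r"ant\((\d+),(\d+),(\d+),(\d+)\)\.")

def parse_ant(text):
    """Return a set of ((synset_id, w_num), (synset_id, w_num)) pairs."""
    pairs = set()
    for m in ANT.finditer(text):
        s1, w1, s2, w2 = (int(g) for g in m.groups())
        pairs.add(((s1, w1), (s2, w2)))
    return pairs

pairs = parse_ant(FACTS)

# Each end of the relation is a (synset_id, w_num) pair, i.e. one word
# inside one synset, not a whole synset.
is_symmetric = all((b, a) in pairs for (a, b) in pairs)
```

Each end of a parsed pair identifies one word within a synset, which is why a
two-argument ant(synset_id, synset_id) could not carry the same information.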

2) From the Wordnet book [1] p49:

[[
The first question caused serious problems for Wordnet, which was initially
conceived as using labeled pointers between synsets in order to express
semantic relations between lexical concepts.  But it is not appropriate to
introduce antonymy by labeled pointers between synsets, for example between
{heavy, weighty, ponderous} and {light, weightless, airy}.  People who know
English judge heavy/light to be antonyms but they pause and are puzzled when
asked whether heavy/weightless or ponderous/airy are antonyms.  The concepts
are opposed, but the word forms are not familiar antonym pairs.  Antonymy,
like synonymy, is a semantic relation between word forms.
]]

If you are not persuaded by the above, maybe we need guidance from
Christiane.


> 
> indeed, the usual intuition of speakers would accept only:
> 
> hasAntonym(conspicuous, inconspicuous)
> 
> while
> 
> hasAntonym(obvious, invisible)
> 
> seems probably less natural.

Just so.

> 
> This is one reason to keep both synset senses and word senses.

I agree.

> 
> (2) seeAlso
> 
> seeAlso is a kind of "very similar to" relation between synset 
> senses, 

Here, I disagree for the same reasons.  From the WordNet documentation:

[[
 sa(synset_id,w_num,synset_id,w_num).

    The sa operator specifies that additional information about the first
word can be obtained by seeing the second word. This operator is only
defined for verbs and adjectives. There is no reflexive relation (ie. it
cannot be inferred that the additional information about the second word can
be obtained from the first word). 
]]
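
As a rough sketch of what that excerpt implies, the sa relation can be held as
a set of directed pairs of (synset_id, w_num), with no inverse inferred.  The
ids below are invented, and the function name see_also is an assumption made
for this example.

```python
# Hypothetical, already-parsed sa facts: (source, target), where each end
# is a (synset_id, w_num) pair.  The numbers are made up for illustration.
sa = {((100, 1), (200, 2))}

def see_also(source):
    """Targets reachable from `source`; the inverse direction is not inferred."""
    return [target for (src, target) in sa if src == source]
```

Looking up (100, 1) yields (200, 2), while looking up (200, 2) yields nothing,
which matches the documentation's statement that the second word does not
point back to the first.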

[...]

> >
> >I took that to mean that the antonym relation is between word senses, not
> >between words.  If we make it between words, we lose information represented
> >in WordNet.
> 
> It is between synset senses, not word senses, 

This is false.  See above.

> unless we make a 
> further assumption (see above). Where have you conceived of antonym 
> as a relation between words?

In the wordnet documentation and book.

> it is not in the rdfs datamodel that I 
> have proposed.

That is correct, but it was in the RDFS that I proposed.  You removed the
notion of WordSense from my proposal and I don't understand why.

> 
> >Another question is whether we need a resource node in the graph to
> >represent a word, or whether we can just use literals.  If I recall
> >correctly, my colleague inserted a resource so that he could model the
> >probability that a particular word was used in a particular sense.  I think
> >he did this by creating a ternary relation (word, wordsense, p) where p is
> >the probability that the word is used in that sense.  There may be a
> >simpler way to do that, e.g. just hanging a single property off a wordsense
> >resource.
> 
> OK, but still you need a probability. It seems that the order of 
> senses respects such a probability estimation.

Good point!  I think we have lost that ordering in the current proposal.  Do
we need to retain it?
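
If we did want to retain it, one possibility is to hang both the rank and a
probability estimate directly off the wordsense.  The following is only a
sketch under that assumption: the class, the field names and the numbers are
all hypothetical and appear in neither proposal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WordSense:
    word: str
    synset_id: int
    rank: int           # assumed: position of this sense in the word's sense list
    probability: float  # assumed: estimate that the word is used in this sense

# Invented data, deliberately stored out of order.
senses = [
    WordSense("chat", 2, 2, 0.3),
    WordSense("chat", 1, 1, 0.7),
]

def senses_by_rank(word):
    """Return the senses of `word` in their recorded order."""
    return sorted((s for s in senses if s.word == word), key=lambda s: s.rank)
```

With a rank stored per wordsense, the ordering survives however the graph is
serialised, and a separate (word, wordsense, p) relation is not needed.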

> We should also investigate if the most frequently used words *for a 
> synset* correspond to the order of words given in the database.
> 
> >Another issue here is language.  Is the French word "chat" the same word as
> >the English word "chat"?  We could still use literals to represent language,
> >but, if we want to use XML literals, then we'd have to wrap the literal in
> >an explicit tag, e.g. "<word xml:lang="en">chat</word>" rather than just
> >"chat".  It might be simpler just to hang a language property off the Word
> >resource.
> >
> 
> a) If words are encoded in the English WordNet namespace, no word 
> used for a French synset in a French WordNet can be confused with the 
> first.

I am not sure what you mean by word here.  I think you probably mean what I
mean by wordsense.

Encoding information in URLs like that is a bit dodgy.  For example, if I
have a merged graph containing English and French wordnets, how would you
encode a query to find the synsets for the English word 'dog' with this
method?  It's cleaner to have this information explicitly represented in the
graph structure.
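
To illustrate, here is a small sketch of such a query over a merged graph held
as plain (subject, predicate, object) tuples, using the "chat" example from
earlier in the thread.  All identifiers and property names (wordForm, language,
inSynset) are hypothetical and are not part of either proposed schema.

```python
# A merged graph as (subject, predicate, object) triples.  Every name
# here is invented for illustration.
triples = [
    ("ws:a", "wordForm", "chat"),
    ("ws:a", "language", "en"),
    ("ws:a", "inSynset", "synset:en-101"),
    ("ws:b", "wordForm", "chat"),
    ("ws:b", "language", "fr"),
    ("ws:b", "inSynset", "synset:fr-202"),
]

def synsets_for(word, lang):
    """Synsets of `word` in `lang`, using only explicit graph properties."""
    with_form = {s for (s, p, o) in triples if p == "wordForm" and o == word}
    in_lang = {s for (s, p, o) in triples if p == "language" and o == lang}
    matches = with_form & in_lang
    return sorted(o for (s, p, o) in triples if p == "inSynset" and s in matches)
```

The query never inspects the shape of an identifier; the language is selected
purely through a property in the graph.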

> 
> b) If words are not encoded in a specific namespace, then a language 
> property must be added.
> 
> What do we choose?
> 
> If we want to adhere as much as possible to the original WordNet 
> datamodel, (a) is the winner.
> On the other hand, (b) allows reuse of the same string or literal for 
> different languages, so that we will only have the resource "chat", 
> independently of any particular language.
> 
>   In general, I think (a) is more elegant, and its cost is probably 
> less than the expected benefits.

I tend to agree.

Brian

[1] Fellbaum, C. (ed.), WordNet: An Electronic Lexical Database. 1998,
Cambridge, MA: MIT Press.
Received on Friday, 9 July 2004 09:53:11 UTC