[WNET] Word Sense and Syn Sets from Parekh, Viral on 2004-07-21 (public-swbp-wg@w3.org from July 2004)

From: Parekh, Viral <Viral.Parekh@hp.com>
Date: Wed, 21 Jul 2004 16:47:42 +0100
To: "'public-swbp-wg@w3.org'" <public-swbp-wg@w3.org>
Message-ID: <E864E95CB35C1C46B72FEA0626A2E80803731BE6@0-mail-br1.hpl.hp.com>

Hello all,

I am working on a project which involves using Wordnet. I am using the OWL
version of Wordnet developed as part of the knOWLer [1] project. knOWLer
Wordnet is based on Wordnet 1.7.1. Its ontologies do not differentiate
between the different senses of a word. Since we need that distinction, we
have considered different ways in which it could be added.

One possible approach to defining word senses without losing any information
in Wordnet is:

<!-- plant as in {plant, flora, plant_life} -->
<wn:WordSense rdf:ID="&wn;plant10300">
	<wn:word rdf:resource="&wn;plant"/>
	<wn:synSet rdf:resource="&wn;100012420"/>
	<wn:senseNumber>2</wn:senseNumber>
	<wn:tagCount>207</wn:tagCount>
</wn:WordSense>

<!-- plant as in {plant, works, industrial_plant} -->
<wn:WordSense rdf:ID="&wn;plant10601">
	<wn:word rdf:resource="&wn;plant"/>
	<wn:synSet rdf:resource="&wn;103447508"/>
	<wn:senseNumber>1</wn:senseNumber>
	<wn:tagCount>328</wn:tagCount>
</wn:WordSense>
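
The identifiers plant10300 and plant10601 in the example above look like the
sense keys plant%1:03:00:: and plant%1:06:01:: with the punctuation removed.
That reading is an assumption, and so are the function names below. A minimal
Python sketch of the conversion, following the sense key layout
lemma%ss_type:lex_filenum:lex_id:head_word:head_id described in [2], might be:

```python
import re

# ss_type codes as listed in the senseidx(5WN) manual page.
SS_TYPES = {1: "noun", 2: "verb", 3: "adjective", 4: "adverb",
            5: "adjective satellite"}


def parse_sense_key(key):
    """Split a sense key into its fields.

    Layout: lemma%ss_type:lex_filenum:lex_id:head_word:head_id
    (head_word and head_id are empty unless the sense is an adjective
    satellite).
    """
    lemma, _, lex_sense = key.partition("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return {
        "lemma": lemma,
        "ss_type": int(ss_type),
        "lex_filenum": lex_filenum,
        "lex_id": lex_id,
        "head_word": head_word,
        "head_id": head_id,
    }


def sense_key_to_id(key):
    """Hypothetical helper: drop '%' and ':' so the key can serve as an ID."""
    return re.sub(r"[%:]", "", key)


print(sense_key_to_id("plant%1:03:00::"))  # plant10300
```

Dropping the separators is not guaranteed to be collision-free for every
lemma, so a real mapping may prefer to replace them with another character
rather than remove them.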

A few things to note:

1. The WordSense class is defined with the attributes word, synSet,
senseNumber and tagCount. Sense keys are used to uniquely identify each Word
Sense, in accordance with [2] and [3]. Sense keys SHOULD remain consistent
across different versions of Wordnet, thereby allowing us to uniquely
identify a Word Sense regardless of the Wordnet version.

2. By using senseNumber as an attribute of the WordSense class, we can order
the different senses of a single word as is done in Wordnet. However, sense
numbers in Wordnet are defined per syntactic category. For example, the word
"plant" has 4 senses as a noun and 6 senses as a verb. Hence, the noun senses
are numbered 1..4 (1 being the most frequent) and the verb senses are
numbered 1..6. This means that it is not possible to find the most frequent
sense of "plant" overall without knowing the syntactic category of each
sense. However, such an overall ranking could be a good thing to have. Maybe
by combining senseNumber and tagCount, we can determine the most frequent
sense of a particular word regardless of its syntactic category.

3. As seen in the above example, knOWLer [1] uses the synset offset to
identify each synonym set in the ontology. However, these offsets vary
between different versions of Wordnet. Since sense keys are unique and
consistent, a possible way to uniquely identify each synonym set regardless
of the version is to somehow combine the sense keys of the words present in
that synonym set. This could be a bit tricky.
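
Notes 2 and 3 above can be sketched in Python as follows. This is only an
illustration under stated assumptions: the tuple layout and both function
names are hypothetical, the verb row uses a placeholder tagCount that is not
taken from Wordnet, ranking by tagCount first is just one way to compare
senses across syntactic categories, and hashing the sorted member sense keys
is just one way to "somehow combine" them.

```python
import hashlib

# (sense_key, sense_number, tag_count). The two noun rows repeat the values
# from the WordSense example; the verb row is a placeholder for illustration
# and its tag count is NOT taken from Wordnet.
SENSES = [
    ("plant%1:03:00::", 2, 207),
    ("plant%1:06:01::", 1, 328),
    ("plant%2:35:00::", 1, 100),  # hypothetical tagCount
]


def most_frequent_sense(senses):
    """Rank across syntactic categories.

    Highest tagCount wins; ties fall back to the lowest senseNumber.
    """
    return min(senses, key=lambda s: (-s[2], s[1]))


def synset_id(member_sense_keys):
    """Derive a synset ID from the sense keys of its members.

    The keys are sorted first so the result does not depend on the order in
    which the members are listed.
    """
    joined = "|".join(sorted(member_sense_keys))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]


print(most_frequent_sense(SENSES)[0])  # plant%1:06:01::
```

An ID built this way is independent of synset offsets, but it changes as soon
as a synonym set gains or loses a member between Wordnet versions, which is
one reason the combination is tricky.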


We welcome feedback on this.

Thank you,

Viral Parekh
HP Labs, Bristol


[1] knOWLer http://taurus.unine.ch/knowler/
[2] SENSEIDX(5WN) manual page
http://wordnet.princeton.edu/~wn/man/senseidx.5WN.html
[3] http://lists.w3.org/Archives/Public/public-swbp-wg/2004Jul/0060.html

Received on Wednesday, 21 July 2004 11:49:36 UTC