- From: Parekh, Viral <Viral.Parekh@hp.com>
- Date: Wed, 21 Jul 2004 16:47:42 +0100
- To: "'public-swbp-wg@w3.org'" <public-swbp-wg@w3.org>
Hello all,

I am working on a project which involves using Wordnet. I am using the OWL version of Wordnet developed as part of the knOWLer [1] project. knOWLer Wordnet is based on Wordnet 1.7.1. Its ontologies do not differentiate between the different senses of a word. Since we needed this, we thought of different ways it could be done. A possible approach to defining word senses without losing any information in Wordnet could be:

    <!-- plant as in {plant, flora, plant_life} -->
    <wn:WordSense rdf:ID="&wn;plant10300">
      <wn:word rdf:resource="&wn;plant"/>
      <wn:synSet rdf:resource="&wn;100012420"/>
      <wn:senseNumber>2</wn:senseNumber>
      <wn:tagCount>207</wn:tagCount>
    </wn:WordSense>

    <!-- plant as in {plant, works, industrial_plant} -->
    <wn:WordSense rdf:ID="&wn;plant10601">
      <wn:word rdf:resource="&wn;plant"/>
      <wn:synSet rdf:resource="&wn;103447508"/>
      <wn:senseNumber>1</wn:senseNumber>
      <wn:tagCount>328</wn:tagCount>
    </wn:WordSense>

A few things to note:

1. The WordSense class is defined with the attributes word, synSet, senseNumber and tagCount. Sense keys are used to uniquely identify each Word Sense. This is in accordance with [2] and [3]. Sense keys SHOULD remain consistent across different versions of Wordnet, thereby allowing us to uniquely identify a Word Sense regardless of the Wordnet version.

2. By using senseNumber as an attribute of the WordSense class, we can now order the different senses of a single word as is done in Wordnet. However, sense numbers in Wordnet are defined per syntactic category. For example, the word "plant" has 4 senses as a noun and 6 senses as a verb. Hence, the noun senses will be ordered 1..4 (1 being the most frequent) and the verb senses will be ordered 1..6. This means that sense numbers alone cannot tell us the most frequent sense of "plant" overall; they only rank the senses within a known syntactic category. An overall ranking could be a good thing to have, however. Maybe by combining senseNumber and tagCount, we can determine the most frequent sense of a particular word regardless of its syntactic category.

3. As seen in the above example, knOWLer [1] uses the synset offset to identify each synonym set in the ontology. However, these offsets vary between different versions of Wordnet. Since sense keys are unique and consistent, a possible way to uniquely identify each synonym set regardless of the version would be to somehow combine the sense keys of the words present in that synonym set. This could be a bit tricky.

We welcome feedback on this.

Thank you,
Viral Parekh
HP Labs, Bristol

[1] knOWLer http://taurus.unine.ch/knowler/
[2] SENSEIDX(5WN) manual page http://wordnet.princeton.edu/~wn/man/senseidx.5WN.html
[3] http://lists.w3.org/Archives/Public/public-swbp-wg/2004Jul/0060.html
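As a rough illustration of the idea in point 2, the sketch below picks the most frequent sense of a word across syntactic categories by comparing tagCount, using senseNumber only to break ties. This is one possible reading of "combining senseNumber and tagCount", not an agreed method. The Python class and function names are made up for the example, and the verb entry is a placeholder with invented values; only the two noun entries come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordSense:
    sense_id: str      # identifier of the word sense, e.g. "plant10300"
    pos: str           # syntactic category: "noun" or "verb"
    sense_number: int  # rank within its syntactic category (1 = most frequent)
    tag_count: int     # tagCount value attached to the sense


def most_frequent_sense(senses):
    """Return the sense with the highest tag_count, ignoring category.

    Assumption: tag counts are comparable across syntactic categories.
    A lower sense_number breaks ties between equal tag counts.
    """
    return max(senses, key=lambda s: (s.tag_count, -s.sense_number))


senses = [
    # Values taken from the two WordSense examples in this message.
    WordSense("plant10300", "noun", 2, 207),
    WordSense("plant10601", "noun", 1, 328),
    # Hypothetical verb sense: id and counts are invented placeholders,
    # not taken from Wordnet.
    WordSense("plant-verb-example", "verb", 1, 15),
]

best = most_frequent_sense(senses)
print(best.sense_id, best.pos)  # prints "plant10601 noun"
```

Sorting the whole list with the same key (in reverse) would give a single ordering of all senses of a word, which is what senseNumber alone cannot provide across categories.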
Received on Wednesday, 21 July 2004 11:49:36 UTC