Re: WordNet modelling in Lemon and SKOS

G'day,

just a couple of small points about recent wordnet advances.

The first is that there are now many more wordnets than just the Princeton
WordNet of English (http://www.casta-net.jp/~kuribayashi/multi/), so things
like sense and word must, of course, be labeled with the languages (I
suspect this was just omitted for space).

The second is that there may be different versions of the same wordnet, so
we need to label the wordnet.  The convention in wordnet-LMF is to use
identifiers of the form LLL-VV-OOOOOOOO-P, where LLL is the language, VV is
the version, OOOOOOOO is the offset and P is the part of speech.  So
instead of syn_n_08225481 people use eng-30-08225481-n.  If we could adopt
the same convention it would make interoperability a little bit easier.

http://kyoto-project.eu/xmlgroup.iit.cnr.it/kyoto/index6bfa.html?option=com_content&view=article&id=143&Itemid=129
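
To make the identifier convention above concrete, here is a minimal sketch
in Python.  The exact field widths (three letters, two digits, eight digits,
one letter) are my reading of the examples, not a quote from the wordnet-LMF
specification, and the names SynsetId, parse_synset_id and from_legacy are
made up for illustration.

```python
import re
from typing import NamedTuple


class SynsetId(NamedTuple):
    """Parts of an identifier such as eng-30-08225481-n (hypothetical type)."""
    lang: str     # LLL: language code, e.g. "eng"
    version: str  # VV: wordnet version, e.g. "30" for 3.0
    offset: str   # OOOOOOOO: eight-digit offset, kept as text to preserve zeros
    pos: str      # P: part of speech, e.g. "n"


# Assumed shape: LLL-VV-OOOOOOOO-P (widths inferred from the examples).
_ID_RE = re.compile(r"^([a-z]{3})-(\d{2})-(\d{8})-([a-z])$")
# Assumed shape of the older style mentioned above: syn_P_OOOOOOOO.
_LEGACY_RE = re.compile(r"^syn_([a-z])_(\d{8})$")


def parse_synset_id(text: str) -> SynsetId:
    """Split an identifier into its four parts, or raise ValueError."""
    match = _ID_RE.match(text)
    if match is None:
        raise ValueError(f"not a LLL-VV-OOOOOOOO-P identifier: {text!r}")
    return SynsetId(*match.groups())


def from_legacy(text: str, lang: str, version: str) -> str:
    """Rewrite syn_P_OOOOOOOO with an explicit language and version."""
    match = _LEGACY_RE.match(text)
    if match is None:
        raise ValueError(f"not a syn_P_OOOOOOOO identifier: {text!r}")
    pos, offset = match.groups()
    return f"{lang}-{version}-{offset}-{pos}"
```

For example, from_legacy("syn_n_08225481", "eng", "30") gives
"eng-30-08225481-n", which parse_synset_id then splits back into language,
version, offset and part of speech.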

Debate still rages over whether synsets can be/should be shared between
languages or not.  I think that they can if we are careful, especially at
the level of granularity we use in practice, but it is still an open
question.  If we think that they cannot, then a single lexical concept may
be a supertype of multiple synsets from different languages: the synset
with 'dog' in English, the one with '犬' in Japanese, the one with 'anjing'
in Malaysian and so on.

The final point is that the current English wordnets (and most recent
wordnets) distinguish between hyponym and instance:
<<fictional character>> is a hyponym of <<imaginary being>>
<<Sherlock Holmes>> is an instance of <<fictional character>>

I suspect we should try to capture this distinction.
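
As a rough illustration of what capturing the distinction could mean, here
is a small Python sketch that keeps the two relations apart instead of
folding both into one "broader" link.  The names Rel, RELATIONS, parents and
is_instance are hypothetical, and this is not a proposal for the actual
Lemon/SKOS property names.

```python
from enum import Enum


class Rel(Enum):
    """The two relation types distinguished above (hypothetical names)."""
    HYPONYM = "hyponym"    # class-to-class: a kind of
    INSTANCE = "instance"  # individual-to-class: an instance of


# (child, relation, parent) triples, using the two examples from above.
RELATIONS = [
    ("fictional character", Rel.HYPONYM, "imaginary being"),
    ("Sherlock Holmes", Rel.INSTANCE, "fictional character"),
]


def parents(synset: str, rel: Rel) -> list[str]:
    """Return the parents of a synset under one specific relation."""
    return [p for c, r, p in RELATIONS if c == synset and r is rel]


def is_instance(synset: str) -> bool:
    """True if the synset is linked to some class by the instance relation."""
    return bool(parents(synset, Rel.INSTANCE))
```

With the relations kept separate, a query can tell that <<Sherlock Holmes>>
is an individual while <<fictional character>> is a class, which is lost if
both links are mapped to the same property.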

Orthogonally, I must admit to not being clear about the actual implications
of choosing the different names/models proposed in the discussion, so I find
it hard to judge which is better.  If someone could try to summarize
this it would be really helpful to me, and maybe to others.

Yours,


-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University

Received on Friday, 19 April 2013 08:00:14 UTC