
Re: WordNet modelling in Lemon and SKOS

From: John McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Fri, 19 Apr 2013 11:00:20 +0200
Message-ID: <CAC5njqpYzXpc8TS6HqrFsEwBxMmfpwSxwCyCwAjDv4q+5Xk3hA@mail.gmail.com>
To: Francis Bond <fcbond@gmail.com>
Cc: Armando Stellato <stellato@info.uniroma2.it>, Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>, public-ontolex <public-ontolex@w3.org>
Hi Francis,

Thanks for your interest.

Firstly, in the original example language is marked on the entry; it could
also be attached to the sense:

http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Linked_Data#Example:_WordNet_as_lemon-SKOS

Secondly, from the point of view of OntoLex we use URIs as identifiers,
which are essentially physical objects (referring to a file on some server)
so we cannot *mandate* the use of a particular naming scheme. However, we
can *recommend* the use of a particular scheme and we will certainly take
the Kyoto scheme into account.
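
To make the recommended scheme concrete, here is a minimal sketch of the
wordnet-LMF identifier convention described in the quoted message below
(LLL-VV-OOOOOOOO-P). It assumes a three-letter language code, a two-digit
version, an eight-digit offset and a one-letter part of speech, and it
assumes the older form looks like the syn_n_08225481 example. The names
SynsetId, parse_lmf_id and from_legacy_id are hypothetical helpers, not
part of any OntoLex or wordnet-LMF tooling.

```python
import re
from typing import NamedTuple


class SynsetId(NamedTuple):
    """One synset identifier split into its four assumed components."""
    lang: str     # e.g. "eng"
    version: str  # e.g. "30" for version 3.0
    offset: str   # kept as a string so leading zeros survive
    pos: str      # e.g. "n"

    def __str__(self) -> str:
        return f"{self.lang}-{self.version}-{self.offset}-{self.pos}"


# Assumed shapes; real data may use other lengths or character sets.
_LMF_ID = re.compile(r"^([a-z]{3})-(\d{2})-(\d{8})-([a-z])$")
_LEGACY_ID = re.compile(r"^syn_([a-z])_(\d{8})$")


def parse_lmf_id(text: str) -> SynsetId:
    """Parse an identifier such as 'eng-30-08225481-n'."""
    match = _LMF_ID.match(text)
    if match is None:
        raise ValueError(f"not an LLL-VV-OOOOOOOO-P identifier: {text!r}")
    return SynsetId(*match.groups())


def from_legacy_id(text: str, lang: str, version: str) -> SynsetId:
    """Convert 'syn_n_08225481' style names, supplying language and version."""
    match = _LEGACY_ID.match(text)
    if match is None:
        raise ValueError(f"not a syn_P_OOOOOOOO identifier: {text!r}")
    pos, offset = match.groups()
    return SynsetId(lang, version, offset, pos)


if __name__ == "__main__":
    print(from_legacy_id("syn_n_08225481", "eng", "30"))
```

Since we can only recommend the scheme, a parser like this would have to
treat non-matching URIs as valid but opaque rather than as errors in the
data.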

As for multilinguality of synsets, it is a good question, and I think you
explain it well: as I see it, lexical entries and senses are
language-specific, ontology entities are not, and on synsets we will stay
ambivalent.

Finally, I am aware that there is a distinction between hyponyms and
instance hyponyms in WordNet and I already use different properties to
represent these.
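
As a rough illustration of keeping the two relations apart, the sketch
below stores the examples from the quoted message as simple triples. The
property names "hyponym" and "instanceHyponym" are placeholders I am
assuming for the example; the actual property URIs are not given in this
thread.

```python
from collections import defaultdict

# Placeholder property names (assumptions, not real vocabulary terms).
HYPONYM = "hyponym"
INSTANCE_HYPONYM = "instanceHyponym"

# (subject, property, object) triples taken from the quoted examples.
TRIPLES = [
    ("fictional character", HYPONYM, "imaginary being"),
    ("Sherlock Holmes", INSTANCE_HYPONYM, "fictional character"),
]


def group_by_relation(triples):
    """Group (subject, object) pairs under the property that links them."""
    grouped = defaultdict(list)
    for subject, prop, obj in triples:
        grouped[prop].append((subject, obj))
    return dict(grouped)


if __name__ == "__main__":
    for prop, pairs in group_by_relation(TRIPLES).items():
        print(prop, pairs)
```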

Regards,
John




On Fri, Apr 19, 2013 at 3:44 AM, Francis Bond <fcbond@gmail.com> wrote:

> G'day,
>
> just a couple of small points about recent wordnet advances.
>
> The first is that there are now many more wordnets than just the Princeton
> WordNet of English (http://www.casta-net.jp/~kuribayashi/multi/), so
> things like sense and word must, of course, be labeled with the languages
> (I suspect this was just omitted for space).
>
> The second is that there may be different versions of the same wordnet,
> so we need to label the wordnet.  The convention in wordnet-LMF is to
> use identifiers of the form LLL-VV-OOOOOOOO-P where LLL is the language,
> VV is the version, OOOOOOOO is the offset and P is the part of speech.  So:
> instead of syn_n_08225481 people use: eng-30-08225481-n.  If we could
> adopt the same convention it would make interoperability a little bit
> easier.
>
>
> http://kyoto-project.eu/xmlgroup.iit.cnr.it/kyoto/index6bfa.html?option=com_content&view=article&id=143&Itemid=129
>
> Debate still rages over whether synsets can be/should be shared between
> languages or not.  I think that they can if we are careful, especially at
> the level of granularity we use in practice, but it is still an open
> question.  If we think that they are not, then a single lexical concept may
> be a supertype of  multiple synsets from different languages: the synset
> with 'dog' in English, the one with '犬' in Japanese, the one with 'anjing'
> in Malaysian and so on.
>
> The final point is that the current English wordnets (and most recent
> wordnets) distinguish between hyponym and instance:
> <<fictional character>> is a hyponym of <<imaginary being>>
> <<Sherlock Holmes>> is an instance of <<fictional character>>
>
> I suspect we should try to capture this distinction.
>
> Orthogonally, I must admit to not being clear about the actual
> implications of choosing the different names/models proposed in the
> discussion, so find it hard to judge which is better  --- if someone could
> try to summarize this it would be really helpful to me, and maybe to others.
> Yours,
>
>
> --
> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
> Division of Linguistics and Multilingual Studies
> Nanyang Technological University
>
Received on Friday, 19 April 2013 09:00:57 UTC
