- From: Francis Bond <fcbond@gmail.com>
- Date: Fri, 19 Apr 2013 09:44:54 +0800
- To: Armando Stellato <stellato@info.uniroma2.it>
- Cc: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>, public-ontolex@w3.org
- Message-ID: <CA+arSXi8FruYsUOc=vfKr_j-SJiqpwWDD62SMQ1D_Y8_H5at1w@mail.gmail.com>
G'day,

just a couple of small points about recent wordnet advances.

The first is that there are now many more wordnets than just the Princeton WordNet of English (http://www.casta-net.jp/~kuribayashi/multi/), so things like sense and word must, of course, be labeled with their languages (I suspect this was just omitted for space).

The second is that there may be different versions of the same wordnet, so we need to label the wordnet as well. The convention in wordnet-LMF is to use identifiers of the form LLL-VV-OOOOOOOO-P, where LLL is the language, VV is the version, OOOOOOOO is the offset and P is the part of speech. So instead of syn_n_08225481, people use eng-30-08225481-n. If we could adopt the same convention, it would make interoperability a little bit easier.
http://kyoto-project.eu/xmlgroup.iit.cnr.it/kyoto/index6bfa.html?option=com_content&view=article&id=143&Itemid=129

Debate still rages over whether synsets can be (or should be) shared between languages. I think that they can if we are careful, especially at the level of granularity we use in practice, but it is still an open question. If we think that they cannot, then a single lexical concept may be a supertype of multiple synsets from different languages: the synset with 'dog' in English, the one with '犬' in Japanese, the one with 'anjing' in Malaysian and so on.

The final point is that the current English wordnets (and most recent wordnets) distinguish between hyponym and instance:

<<fictional character>> is a hyponym of <<imaginary being>>
<<Sherlock Holmes>> is an instance of <<fictional character>>

I suspect we should try to capture this distinction.

Orthogonally, I must admit to not being clear about the actual implications of choosing between the different names/models proposed in the discussion, so I find it hard to judge which is better. If someone could try to summarize this, it would be really helpful to me, and maybe to others.
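To make the identifier convention above concrete, here is a minimal Python sketch. It is only an illustration, not part of wordnet-LMF itself: the names SynsetId, parse_lmf_id and from_short_id are hypothetical, and the field widths (three-letter language, two-digit version, eight-digit offset, one-letter part of speech) are assumptions read off the example eng-30-08225481-n.

```python
import re
from typing import NamedTuple


class SynsetId(NamedTuple):
    """Hypothetical container for the four parts of an LLL-VV-OOOOOOOO-P id."""
    lang: str      # language code, e.g. "eng"
    version: str   # wordnet version, e.g. "30"
    offset: str    # offset, kept as a string to preserve leading zeros
    pos: str       # part of speech, e.g. "n"

    def __str__(self) -> str:
        return f"{self.lang}-{self.version}-{self.offset}-{self.pos}"


# Assumed field widths, taken from the example id in the message.
_LMF_ID = re.compile(r"([a-z]{3})-(\d{2})-(\d{8})-([a-z])")
# Assumed shape of the shorter form mentioned in the message.
_SHORT_ID = re.compile(r"syn_([a-z])_(\d{8})")


def parse_lmf_id(text: str) -> SynsetId:
    """Split an id such as 'eng-30-08225481-n' into its four parts."""
    match = _LMF_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not an LLL-VV-OOOOOOOO-P identifier: {text!r}")
    return SynsetId(*match.groups())


def from_short_id(text: str, lang: str, version: str) -> SynsetId:
    """Build a full id from 'syn_n_08225481' plus an explicit language and version."""
    match = _SHORT_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a syn_P_OOOOOOOO identifier: {text!r}")
    pos, offset = match.groups()
    return SynsetId(lang, version, offset, pos)


if __name__ == "__main__":
    print(from_short_id("syn_n_08225481", "eng", "30"))
    print(parse_lmf_id("eng-30-08225481-n"))
```

The short form carries neither language nor version, so the caller has to supply both; that is exactly the information the longer convention makes explicit.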

Yours,
--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Received on Friday, 19 April 2013 08:00:14 UTC