Re: WordNet modelling in Lemon and SKOS from Guido Vetere on 2013-04-16 (public-ontolex@w3.org from April 2013)

From: Guido Vetere <gvetere@it.ibm.com>
Date: Tue, 16 Apr 2013 10:33:40 +0200
To: public-ontolex@w3.org
Message-ID: <OF05088843.ADD8E6F3-ONC1257B4F.002E0BC3-C1257B4F.002F0D51@it.ibm.com>
Fully agree on using a namespace which reflects the W3C CG, e.g. 
'ontolex'.

This will also give us the possibility to adopt terminology/modelling 
options without affecting other proposals. 

Previous work could be credited in the CG annexes.

Regards,

Guido Vetere
Manager, Center for Advanced Studies IBM Italia
_________________________________________________
Rome                                     Trento
Via Sciangai 53                       Via Sommarive 18
00144 Roma, Italy                   38123 Povo in Trento
+39 (0)6 59662137 

Mobile: +39 3357454658
_________________________________________________



From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: 16/04/2013 09:54
To: public-ontolex@w3.org
Subject: Re: WordNet modelling in Lemon and SKOS

Armando, all,

On issue 1), the use of lemon or ontolex as prefix:

The thing is that many people active in this group have been involved in 
the development of lemon before, so to many of us it comes quite natural 
to think in lemon terms. 

I think that the lemon acronym is quite nice, so I would prefer that the 
final vocabulary has this prefix. However, this vocabulary will hopefully 
end up being hosted by the W3C in one way or another, so that the final 
namespace could be something like w3c.org/lemon, for instance.

Many of us have put a lot of work and thought into the lemon model and 
thus we think that this model can provide a good basis for the work in 
this group. Nevertheless, it should be clear that whatever this group 
comes up with will be a new model decided by all of us.

Having said this, let me propose that we all work with the prefix 
"ontolex" for now. The reasons for this are as follows: first of all, the 
group is called ontolex, so it makes sense to use this as prefix. Second, 
I think it is important that while we hope to build on lemon as much as 
possible, the goal is to design a new vocabulary that will have a 
different namespace anyway. To make this clear, I think using the prefix 
"ontolex" is an important sign.

I hope as many of you as possible agree on this.

Philipp.

On 15.04.13 19:41, Armando Stellato wrote:
Hi all,
First of all, thanks John for providing the example: through concrete 
examples it is easier to discuss!
 
A few comments (the same "disclaimer" from Elena holds for me: I hope I 
didn't miss anything from other discussions, and if I did, sorry in 
advance).
 
1)      First of all (sorry, a bit off topic), I would ask for a 
clarification, so that I can apply the policy to my examples too: I see 
the "lemon:" prefix being used in many examples, and Lemon is an outcome 
of the Monnet project. Is it also the definitive name (or a temporary 
name) we are giving to the model we are developing in this community 
group? I've been using "ontolex:" as a fictitious prefix in my examples, 
and I assumed "lemon" was being used by some of you because those of you 
working on Monnet started right from examples already built in the 
original lemon. Sorry for asking what seems to be trivial, but I never got 
any definitive statement on this, so better to realign late than never :-D 
Btw, what is written in the last row of http://www.lemon-model.net/ seems 
to confirm my hypothesis.

OK, back to the original topic. Note that a few of these observations 
can actually be resolved by completing the example, and do not necessarily 
clash with it (or, at least, do not clash with what has already been 
written; I don't know what was planned for the rest).
 
2)      With respect to WordNet (which has explicitly ordered senses per 
word, where I think this order originates, at least for some of the words, 
from frequencies in SemCor) the sense ordering is lost: the synsets are 
bound to the words by a plain listing of values, which in plain 
RDF is unordered.
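
A minimal Turtle sketch of the point above, and of two ways the order could 
be retained. It is only an illustration under stated assumptions: the ex: 
namespace and the properties ex:orderedSenses and ex:senseRank are 
hypothetical names invented for this sketch (they are not part of lemon or 
of John's example), the lemon namespace IRI is assumed, and the order shown 
is arbitrary rather than the actual WordNet order.

```turtle
@prefix lemon: <http://www.lemon-model.net/lemon#> .  # assumed namespace IRI
@prefix ex:    <http://example.org/sketch#> .         # hypothetical namespace

# As in the example: two values of lemon:sense, with no order between them.
<cat:v> a lemon:LexicalEntry ;
    lemon:sense <cat::2:29:0::> , <cat::2:35:0::> .

# Option A (hypothetical property): an RDF collection, which is ordered.
<cat:v> ex:orderedSenses ( <cat::2:29:0::> <cat::2:35:0::> ) .

# Option B (hypothetical property): an explicit rank on each sense.
<cat::2:29:0::> ex:senseRank 1 .
<cat::2:35:0::> ex:senseRank 2 .
```

Option A keeps the order in one place but is harder to query; Option B is 
easy to query but needs care to keep ranks unique per entry.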
 
3)      This is the most important observation: the use of lemon:sense. 
Together with lemon:reference, lemon:sense should realize the bridge from 
lexical entries to conceptual entities (of the domain ontology). Should we 
use it to reach the conceptual entities (e.g. synsets) of the lexical 
resource AS WELL? In terms of black-box compatibility, as we are 
modelling even the conceptual info of lexical resources (e.g. synsets in 
WordNet) through some RDF language (e.g. SKOS), the thing is legal (the 
rdfs:range of lemon:sense, provided it is wide enough, is respected); 
still, I'm not sure we want that. In short, I'm not sure we want to apply 
exactly the same 3-entity approach we are using for the lexicon-ontology 
model to modelling a lexical resource alone.
Let's take an example. We have myont:, which is a domain ontology (where 
we have the entry myont:vomit) that we are enriching with lexical content, 
possibly from WordNet. Then we need to represent a direct link between 
some lexical entries (which may happen to be in WordNet or not) and the 
domain entities of myont.
We would thus have this example, which I derived from both the WordNet 
example and the generic OntoLex example for enriching an ontology with 
lexical content: 

# Prefix declarations omitted.
<cat:v>
               a lemon:LexicalEntry ;
               lemon:sense <cat::2:29:0::>, <cat::2:35:0::> .
<cat::2:29:0::>
               a lemon:LexicalSense ;
               # one sense pointing at both the synset and the ontology entity
               lemon:reference <VerbSynset76400> ,
                               myont:vomit .
                               
Note that I've cut from the original example the triples which are 
not relevant to the discussion.
Actually, in writing this revised example, I'm not even sure whether the 
two lemon:references should be put under the same sense umbrella, or 
whether I should have used two different senses. This is mainly because 
I'm not sure about the concept of "sense" here and what it represents. I 
see potential for confusion even by looking at the Elena/John emails, as 
she rightly asks about the use of skos:definition instead of 
lemon:definition. While I'm not addressing here the use of one property or 
the other, the answer by John, hinting at the fact that there could be two 
definitions, one for a sense and one for a synset (and consider that there 
could also be a definition for the element in the ontology), makes me 
wonder how many levels we should have!
Without delving too much into the appropriateness of this indirection as 
far as the lexicon-ontology interface is concerned, and considering only 
the context of the representation of WordNet (thus just the lexicon 
perspective), to me the path from the LexicalEntry to the Synset is too 
long. In WordNet we just say that a word is linked to a synset: period 
(modulo the addition of an ordering). In particular, "sense" is a relation 
which just tells me that synset X is the i-th sense of word Y (and there's 
a many-to-many relation between words and synsets).

...and this brings me back to our first discussions about the choice of the 
term sense, when referring to the path from lexical entries to ontology 
elements, and about the nature of "elements-in-the-middle".
In my view (to avoid terminological problems, I focus here on the path 
between entities and do not name the linking properties at all, so please 
consider that all the arrows here have properties behind them, in 
particular lemon:sense and lemon:reference), when considering a mapping 
between a lexical resource such as WordNet and an ontology, I would have 
seen such a path:
LexicalEntry --> Synset --> OntologyResource
where, without using WordNet, the path would have been:
LexicalEntry --> [] --> OntologyResource
with [] a blank node gluing them together.
The second line is identical to what we have done until now and what has 
been written in the examples in "Specification of 
Requirements/Lexicon-Ontology-Mapping". In particular, the blank node is an 
instance of that element-in-the-middle (see: "Need for an object between 
Lexical Entry and Ontology") which still has no name (and maybe it does 
not need one, see point 4 below). The first line is thus my 
interpretation of how WordNet would fit into that general template 
(different from John's example).
So, my idea would be not to replicate the complex lexicon-ontology linking 
inside WordNet itself, and to have instead a direct link between lexical 
entries and Synsets, and THEN, outside of WordNet, a further link to 
an ontology element. If you look at the two rows above (and how the 
WordNet case fits the general case), this is pretty elegant, and does not 
introduce a further level of indirection which appears unnecessary. 
Plus, with this method, the link from synsets to ontology elements is a 
necessary step to instantiate the path above, while in the other case you 
have to introduce it as an additional (and probably redundant) triple. You 
can in fact see it in the Turtle code above, which I modelled following 
both the general example in "Specification of 
Requirements/Lexicon-Ontology-Mapping" and John's example on WordNet: 
there, VerbSynset is a separate entity from myont:vomit. Actually, in that 
view, WordNet would become a separate "ontology" which could then be 
mapped to a domain ontology, instead of taking full advantage of being 
seen as a lexical resource that can be used seamlessly within our model 
to enrich a domain ontology.
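
The two paths described in point 3 can be sketched in Turtle as follows. 
This is a minimal illustration, not a settled modelling: it assumes 
lemon:sense and lemon:reference as the properties behind the two arrows 
(the text leaves the exact properties open), the lemon and myont namespace 
IRIs are assumptions, and <puke:v> is a hypothetical entry that does not 
come from WordNet.

```turtle
@prefix lemon: <http://www.lemon-model.net/lemon#> .  # assumed namespace IRI
@prefix myont: <http://example.org/myont#> .          # hypothetical ontology namespace

# With WordNet: LexicalEntry --> Synset --> OntologyResource
<cat:v> a lemon:LexicalEntry ;
    lemon:sense <VerbSynset76400> .
<VerbSynset76400> lemon:reference myont:vomit .   # stated outside WordNet

# Without WordNet: LexicalEntry --> [] --> OntologyResource
<puke:v> a lemon:LexicalEntry ;                   # hypothetical entry
    lemon:sense [ lemon:reference myont:vomit ] .
```

In both cases the path has the same length; the synset simply takes the 
place of the blank node.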

4)      IMHO, we should coin a specific vocabulary for each element of the 
lexicon model, and then inherit (where appropriate) from SKOS/SKOS-XL, to 
distinguish those elements which belong only to a lexical resource from 
those of any generic KOS. In the wiki, John wonders whether what I called 
"SemanticIndex" is not a skos:Concept, and I reply: "yes it is, in fact my 
proposal is that our vocabulary for describing lexical resources can 
inherit from the SKOS/SKOS-XL one". If you look at the example, even John 
did this, as the LexicalForm is nothing different from a skosxl:Label 
(where lemon:writtenRep could be replaced by skosxl:literalForm), though it 
may be worth creating a dedicated class. I would thus suggest:
LexicalForm rdfs:subClassOf skosxl:Label 
but to use skosxl:literalForm instead of lemon:writtenRep

Maybe, in this specific case, we could even avoid inventing a new name and 
simply reuse skosxl:Label, which after all is not so bad and fits our 
needs pretty well (as it is already tied to something specifically 
designed for language).
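
A minimal Turtle sketch of the two options in point 4. The ontolex: 
namespace IRI is a placeholder (no namespace had been fixed at the time), 
and the form identifier <cat:v#canonical> is a hypothetical name used only 
for illustration.

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skosxl:  <http://www.w3.org/2008/05/skos-xl#> .
@prefix ontolex: <http://example.org/ontolex#> .   # placeholder namespace

# Option 1: a dedicated class that inherits from skosxl:Label.
ontolex:LexicalForm rdfs:subClassOf skosxl:Label .

<cat:v#canonical> a ontolex:LexicalForm ;          # hypothetical form IRI
    skosxl:literalForm "cat"@en .                  # instead of lemon:writtenRep

# Option 2: no new class name, the form is typed directly as a skosxl:Label:
#   <cat:v#canonical> a skosxl:Label ; skosxl:literalForm "cat"@en .
```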

On the contrary, for LLD, I would necessarily restrict the class 
skos:Concept to the class of elements which we expect to host things like 
the WordNet Synset class. You can see my sample extension point above in 
the wiki ("Examples of Modelling in RDF (Alternative approach)"), though 
by no means am I suggesting <SemanticIndex> (that was a placeholder, taken 
from a previous work); in any case I think "Sense" is not appropriate 
(lemon:sense evokes the sense relation well, while I don't like to see a 
class of "Senses"; that is, to me being a sense is more a role in a given 
relationship than an intrinsic property of an object).

a.       While I think that a more-specific-than-skos:Concept class would 
be welcome for Lexical Linked Data (such as WordNet), and could thus be put 
in the middle of the LexicalEntry --> ??? --> OntologyResource template, 
I'm not sure that lemon:sense (the first arrow) should necessarily be 
restricted to it. John's use of skos:Concept in the middle suggested to me 
that even a generic well-lexicalized KOS could be used to provide 
LexicalEntries and Senses to enrich an ontology. However, I'm still 
thinking about it...

5)      Another thing which comes to my mind, somewhat outside the WordNet 
example but not without consequences for it... What should be, in 
general, the expected modelling behaviour when we have two terms which 
coincide, but whose syntactic use can follow different paths?
E.g., suppose we have a term with three senses. With two of these senses 
(say 1 and 2), the term has exactly identical variations (declensions for 
nouns, pronouns and adjectives, and conjugations for verbs), and maybe 
other information in common (think about etymology!), while for the third 
sense it may show differences in the variations (e.g. a noun would have a 
different plural form, or a verb would have a different form in only one 
tense, when used with that sense). Should we model them as 3 different 
lexical units, or should we merge the two identical ones into one 
LexicalEntry, and link it to senses 1 and 2?
This seems unrelated to modelling WordNet specifically, because 
variations, declensions, etc. are outside WordNet. However, it may 
affect a model trying to reuse WordNet enriched with further information... 
Thus it's important when we consider how a WordNet model could be 
ported into an extended framework with no risk of inconsistency.

I just thought about a solution for this: if we allow for skosxl:Labels to 
be directly attached to Synsets (or whatever their superclass is), and 
then we state the following rule:
LexicalEntry -> lemon:canonicalForm -> skosxl:Label
LexicalEntry -> lemon:sense -> <asynset>
------------------------------------
skosxl:Label -> ???:sense (whatever it is called) -> <asynset>

this would allow for the complex structure we expect in general, but also 
allow for a more neutral fit of WordNet. In fact, instead of having the 
third triple as inferred, for WordNet we could just explicitly state the 
third one, and not add potentially compromising information (which, in 
any case, is outside WordNet, as noted by John in his reply to Elena).
The "???:sense (whatever it is called)" could even be lemon:sense itself, 
provided that its domain is LexicalEntry+skosxl:Label.
However, I still have to think more about that...
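
A minimal Turtle instance of the rule above, as a sketch only. ex:sense 
stands in for the unnamed "???:sense" property and is hypothetical, as is 
the form identifier <cat:v#canonical>; the lemon namespace IRI is assumed.

```turtle
@prefix lemon:  <http://www.lemon-model.net/lemon#> .  # assumed namespace IRI
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ex:     <http://example.org/sketch#> .         # hypothetical namespace

# General case: the two premises of the rule.
<cat:v> lemon:canonicalForm <cat:v#canonical> ;
        lemon:sense <VerbSynset76400> .
<cat:v#canonical> a skosxl:Label ;
    skosxl:literalForm "cat"@en .

# Conclusion of the rule. For a plain WordNet export, this triple could be
# stated directly, without the LexicalEntry triples above.
<cat:v#canonical> ex:sense <VerbSynset76400> .
```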
 
One more thing: the observation in point 2 above made me think once more 
that we should be clearer about our objectives.
Fact: since we have to model ontology-lexicon interfaces, and there isn't 
much out there for representing lexical info (limited to RDF, I mean), we 
thus have to provide a model for the linguistic part before "attaching" 
it to the ontology part. Now, the objective could be:
 
1)      We want to model lexical knowledge, and we give a model for this. 
WordNet may be (in part) more fine-grained than our model... no big trouble, 
WordNet is WordNet, and our model is our model... we'll be missing those 
details.
a.       A slightly different interpretation of the above: we want to 
model lexical knowledge, AND we decide WordNet IS the model (at least for 
the monolingual word-description needs; I leave FrameNet and the like out 
of this discussion). No big deal with other alternative resources to 
WordNet.
2)      We want to model existing lexical resources. Thus WordNet, as well 
as other resources (maybe differently organized), are all important.
 
Obviously, there are endless colours in the middle of the above, as we 
could be in case 1 or 2, and still think WordNet is so important that it 
has to be fully covered (also because, in this way, Princeton could decide 
to natively output each new release of WordNet in RDF too according to our 
model).
 
Cheers,
 
Armando
 
P.S.: I've made a couple of small fixes to the page: 
http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping#Summary_on_Requirements_on_the_Lexicon-Ontology-Mapping_.28Synthesis_by_PC.29 
which we already discussed 2 or 3 meetings ago.
 
 
From: johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] On Behalf Of John 
McCrae
Sent: Friday, 12 April 2013 16:10
To: public-ontolex
Subject: WordNet modelling in Lemon and SKOS
 
Hi all,
 
Here is the proposed modelling of WordNet as lemon and SKOS (using 
skos:Concept for synsets)
 
http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Linked_Data#Example:_WordNet_as_lemon-SKOS
 
Any comments?
 
Regards,
John


-- 
Prof. Dr. Philipp Cimiano
Semantic Computing Group
Excellence Cluster - Cognitive Interaction Technology (CITEC)
University of Bielefeld

Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano@cit-ec.uni-bielefeld.de

Room H-127
Morgenbreede 39
33615 Bielefeld

Received on Tuesday, 16 April 2013 08:34:27 UTC