Re: Senses, synsets and ontology mapping in WordNet from Aldo Gangemi on 2013-04-16 (public-ontolex@w3.org from April 2013)

From: Aldo Gangemi <aldo.gangemi@cnr.it>
Date: Tue, 16 Apr 2013 12:43:47 +0200
To: John McCrae <jmccrae@cit-ec.uni-bielefeld.de>, Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>, Armando Stellato <stellato@info.uniroma2.it>
Cc: Aldo Gangemi <aldo.gangemi@cnr.it>, public-ontolex <public-ontolex@w3.org>
Message-Id: <A0E2F282-91F2-431A-BED7-C7DB95E08248@cnr.it>
Hi all. It seems that our old and long discussion on senses and references was never actually concluded, or that some clarification is still needed. Sorry for going back to basics: please state your (dis)agreement with my views, so that we can all refer to the same intuition.

First, I suggest distinguishing the topics as follows:

1) the ontology of WordNet and its role in OntoLex 
2) the alignment between WordNet and SKOS ontologies
3) the issue of what is a reference relation/object in OntoLex

Re: (1)

The WordNet ontology has been investigated several times; a pragmatic representation was agreed on in a W3C Task Force and has been heavily used since then (2005). Both WordNet 2.0 and WordNet 3.0 in RDF currently use that vocabulary, and extensions have been provided, e.g. for standoff data and domains.
The obvious strategy here would be to have corresponding OntoLex classes and properties for each WordNet ontology class or property that seems relevant to OntoLex. For example, the distinction between synsets and senses is quite universally accepted, and despite the occasional disputes on its interpretation, it is pretty clear when semiotics is taken into account: given the triangle expression-meaning-reference (a sign), a sense is a meaning expressed by exactly one expression, while a synset is an "equivalence class of senses", i.e. it is a meaning expressed by one or more expressions. There is a case when a sense and a synset are semiotically (but not pragmatically) identical: monosemous expressions. In polysemous cases, a synset has senses as "parts" or "members" (or any other relation name we find clear to everyone).
Therefore:
 wordnet:WordSense rdfs:subClassOf ontolex:LexicalSense
 wordnet:Synset rdfs:subClassOf ontolex:LexicalSense
 wordnet:wordsense-vomit-verb-1 rdf:type wordnet:WordSense
 wordnet:synset-vomit-verb-1 rdf:type wordnet:Synset
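
The sense/synset distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are hypothetical (only wordsense-vomit-verb-1 and synset-vomit-verb-1 appear above; the other identifiers are invented to follow the same naming pattern), and the helper names synset_members and expressions_of are not part of any vocabulary.

```python
from collections import defaultdict

# Illustrative rows only: (expression, sense, synset).
# Only the first row uses identifiers from the message; the rest are assumptions.
ROWS = [
    ("vomit", "wordsense-vomit-verb-1", "synset-vomit-verb-1"),
    ("cat", "wordsense-cat-verb-1", "synset-vomit-verb-1"),
    ("vomit", "wordsense-vomit-noun-1", "synset-vomit-noun-1"),
]


def synset_members(rows):
    """Group senses by synset: a synset is modelled as a set of senses."""
    members = defaultdict(set)
    for _expression, sense, synset in rows:
        members[synset].add(sense)
    return dict(members)


def expressions_of(rows, sense):
    """Return the expressions that express a given sense (expected: exactly one)."""
    return {expression for expression, s, _synset in rows if s == sense}


members = synset_members(ROWS)
```

In this toy data every sense is expressed by exactly one expression, while a synset may collect several senses; a synset with a single member is the case where sense and synset coincide semiotically.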

Re: (2)

The WordNet ontology has lexical entities and their senses in its domain of discourse. The SKOS ontology primarily has KOS (Knowledge Organisation Systems) entities in its domain of discourse, often called "concepts", which can have "labels". The analogy between lexical and KOS semantics has been noted several times, and SKOS-XL tries to bridge the two worlds by treating all lexical items as a "Label". Indeed the broader/hypernym relations have a similar informal (≈intensional) semantics, and the formal semantics adopted in most logics is perfectly fine with treating those intensional entities as the same stuff. If we accept that, the obvious consequence is to consider SKOS concepts as more general than WordNet senses and synsets, with all the good outcomes of this assumption.
On the other hand, I do not find it correct in principle to use ontolex:reference to link WordNet senses/synsets to SKOS concepts. The reason is again semiotic, and goes to issue (3).
Therefore:
 ontolex:LexicalSense rdfs:subClassOf skos:Concept
 wordnet:hypernym rdfs:subPropertyOf skos:broader
 wordnet:wordsense-vomit-verb-1 skos:equivalentMapping myont:Vomit
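
A minimal sketch of what the subclass axioms entail, assuming plain RDFS subclass semantics (transitive rdfs:subClassOf, type propagation to superclasses). The triples are copied from the statements in (1) and (2); the function names superclasses and inferred_types are hypothetical helpers, not part of any RDF library.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Triples taken from sections (1) and (2), written as plain string tuples.
TRIPLES = {
    ("wordnet:WordSense", SUBCLASS, "ontolex:LexicalSense"),
    ("wordnet:Synset", SUBCLASS, "ontolex:LexicalSense"),
    ("ontolex:LexicalSense", SUBCLASS, "skos:Concept"),
    ("wordnet:wordsense-vomit-verb-1", TYPE, "wordnet:WordSense"),
    ("wordnet:synset-vomit-verb-1", TYPE, "wordnet:Synset"),
}


def superclasses(cls, triples):
    """All classes reachable from cls via rdfs:subClassOf, cls included."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def inferred_types(resource, triples):
    """Asserted types of a resource plus every superclass of those types."""
    types = set()
    for s, p, o in triples:
        if s == resource and p == TYPE:
            types |= superclasses(o, triples)
    return types
```

Under these assumptions both the word sense and the synset come out as instances of skos:Concept, which is the intended consequence of treating SKOS concepts as the more general notion.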

Re: (3)

Intensional entities like WordNet senses/synsets and SKOS concepts are all meanings in the semiotic triangle, which shapes intuitions now current in most linguistics, logic and philosophy: animals use materialized symbols (expressions, manifestations) with some intension (meanings, senses) to refer to things, facts, or collections of things/facts in the world they are considering. Now, when wordnet:word-vomit has a sense <vomit%2:29:0::>, and a synset wordnet:synset-vomit-verb-1, we are still in the relation between expressions and meanings: reference relations are not treated at all by WordNet (except for the instanceOf relation introduced from version 2.1). 
Similarly for SKOS: the interpretation of e.g. myskos:Vomit concept is purely intensional: there is nothing (at least explicitly) that points to "references" in terms of extensional semantics (things, facts, collections of things/facts).
This is not the case when we want to link lexical or KOS meanings to typical ontologies, e.g. to myont:Vomit class. If Vomit is an OWL (or RDFS) class, its interpretation is *extensional* (a collection of things, vomiting events in the common interpretation), therefore it's fully justified to use ontolex:reference for representing this linking.
Therefore:
 wordnet:wordsense-vomit-verb-1 wordnet:inSynset wordnet:synset-vomit-verb-1
 wordnet:wordsense-vomit-verb-1 ontolex:reference myont:Vomit
 wordnet:synset-vomit-verb-1 ontolex:reference myont:Vomit
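
One possible way to make the intensional/extensional constraint checkable is sketched below. It is an assumption-laden illustration, not a proposal for the model: the typing triples for myont:Vomit and myskos:Vomit are assumed, the third reference link is added deliberately so that the check has something to flag, and intensional_references is a hypothetical helper.

```python
TYPE = "rdf:type"
REFERENCE = "ontolex:reference"

TRIPLES = [
    # Assumed typing: myont:Vomit is an OWL class, myskos:Vomit a SKOS concept.
    ("myont:Vomit", TYPE, "owl:Class"),
    ("myskos:Vomit", TYPE, "skos:Concept"),
    ("wordnet:wordsense-vomit-verb-1", REFERENCE, "myont:Vomit"),
    ("wordnet:synset-vomit-verb-1", REFERENCE, "myont:Vomit"),
    # Deliberately questionable link, present only to exercise the check.
    ("wordnet:synset-vomit-verb-1", REFERENCE, "myskos:Vomit"),
]

# Types treated here as having an extensional interpretation (an assumption).
EXTENSIONAL = {"owl:Class", "rdfs:Class"}


def intensional_references(triples):
    """Return (subject, object) pairs whose reference target is not typed extensionally."""
    types = {}
    for s, p, o in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    return [
        (s, o)
        for s, p, o in triples
        if p == REFERENCE and not (types.get(o, set()) & EXTENSIONAL)
    ]
```

With this data, only the link to myskos:Vomit is reported; the two links to myont:Vomit pass because their target is typed as an OWL class.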

Bottom line: I thought that the OntoLex model substantially conformed – thanks to the LexicalEntry/LexicalSense/Reference distinction – to mainstream semiotic theory, one that can generalize over lexical semantics, KOS, and logic. Maybe that is not the case after all ... however this should be discussed IMO, since it would be a great help when taking decisions on what/how to import or replicate from existing lexical ontologies. As a historical example, in 2008 Alfio Gliozzo (now at IBM Watson) and I used LMM [1], a semiotics-based ontology, in an industrial context to align all sorts of intensional and extensional knowledge, with very good results. The essence of LMM is also summarized in an ontology pattern, semiotics.owl [2]. If you think it appropriate, I can bring that use case in more explicitly.

Ciao
Aldo

[1] http://www.ontologydesignpatterns.org/ont/lmm/LMM_L2.owl
[2] http://www.ontologydesignpatterns.org/cp/owl/semiotics.owl

On Apr 16, 2013, at 10:58:19 AM , John McCrae <jmccrae@cit-ec.uni-bielefeld.de> wrote:

> Firstly, I think an important point here is that WordNet does in fact have senses as a concept distinct from Synsets and Words. These senses have their own identifiers and are used for stand-off annotation in WordNet
> 
> http://wordnet.princeton.edu/wordnet/man/senseidx.5WN.html
> 
> It seems clear that this index scheme should be preserved as well, in the sense that there should be a named resource (URI) for each sense index (bar the obvious technical limitation that we can't use % in URIs as in the original scheme).
> 
> Therefore the chain in the model should be at least
> 
> <cat:v> -> <cat%2:29:0::> -> <VerbSynset76400>
> Entry -> Sense -> Synset
> 
> How we then go to the ontology is an open question
> 
> We could then add a further (4th) link onto an ontology, or link from the same sense as a second reference.
> 
> The issue really seems to revolve around how closely the concept of a synset or SKOS concept hierarchy corresponds to that of an ontology. In my opinion, the distinction is really one of precision, not level. By that I mean that, at the semiotic level, they both represent the reference, in the sense of the abstract concept of "cat", in the same way that "cat", "felis catus", "Katze", "gato" and "猫" can all refer to the same set of entities in the world. The key difference is that the synset hierarchy does not constitute a logic program like an ontology does, in the sense that the hypernym relations are not transitive and key distinctions between classes, individuals and qualities are not made.
> 
> This understanding would lead me to conclude that the modelling of both the synset and ontology hierarchy at the same "level" (i.e., both as reference of the sense) with a different "model" (SKOS vs. OWL) is most appropriate.
> 
> Regards,
> John
> 
> 
> On Tue, Apr 16, 2013 at 10:24 AM, Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de> wrote:
> Armando, all,
> 
>  re point 3: about whether to use our three-entity path for modelling WordNet
> 
> Concerning the path word - sense - concept (to simplify things a lot).
> 
> Sense represents the meaning of "word" when understood as referring to concept "concept".
> Technically, we agreed that sense represents a reification of the pair of word/concept and thus allows us to attach
> information that relates to the pair rather than to the word or the concept.
> 
> Now turning to the modelling of WordNet:
> 
> I agree that in some sense the three-entity path seems an overkill for modelling WordNet. But I think that our goal should be to design a model that works for all cases and not to tune the model to the particular case of WordNet. So I would prefer to use the same modelling (i.e. the three-entity path) across all specific resources.
> 
> Assuming that WordNet contains a conceptualization, each synset indeed represents a skos:Concept (a unit of thought) and in that sense it seems reasonable to see a Synset as a reference.
> 
> WordNet provides a definition (gloss) for each synset, which has an extension in terms of senses that constitute this synset.
> WordNet does not define in more detail the meaning of a particular word/sense that belongs to a synset, but it could in principle.
> So just because WordNet does not distinguish in more detail the specific meaning of words when referring to the concept represented by the synset, our model should not be limited in this respect, I think.
> 
> Further, using the "sense" object will allow us to order the senses in terms of frequency of usage for each word. Note that this order is exactly one of these attributes that neither applies to the word nor to the synset, but to the pair of word and synset and would be most naturally attached to our sense object.
> 
> Finally, concerning your specific example, it could be modelled as follows:
> 
> <cat:v>
>                a lemon:LexicalEntry ;
>                lemon:sense <cat::2:29:0::>, <cat::2:35:0::> , <cat_new> .
> 
> 
> <cat::2:29:0::>
>                a lemon:LexicalSense ;
>                lemon:reference <VerbSynset76400> .
> 
> <cat_new>
>                a lemon:LexicalSense;
>                lemon:reference myont:vomit .
> 
> 
> So in some sense, both myont:vomit (btw. why "vomit"? ;-)) and <VerbSynset76400> have the same status in the lexicon as a concept (skos:Concept) that can be the reference for a word. I do not see any problem with that. Of course, this does not say anything about the further status or relation between these two concepts.
> 
> Enough for now on this point I think.
> 
> Does this all make sense?
> 
> Philipp.
> 
> On 15.04.13 19:41, Armando Stellato wrote:
>> Hi all,
>> 
>> First of all, thanks John for providing the example: through concrete examples it is easier to discuss!
>> 
>>  
>> 
>> A few comments (the same “disclaimer” from Elena holds for me: hope I didn’t miss anything from other discussions, and in case, sorry in advance).
>> 
>>  
>> 
>> 1)      First of all (sorry, a bit off topic), I would ask for a clarification, so that I can apply the policy to my examples too: I see the “lemon:” prefix being used in many examples, and Lemon is an outcome of the Monnet project. Is it also the definitive name (or a temporary name) we are giving to the model we are developing in this community group? I’ve been using “ontolex:” as a fictitious prefix in my examples, and just realized that “lemon” was being used by some of you, because those of you working on Monnet started right from examples already built in the original lemon. Sorry for asking what seems to be trivial, but I never got any definitive statement on this, so, better to realign late than never :-D
>> Btw, what is written at the last row of: http://www.lemon-model.net/ seems to confirm my hypothesis.
>> 
>> 
>> ok..back to the original topic. Consider that a few of these observations can actually be solved by completing the example, and do not necessarily clash with it (or, at least, do not clash with what has been already written, while I don’t know of what was thought for the rest).
>> 
>>  
>> 
>> 2)      With respect to Wordnet (which has explicitly ordered senses per word, where I think this order originates – at least for some of the words – from frequencies in SemCor) the sense ordering is lost: the synsets are bound to the words by means of the sole listing of values, which in plain RDF is unordered.
>> 
>>  
>> 
>> 3)      This is the most important observation: the use of lemon:sense. Together with lemon:reference, lemon:sense should realize the bridge from lexical entries to conceptual entities (of the domain ontology). Should we use it to reach the conceptual entities (e.g. synsets) of the lexical resource AS WELL? In terms of black-box compatibility, as we are modelling even conceptual info of lexical resources (e.g. synsets in WordNet) through some RDF language (e.g. SKOS), the thing is legal (the rdfs:range of lemon:sense, provided it is wide enough, is respected); still, I’m not sure we want that. In short, I’m not sure we want to apply exactly the same 3-entity approach we are using for the lexicon-ontology model to modelling solely a lexical resource.
>> Let’s make an example. We have myont: which is a domain ontology (where we have the entry myont:vomit) we are enriching with lexical content, possibly from wordnet. Then we have the necessity of representing a direct linking between some lexical entries (which may happen to be in wordnet or not) and the domain entities of myont.
>> We would have thus this example, which I derived from both the WordNet example, and the generic OntoLex example for enriching an ontology with lexical content: 
>> 
>> <cat:v>
>>                a lemon:LexicalEntry ;
>>                lemon:sense <cat::2:29:0::>, <cat::2:35:0::> .
>> <cat::2:29:0::>
>>                a lemon:LexicalSense ;
>>                lemon:reference <VerbSynset76400> ;
>>                lemon:reference myont:vomit .
>>                               
>> 
>> Note that I’ve cut from the original example, the triples which are non-useful to the discussion.
>> 
>> Actually, in writing this revised example, I’m not even sure whether the two lemon:references should be put under the same sense umbrella, or whether I should have used two different senses. This is mainly because I’m not sure about the concept of “sense” here and what it represents. I see potential for confusion even by looking at the Elena/John emails, as she rightly asks about the use of skos:definition instead of lemon:definition. While I’m not addressing here the use of one property or the other, the answer by John, hinting at the fact that there could be two definitions, one for a sense, and one for a synset (and consider that there could be a definition for the element in the ontology), makes me wonder how many levels we should have!
>> Without delving too much in the appropriateness of this indirection for what concerns the lexicon-ontology interface, and considering the sole context of the representation of Wordnet (thus just the lexicon perspective), to me the path from the LexicalEntry to the Synset is too long. In wordnet we just say that a word is linked to a synset: period (modulo the addition of an ordering). In particular, “sense” is a relation which just tells me that synsetX is the i-th sense of word Y (and there’s a many-to-many rel between words and synsets).
>> 
>> …and this brings me back to our first discussions about the choice of the term sense, when referring to the path from lexical entries to ontology elements and about the nature of “elements-in-the-middle”.
>> In my view (to avoid terminological problems, I focus here on the path between entities, and do not name the linking properties at all, so pls consider all the arrows here have properties behind, in particular lemon:sense and lemon:reference), when considering a mapping between a lexical resource such as Wordnet, and an ontology, I would have seen such a path:
>> LexicalEntry --> Synset --> OntologyResource
>> where, without using WordNet, the path would have been:
>> LexicalEntry --> [] --> OntologyResource
>> with [] a blanknode creating this gluing between them.
>> The second line is identical to what we have done until now and what has been written in the examples in the “Specification of Requirements/Lexicon-Ontology-Mapping”. In particular, the blank node is an instance of that element-in-the-middle (see: “Need for an object between Lexical Entry and Ontology”) which still does not have a name (and maybe does not need one, see point 4 below). The first line is thus my interpretation of how WordNet would fit into that general template (different from John’s example).
>> So, my idea would be to not replicate the complex lexicon-ontology linking inside WordNet itself, and have instead a direct linking between lexical entries and Synsets, and have THEN, outside of WordNet, a further link to an ontology element. If you look at the two rows above (and how the WordNet case fits the general case), this is pretty elegant, and does not introduce a further level of indirection which appears not necessary. Plus, with this method, the link from synsets to ontology elements is a necessary step to instantiate the path above, while in the other case, you should introduce it as an additional (and probably redundant) triple. You can see it in fact in the turtle code above, which I modelled following both the general example in “Specification of Requirements/Lexicon-Ontology-Mapping” and John’s example on WordNet: there, VerbSynset is a separate entity from myont:vomit. Actually, in that view, WordNet would become a separate “ontology” which could then be mapped to a domain ontology, instead of taking all the benefit of being seen as a lexical resource that can be used, seamlessly within our model, to enrich a domain ontology.
>> 
>> 
>> 4)      IMHO, we should coin a specific vocabulary for each element of the lexicon model, and then inherit (where appropriate) from SKOS/SKOSXL, to distinguish such elements which belong only to a lexical resource from those of any generic KOS. In the wiki, John wonders if what I called “SemanticIndex” is not a skos:Concept, and I reply: “yes it is, in fact my proposal is that our vocabulary for describing lexical resources can inherit from the SKOS/SKOS-XL one”. If you look at the example, even John did this, as the LexicalForm is nothing different from a skosxl:Label (where lemon:writtenRep could be replaced by skosxl:literalForm) though it may be worth creating a dedicated class. I would thus suggest:
>> LexicalForm rdfs:subClassOf skosxl:Label 
>> but to use skosxl:literalForm instead of lemon:writtenRep
>> 
>> maybe, in this specific case, we can even not reinvent a name, and totally reuse the skosxl:Label, which after all is not so bad and pretty fitting our necessities… (as it is already related to something specifically thought for language).
>> 
>> On the contrary, for LLD, I would necessarily restrict the class skos:Concept to the class of elements which we expect to host things like the WordNet Synset class. You can see my sample extension-point above in the wiki (“Examples of Modelling in RDF (Alternative approach)”), though by no means do I suggest <SemanticIndex> (that was a placeholder, taken from a previous work); in any case I think “Sense” is not appropriate (lemon:sense well evokes the sense relation, while I don’t like to see a class of “Senses”; to me, being a sense is more a role in a given relationship than an intrinsic property of an object).
>> 
>> 
>> a.       While I think that a more-specific-than-skos:Concept class would be welcome for Lexical Linked Data (such as WordNet), and thus put in the middle of the: LexicalEntry --> ??? --> OntologyResource  template, I’m not sure that the lemon:sense (first arrow) should be necessarily restricted to it. John’s use of skos:Concept in the middle suggested to me that even a generic well-lexicalized KOS could be used for providing LexicalEntries and Senses to enrich an ontology. However, I’m still thinking about it…
>> 
>> 
>> 5)      Another thing which comes to my mind, quite out of the WordNet example, but not without consequences for it... What should be, in general, the expected modelling behaviour when we have two terms which coincide, but the syntactic use of which can follow different paths?
>> E.g., suppose we have a term with three senses. For two of them (say 1 and 2), the term has exactly identical variations (declensions for nouns, pronouns and adjectives, and conjugations for verbs), and maybe other information in common (think about etymology!), while for the third sense it may show differences in the variations (e.g. a noun would have a different plural form, or a verb has a different form in only one tense, when used with that sense). Should we model them as 3 different lexical units, or should we agglomerate the two identical ones into one LexicalEntry, and link it to senses 1 and 2?
>> This seems unrelated to modelling WordNet specifically, because variations, declensions etc. are outside WordNet. However, this may affect a model trying to reuse WordNet enriched with further information… Thus it’s important when we consider how a WordNet modelling could be ported inside an extended framework with no risk of inconsistency.
>> 
>> I just thought about a solution for this: if we allow for skosxl:Labels to be directly attached to Synsets (or whatever it is the superclass for them), and then we state the following rule:
>> LexicalEntry -> lemon:canonicalForm -> skosxl:Label
>> LexicalEntry -> lemon:sense -> <asynset>
>> ------------------------------------
>> skosxl:Label -> ???:sense (whatever it is called) -> <asynset>
>> 
>> this would allow for the complex structure we expect in general, but also allow for a more neutral fit of WordNet. In fact, instead of having the third triple as inferred, for WordNet we could just explicitly mention the third one, and do not put potentially compromising information (which, in any case, is out of WordNet, as noted by John in his reply to Elena).
>> The “???:sense (whatever it is called)” could even be lemon:sense itself, provided that its domain is LexicalEntry+skosxl:Label.
>> However, I still have to think more about that…
>> 
>>  
>> 
>> One more thing, observation in point 2 above made me think once more that we should be clearer in our objectives:
>> 
>> Fact: since we have to model ontology-lexicon interfaces, and there isn’t much out there for representing lexical info (limited to RDF, I mean); we have thus to provide a model for the linguistic part, before “attaching” it to the ontology part. Now, the objective could be:
>> 
>>  
>> 
>> 1)      We want to model lexical knowledge, and we give a model for this. WordNet may be (in part) more fine grained than our model…no big trouble, WordNet is WordNet, and our model is our model… we’ll be missing those details..
>> 
>> a.       A slightly different interpretation of the above: we want to model lexical knowledge, AND we decide WordNet IS the model (at least for the monolingual word-description needs..I leave out FrameNet et similia from this context of discussion). No big deal with other alternative resources to WordNet..
>> 
>> 2)      We want to model existing lexical resources. Thus WordNet, as well as other resources (maybe differently organized) are all important
>> 
>>  
>> 
>> Obviously, there are endless colours in the middle of the above, as we could be in case 1 or 2, and still think WordNet is so important that it has to be fully covered (also because, in this way, Princeton could decide to natively output each new release of WordNet in RDF too according to our model).
>> 
>>  
>> 
>> Cheers,
>> 
>>  
>> 
>> Armando
>> 
>>  
>> 
>> P.S: I’ve brought a couple of small fixes to the page: http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping#Summary_on_Requirements_on_the_Lexicon-Ontology-Mapping_.28Synthesis_by_PC.29 which we already discussed 2 or 3 meetings ago.
>> 
>>  
>> 
>>  
>> 
>> From: johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] On Behalf Of John McCrae
>> Sent: Friday, 12 April 2013 16:10
>> To: public-ontolex
>> Subject: WordNet modelling in Lemon and SKOS
>> 
>>  
>> 
>> Hi all,
>> 
>>  
>> 
>> Here is the proposed modelling of WordNet as lemon and SKOS (using skos:Concept for synsets)
>> 
>>  
>> 
>> http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Linked_Data#Example:_WordNet_as_lemon-SKOS
>> 
>>  
>> 
>> Any comments?
>> 
>>  
>> 
>> Regards,
>> 
>> John
>> 
> 
> 
> -- 
> Prof. Dr. Philipp Cimiano
> Semantic Computing Group
> Excellence Cluster - Cognitive Interaction Technology (CITEC)
> University of Bielefeld
> 
> Phone: 
> +49 521 106 12249
> 
> Fax: 
> +49 521 106 12412
> 
> Mail: 
> cimiano@cit-ec.uni-bielefeld.de
> 
> 
> Room H-127
> Morgenbreede 39
> 33615 Bielefeld
> 
>
Received on Tuesday, 16 April 2013 10:44:18 UTC