Re: WordNet modelling in Lemon and SKOS

Hi Aldo,

OK, while I disagreed with John's version of the binding to semiotics (second row of his graphical summary), looking at the scheme you provided, I totally agree with it, and I believe it exactly matches mine (in fact yours is the original lemon one, bound to semiotics.owl), modulo the terminological choice of LexicalSense vs LexicalConcept. By "mine", please consider the correct version I provided in my email. Regarding the terminological difference, I explained my choice as follows:
"The sense of a word is a LexicalConcept", so in this case I use the word "sense" only to express the relation between an expression and a meaning, while I "save" the word Sense to reify that same sense relation, if it is needed... and it is actually needed, at least internally to a resource like WordNet, to represent things such as tagCount, that is, the frequency of a word with a particular sense in SemCor. Now, I think you agree that this reified relationship is not a Meaning (whatever we call it, LexConcept or LexSense) but is just what it is: the reification of the <word,synset> pair. In WordNet, the Meaning is conveyed by synsets (their words) and senses are mere relationships.
Now, I swear I won't insist on the WordNet-style adoption of the term LexicalConcept :) (and consider that I suggested other possibilities, such as Meaning itself)
though:
1) let me continue with it at least in this email, in order to avoid overloading the term "sense"
2) given what I said above, please reconsider my introduction of Sense solely for the purpose of that reification (which may be of interest only in the internal description of a linguistic resource, and is replaced by the direct property :sense in all ordinary cases). On this point I'm interested in your opinion.
3) given the (I think) perfect match of our proposed models, I go back to the focus of my discussion: the original criticism was about how WordNet was mapped.
For this, I would kindly ask you to have a look at my email sent on the 17th. It's pretty long, but you may skip some parts and go straight to the section where I'm addressing your suggestion on this mapping. The text you can search for is:
"With a slight difference approach from Philipp and John, I see interestingly that Aldo proposed...". Again, I'm very interested in your perspective here.

Best,

Armando 



----- Original message -----
From: "Aldo Gangemi" <aldo.gangemi@cnr.it>
Sent: 19/04/2013 01.14
To: "John McCrae" <jmccrae@cit-ec.uni-bielefeld.de>
Cc: "Aldo Gangemi" <aldo.gangemi@cnr.it>; "Armando Stellato" <stellato@info.uniroma2.it>; "Philipp Cimiano" <cimiano@cit-ec.uni-bielefeld.de>; "public-ontolex" <public-ontolex@w3.org>
Subject: Re: WordNet modelling in Lemon and SKOS

Hi John, I missed this when answering the other email. Just a few more clarifications about the schemas you provide, of course from my semiotic perspective. I remark that my attempts here are about a simplification of these matters.


lemon-skos-owl diagram:


from the point of view of semiotic relations, I'd rather put a mapping relation between a lexical sense and a skos concept, since they would both be (primarily) intensions, i.e. meanings. I do not understand skos:it


Armando's proposal diagram:


why do we need a lexical concept separated from a sense? I understand your point about WordNet designers' claims, but each designer of a lexical or linguistic resource tends to put their own philosophical view on vaguely defined notions like "concept", "meaning", etc. Given that no one has the authority to have the last word on those notions, I still think that the basic distinction between expressions, meaning (intension), and reference (extension) is something much less vague. Therefore I suggest avoiding "concepts": these are just another name for intensional entities, exactly like senses, meanings, etc.


John's diagrammatic rendering of semiotics.owl:


it's not quite what the model intends … it's ok to say that a lexical entry has a lexical sense, and that a lexical sense denotes an ontology entity (extensionally viewed), but the rest is not correct, because a lexical sense (as meaning) is not an expression, and cannot express meanings. 


Please find here attached a diagram that tries to put together those proposed by you, from the perspective of semiotics.owl, where the semiotic triangle is used as a sort of "foundational ontology" for the ontolex classes and relations.


Aldo







On Apr 18, 2013, at 4:42:31 PM , John McCrae <jmccrae@cit-ec.uni-bielefeld.de> wrote:


Hi Armando, all,


I will try to synthesize a few other emails into this reply.


Firstly, I agree with much of what Armando says. Although lexical senses may be a reification of the <Word,Synset> combo as Armando says, I feel this understates the importance of their role. In fact, from my understanding lexical senses constitute an extension of words used with a given meaning, by the same logic that a lexical entry (lexeme) consists of an extension of words used in various inflected forms. Conversely, it could be argued that the lexeme is therefore just a reification of the <Form,Concept> pair (in fact this is approximately what a SKOS-XL label is). The key aspect is that it is useful in at least a significant percentage of language resources; in this case, the use of lexical sense as the annotation point for contexts (register, geographical usage), conditions (lexical selection restrictions) and examples (as in WordNet, see screenshot) makes it IMHO a clearly vital part of the model.


When defining lemon, we tried to be partly agnostic about the format of the ontology... we assumed it would be OWL, but didn't rule out the case of linking to F-Logic, FOL, etc. From this point-of-view it is not unreasonable  to consider linking to a SKOS concept hierarchy as an informal ontology.


Much of the issue in this thread concerns what happens if we then want to link this synset/concept hierarchy to a (formal) ontology. In the following document they propose two options:


http://www.w3.org/2006/07/SWD/SKOS/skos-and-owl/master.html



They propose "overlay" and "transform" options. I suspect most members of this list would reject the overlay option, so looking at the transform option we see a model using lemon, OWL and SKOS (first part of attached image), which uses the (unfortunately) hypothetical skos:it property to link between the concept (synset) and the ontology entity.


In a previous email today I proposed a modelling based on Aldo's semiotics.owl ontology (based on the understanding that lexical senses are expressions, synsets are meanings and ontology entities are references). As we can see this is structurally identical.


Finally, I also looked at Armando's proposal, and it also seems very similar in structure. In my opinion it should be possible to move the domain of Armando's sense link to the Sense class* and this would leave us agreeing on the structure if not the names of the labels!


Regards,
John


* Of course, if we take into account Philipp's proposed shortcut link (see http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping) between Lexical Entries and Ontology Entities, then this link would simply be the shortcut.



On Wed, Apr 17, 2013 at 9:28 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:

Hi again,
 
First of all, this is a reply to all three emails from Philipp, John and Aldo (plus something more from other emails). Since the topic is the same, I wrote one single reply, as parts of their emails overlap. Also, a small legend, to be more concise later in the argument:
 
Ontoelement(s): those elements of an ontology which need to be referenced through lexical information, that is, the objects of triples with ontolex:reference as their predicate. Note here that there is some abuse of notation: this “target ontology” could actually be a SKOS concept scheme and not an owl:Ontology. We do not assign any Class here, as these elements could be properties, individuals, classes or concepts
3-entity-pattern: that LexicalEntry -> LexicalSense -> OntoElement structure we (more or less) agreed on.
 
Ah, one note…this is not only an interminably long discussion, I propose a model at the end :-D
 
I put here below names of people before any section, so that it is clear who said what and whom I’m replying to:
 
[Philipp]
I agree that in some sense the three-entity path seems overkill for modelling WordNet. But I think that our goal should be to design a model that works for all cases and not tune the model to the particular case of WordNet. So I would prefer to use the same modelling (i.e. the three-entity path) across all specific resources.


[Armando]
Absolutely agree on our mandate to have something homogeneous and not hard-patched to some specific necessity. My proposed modelling for WordNet is in fact not in the direction of sprouting exceptions from our model to cover WordNet, but is actually (obviously, this is my opinion and I may be wrong) a more faithful replication of its structure, which I think is elegantly compatible with our model and matches it even better. What is more, it fosters a better integration of WordNet when used to enrich an ontology.
However, my perspective is not totally incompatible with some modelling exigencies (see later my reply to John’s observations), and as you will see, some linking can be drawn up.
 
But, to argue my case better (at least, I hope), I have to take a step back (and sorry, I’ll be going through things that all of you know very well, but still I need to mention them for the argument).
 
In WordNet we have words (terms, whatever..), and these words are bound into collections called synonym sets. To cite the most popular paper [1] about WordNet, “…synonym sets (synsets) do not explain what the concepts are; they merely signify that the concepts exist”. So, ok, synonym sets are just “language extensional hints” to a concept. We don’t know intensionally what that concept is, but we understand that it exists and we know linguistically how to refer to it. From a sentence in the same paper, just before the aforementioned one, we read: “The synonym sets, {board, plank} and {board, committee} can serve as unambiguous designators of these two meanings of board”. So, the meanings of “board”, under an interpretative process, are designated by synsets.
From the very first lines (the abstract) of that same publication, we read:
“English nouns, verbs, and adjectives are organized into synonym sets, each representing one underlying lexical concept”.
 
Ok, perfect, personally, I’ve found what I would suggest for that element-in-the-middle in the 3-elements-path. It is called LexicalConcept, and fits dramatically well (even terminologically) as a subclass of skos:Concept. As I said many times, I personally didn’t like LexicalSense as, maybe exactly biased by my knowledge of WordNet, and by a bit of common sense, I would have used the word “sense”, only to represent the relationship which holds between a LexicalEntry and a LexicalConcept. That is to say: a LexicalEntry may have many senses, and each of them is represented through a pointer – through the relation: “ontolex:sense” – to a LexicalConcept, which accidentally in WordNet is a synset (not my words, I’m citing their literature).
 
Thus, recapping, in my view the thing is simple. I try to recap it as Aldo did in his email, but on my modelling perspective; therefore, to me the 3-entities-pattern (and gluing props) in our language would be:
 
Class(ontolex:LexicalEntry) –prop(ontolex:sense)–> Class(ontolex:LexicalConcept) –prop(ontolex:reference)–> An Ontoelement
 
Until now, by purely graph-matching it with what has already been said, it seems I just don’t like the LexicalSense name, and replaced it with LexicalConcept, but there’s something different exactly when we consider a case like WordNet.
Let’s take these two other triples:
 
wordnet:Synset rdfs:subClassOf ontolex:LexicalConcept
wordnet:syn_v_00076153 rdf:type wordnet:Synset
 
thus, here we have just two renamings:
-          a synset instance renaming: very personally, I think the synset code is the most “neutral way” of naming a synset, not biased by one of the terms which are part of it, which always gave me a headache; I think this is the same thing Piek was referring to when talking about the choice of word-sensenumber pairs as URIs for synsets in the existing RDF version of WordNet
-          my LexicalConcept class instead of LexicalSense
but, apart from them, I took those two triples exactly as they are from Aldo’s example.
 
Now, the focus of my opposition to the original WordNet example (or better, of some implications of it which I heard as confirmed in the emails), is that I see this class LexicalConcept as exactly the “vague lexical concept” – of which we precisely know a lexical extension – which can be put in between LexicalEntries and ontoelements in the 3-entities-pattern.
It is exactly, for instance, the bnode we put in the example in: http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping#Examples_using_DBpedia when we write:
 
:team a ontolex:LexicalEntry ;
  ontolex:canonicalForm [ontolex:writtenRep "team"@en ; ] ;
  ontolex:sense [ontolex:reference <http://dbpedia.org/ontology/team> ] .
 
to link the :team LexicalEntry to the dbpedia:team resource.
 
Only…if we are using WordNet, someone has already prepared a set of these LexicalConcepts (seasoned with words!) for us, gave identifiers to them (so no bnodes necessary), and a general class for them, calling it Synset :-)
This is really the central part of what I’m saying.
 
Thus, a very basic (but still compliant) modelling can be:
 
wordnet:syn_n_08225481 ontolex:reference <http://dbpedia.org/ontology/team> .
 
and we get for free all the LexicalEntries already attached to WordNet, and modelled according to our vocabulary. Obviously, some other work can further enrich the lexical description of a WordNet synset (which in WordNet is just a set of words) thanks to our more fine-grained vocabulary, which allows for a richer characterization of Lexical Entries. Still, with just the one line above, we get a lot for free thanks to the mere existence of WordNet.
 
[Philipp]
Assuming that WordNet contains a conceptualization, each synset indeed represents a skos:Concept (a unit of thought) and in that sense it seems reasonable to see a Synset as a reference.
 
[Armando]
Agree on the skos:Concept part, not on the rest. WordNet is a lexical database. Its domain (the set of its linguistic concepts called synsets) is still linguistic, and the concepts of WordNet are thus IMHO these LexicalConcepts I’m advocating. If you commit somehow to WordNet, then you could (you should, in my advice) commit to (and take benefits from) using these synsets as the element-in-the-middle of our 3-entities-pattern.
I’m trying to place WordNet in the right position within our wider onto-linguistic modelling, and I see it as the linguistic part which needs to be attached to the conceptual part. I wouldn’t like to see WordNet as a domain (world domain) concept scheme with attached labels that can potentially be mapped to our ontoelements. Obviously, the use of skos:Concept may be misleading in its name (as “concept” could suggest that, in the onto-lex composition, it is the “onto” part), but I’m stressing that this extension of skos:Concept should be our ontolex:LexicalConcept, and that this ontolex:LexicalConcept itself is the right cap (superclass) for wordnet:Synset when considering WordNet as a specific instance of an OntoLex-modelable lexical resource. Finally, once more, this implies that Synsets should sit in between LexicalEntries and ontoelements in our 3-entities-pattern.
 
I try now to explain the case against the example currently in the wiki. With the previous modelling, we get almost nothing back: we would have this “general world ontology” called WordNet, which has its lexical entries (mediated through the Sense entity), and we would have two distinct kinds of possible actions:
1)      we could map the resources of our domain ontology/conceptscheme to the synsets of WordNet, much the same way we map two general domain ontologies or concept schemes.
2)      we could relate specific wordsenses, such as: wordsense-vomit-verb-1, to resources in our ontology.


But pay attention: in what I propose we could link a synset (syn_v_00076153), through ontolex:reference, directly to ontoelements and use it, coherently with our model, to have all of that synset’s lexical entries bound to the intended ontoelement. In the current model instead, by using WordNet senses, we would have to link each sense of each word to the ontoelements.
 
Thus we should state:
wordsense-vomit-verb-1  ontolex:reference    myont:vomit
wordsense-cat-verb-2       ontolex:reference   myont:vomit
 
but…is it not painful? We already had the synset as a common umbrella! Oh yes, surely we could decide on some entailment, whereby if I link (somehow..how? through skos:exactMatch?) a synset to an element of my ontology, then all of its related wordsenses (that is, the set of senses through which certain words are bound to that synset) are bound to that ontoelement. But how to state this entailment in the general ontolex vocabulary, since Synsets are outside it? (and in fact the wiki example does not hint at any general definition of wordnet:Synset under some ontolex umbrella, it being only the last resource pointed to by ontolex:reference, much like an ontoelement from any other ontology).
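
One way to picture the entailment asked about above, as a minimal Python sketch and not as part of any ontolex specification: a single synset-level link is expanded into one ontolex:reference statement per word sense. The two dictionaries and the function name propagate_references are hypothetical; the identifiers are the ones used in this email.

```python
def propagate_references(sense_to_synset, synset_to_ontoelement):
    """Expand synset-level links into per-sense ontolex:reference triples."""
    return {
        (sense, "ontolex:reference", synset_to_ontoelement[synset])
        for sense, synset in sense_to_synset.items()
        if synset in synset_to_ontoelement
    }

# Word senses grouped under their synset (hypothetical lookup table).
sense_to_synset = {
    "wordsense-vomit-verb-1": "wordnet:syn_v_00076153",
    "wordsense-cat-verb-2": "wordnet:syn_v_00076153",
}

# A single link stated once, at synset level (the linking property is left open).
synset_to_ontoelement = {"wordnet:syn_v_00076153": "myont:vomit"}

derived = propagate_references(sense_to_synset, synset_to_ontoelement)
for triple in sorted(derived):
    print(triple)
```

With the synset as the common umbrella, one statement stands in for as many per-sense statements as the synset has members.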
 
With a slight difference approach from Philipp and John, I see interestingly that Aldo proposed both Synset and WordSense as subclasses of ontolex:LexicalSense. This would mean that Aldo would actually allow synsets to be used in the middle of our 3-elements-path:


                wordnet:WordSense rdfs:subClassOf ontolex:LexicalSense
                wordnet:Synset rdfs:subClassOf ontolex:LexicalSense
 
This seems at odds with what Philipp and John say. While I obviously agree with the second axiom (it’s basically the core of what I’m saying), personally I can’t see wordnet:WordSense as a subclass of ontolex:LexicalSense as well, and, actually, I can’t think how the two things (wordnet:WordSense and wordnet:Synset), which are solidly distinct, can be subclasses of the same class in any possible theory.
 
So (if I’m correct), in the case of Philipp and John, it seems Synset is left away from any convenient reuse, while in the case of Aldo, I’ve this big problem with the double subclassing of both Synset and WordSense under LexicalSense. You may not agree with me, but still it seems something is missing.


I was then trying to play devil’s advocate and argue against myself: “what if I want to attach a given set of words to one of my ontoelements, but there is no synset in WordNet which rightly embraces it? That is, for each synset I would consider, there is a word in it that I don’t like”. This could be a good point in favour of having word senses attached to ontoelements, rather than synsets. But actually it is not, in as much as reducing commitment always reduces constraints and problems, but also offers fewer solutions and opportunities. The paper [1] (and I suppose much more literature before that :-D ) is clear on the fact that true synonyms may never exist, and that the concept of synonymy depends on the context; still, the WordNet ontology (as all ontologies do) provides a discretization of a world model, where the “world” is the “generic use of language”, which in most cases will work, but may fail where this discretization does not correctly represent a given shade of meaning (i.e. there is no WordNet sense for a word that perfectly fits the concept we want to express in our ontology, and thus its lexicalization).
But the truth is always the same in all cases of commitment: you can decide to re-use what you have as much as you like, and get the benefits deriving from the (shareable!) work of others up to a reasonable extent. If nothing in wordnet fits a specific ontoelement of yours, then put a blank node as LexicalConcept in the 3-entity-pattern, and go along in customizing your specific lexical characterization, while still keeping the rest (probably 99% of your ontology) happily WordNet-decorated.
 
To recap until now, the moral behind all of that (beyond triples, names etc…), is that WordNet is a linguistic resource, and by treating it as a generic conceptualization, we could miss the opportunity of using it for what it is.
 
Now, a final remark, because John (and I want to assure here Piek as well about his concerns :-) ) is totally right in his email, when he says: 
[John]
“Firstly, I think an important point here is that WordNet does in fact have senses as a concept distinct from Synsets and Words“. 
 
[Armando]
Surely this is the best argument in support of the fact that these senses shouldn’t go away if we want to fully support WordNet.
First, something I already expressed in my previous email: it may not be our priority to have all of WordNet inside OntoLex; we could cover 85% of the WordNet model through OntoLex, and then have some specific parts of it not under the cap of our generic vocabulary (but with WordNet still having its own RDF modelling scheme, covering 100% of WordNet, and 85% mapped to ontolex). I’m not saying we shouldn’t cover it, I just want to stress that the focus in the earlier discussions is not on covering 100% of WordNet, but on how to fit it inside our model, and how to use it to enrich an ontology. Given this, let’s assume that we want to cover it 100% and let’s go ahead.
 
All of us know that, when representing a domain through a given model, we may have to represent things we perceive as different through identical constructs. When we are in RDF, sometimes we have to reify relationships into entities. Conversely, in relational modelling, all entities and relations from an ER model become relations (and then, e.g., tables in a DB). So, the fact is that in the traditional WordNet index-file-based DB there is a sense index file, and that bindings between Synsets and Words are expressed there, because sometimes they need to be cited explicitly as first-class citizens.
Let us consider the case of lexical relations (which, as the name says, are relations between words). In WordNet (since it was born merely “to be a theory of the Word Meaning box”, [1, p. 5]) there are no purely lexical relations, and its lexical relations are actually stated between senses of a word, that is, between word-synset pairs. For instance, in common speech we say that rise/fall are antonyms, but surely we are not referring to “fall” as the US word for “autumn” when we oppose it to “rise”: well, WordNet accounts for that, by specifying that two words are antonyms only when considering some of their intended senses.
Another example is the tag count, again in WordNet, telling how many times a specific word with a particular sense (tagged with a given synset) has appeared in a corpus (e.g. SemCor). Or the sense ordering already mentioned in other emails.
But is it any more important than just an expedient for adding additional statistical data, imposing some ordering, or better qualifying lexical relations? I think not. Synsets and Words are the VIPs. Sense (in WordNet) is just the reification of the <Word, Synset> combo.
 
So, this is the notion of “sense” in WordNet: a gluing object relating a Word to a Unit of Meaning (a lexical concept). The lexical concept is “hinted” by the index (through the synset code) and linguistically expressed by means of a Synset’s lexical extension: its words. A Word has a Sense in that it points to a given Unit of Meaning. The Sense, as such, cannot have any definition, as it only reifies the link between Words and Units of Meaning. Here I think is where the confusion has arisen until now, as sometimes we had this more elaborate concept of Sense as a unit of meaning, while in WordNet we needed a mere reification of a relation.
 
Thus on the one side, I would be tempted to say that “sense” is a relationship and that, for brevity, the property ontolex:sense captures it pretty well, though not for linking to a reified LexicalSense, but for linking to a Unit of Meaning/LexicalConcept. On the other side, the fact is that we may need (see the examples above) a reification of that sense relationship. We have to keep the two things distinct. Here I would introduce ontolex:Sense exactly as this: not as a Unit of Meaning, but as a reification of the relation between a Word and a Unit of Meaning.
 
So far so good. It may seem that I have lengthened the path from plain literals to ontoelements instead of shortening it, but actually, if properly planned, we could have very useful properties, which can be expanded into reified objects if and where appropriate. And, most of all, we would keep Linguistic Resources as something usable to enrich ontologies, and not as further ontologies to be mapped.
 
MODEL PROPOSAL:
 
I would propose then the following model:
 
NOTES: 
I left out all the characterization of LexicalEntries, which is obviously important, but separate from this discussion. 
For ease of reading, I’m using the empty prefix instead of ontolex: here.
 
CLASSES:
:LexicalConcept (or Unit of Meaning, but I’ll use LexicalConcept from now on)
:Sense
:LexicalEntry
 
PROPERTIES:
:sense                  domain: LexicalEntry                     range: LexicalConcept  (note the difference here)
:reference          domain: LexicalConcept               range: non-specified, expect however to “land” on ontoelements.
 
:lexEntry             range: LexicalEntry         merely a construct for the role of LexicalEntries in reifiedRelations, such as :Sense
:lexConcept       range: LexicalConcept   merely a construct for the role of LexicalConcepts in reifiedRelations, such as :Sense
 
 
A :Sense (capital letter) is the reification of the :sense property. Being binary in involving LexicalEntries with their intended meaning (LexicalConcept), ontolex:sense plays well in most of the cases, but, if we need a reification, we may have the following rule:
 
:Sense(y)                            :lexEntry                             :LexicalEntry(x)
:Sense(y)                            :lexConcept                       :LexicalConcept(z)
------------------- --->
:LexicalEntry(x)                :sense                                  :LexicalConcept(z)
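
The rule above can be sketched in a few lines of Python, treating triples as plain tuples. This is only an illustration under stated assumptions: the function name derive_sense_triples, the node :sense_vomit_1, the entry :vomit and the :tagCount property with its value are all made up for the example.

```python
def derive_sense_triples(triples):
    """For every y typed :Sense with :lexEntry x and :lexConcept z, derive (x, :sense, z)."""
    sense_nodes = {s for s, p, o in triples if p == "rdf:type" and o == ":Sense"}
    entries, concepts = {}, {}
    for s, p, o in triples:
        if s not in sense_nodes:
            continue  # the rule only fires for reified :Sense nodes
        if p == ":lexEntry":
            entries.setdefault(s, set()).add(o)
        elif p == ":lexConcept":
            concepts.setdefault(s, set()).add(o)
    return {
        (x, ":sense", z)
        for y in entries
        for x in entries[y]
        for z in concepts.get(y, ())
    }

# Hypothetical reified sense carrying extra data (a made-up tag count).
reified = {
    (":sense_vomit_1", "rdf:type", ":Sense"),
    (":sense_vomit_1", ":lexEntry", ":vomit"),
    (":sense_vomit_1", ":lexConcept", "wordnet:syn_v_00076153"),
    (":sense_vomit_1", ":tagCount", 3),
}

direct = derive_sense_triples(reified)
print(direct)
```

The extra data stays on the reified node, while the derived binary :sense triple is all that the 3-entity-pattern needs.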
 
 
Now, our 3-entity-pattern is, as I said initially:
 
InstOf(ontolex:LexicalEntry) –prop(ontolex:sense)–> InstOf(ontolex:LexicalConcept) –prop(ontolex:reference)–> An Ontoelement
 
Where InstOf(x) means: “an instance of x”
 
Now, WordNet. Given that:
 
wordnet:Synset              rdfs:subClassOf               :LexicalConcept
 
We may express things such as:
 
wordnet:syn_n_08225481          ontolex:reference          <http://dbpedia.org/ontology/team> .
 
thus bringing all of the LexicalEntries already defined in WordNet as synonyms in wordnet:syn_n_08225481, as valid LexicalEntries describing the ontology element dbpedia:team.
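
As a rough Python sketch of what this buys us: following the 3-entity-pattern backwards from an ontoelement yields every LexicalEntry attached to the linked synset. The entries :squad and :plank and the identifier ex:other_synset are hypothetical placeholders, not actual WordNet content, and entries_for is an illustrative helper.

```python
def entries_for(ontoelement, triples):
    """Return the lexical entries reaching `ontoelement` via :sense then :reference."""
    concepts = {
        s for s, p, o in triples
        if p == "ontolex:reference" and o == ontoelement
    }
    return sorted(
        s for s, p, o in triples
        if p == "ontolex:sense" and o in concepts
    )

triples = [
    # Entries already attached to the synset (membership here is assumed).
    (":team", "ontolex:sense", "wordnet:syn_n_08225481"),
    (":squad", "ontolex:sense", "wordnet:syn_n_08225481"),
    # An unrelated entry pointing at a placeholder synset.
    (":plank", "ontolex:sense", "ex:other_synset"),
    # The single statement linking the synset to the ontology element.
    ("wordnet:syn_n_08225481", "ontolex:reference", "http://dbpedia.org/ontology/team"),
]

team_entries = entries_for("http://dbpedia.org/ontology/team", triples)
print(team_entries)
```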
 
By no means does it hold, instead, that:
wordnet:Sense               rdfs:subClassOf               :LexicalConcept
as the former includes constructs made of elements from the latter.
 
Ah, WordNet would thus have these reified senses, but a direct connection of the form:
InstOf(:LexicalEntry)                      :sense                  InstOf(wordnet:Synset)
is still possible and hence welcome.
 
As you may see:
 
1)      I preserved the possibility to reify Senses (necessary in WordNet), but separated this Sense reification from the LexicalConcept (or Unit of Meaning) present in the current model. 
2)      I allowed for these LexicalConcepts to be used as elements-in-the-middle of our 3-entities-pattern
 
The sense reification is very important in WordNet (as it may be in other resources), to keep track of very specific things such as word ordering, tag counting, or lexical relations, but while all of these have a very important role in the lexical resource, they are beyond the scope of an ontolex binding. The :sense binary relation is more than enough in that context.
Once more, there cannot be any further “semantic” characterization of :Sense. An instance of :Sense cannot have a description, as the description pertains to the LexicalConcept. :Sense, in short, is just an expedient in RDF to further characterize word-synset pairs with additional data.
 
Really sorry for the…yes..erm… quite long email :-D
 
Cheers,
 
Armando
 
P.S.: As said, names might be improved (someone could insist that the pointer to a WordNet synset IS de facto a reference), but I would urge us not to let terminology affect our modelling, and instead try later to find the best way to name things if we agree on them (rem tene…verba sequentur). My only concern is that I definitely felt something was not working with the previous modelling, and I think this “structure” renders our needs much better and properly exploits linguistic resources in the context of enriching conceptual knowledge.
 
[1] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. Introduction to WordNet: An On-line Lexical Database.
http://wordnetcode.princeton.edu/5papers.pdf 
 


<wordnet-screenshot.png><OntoLexModels.png>

Received on Friday, 19 April 2013 02:33:15 UTC