RE: WordNet modelling in Lemon and SKOS

Hi again,

 

First of all, this is a reply to all three emails from Philipp, John and Aldo (plus a few points from other emails). Since the topic is the same and their emails partly overlap, I wrote a single reply. Also, a small legend, to keep the argumentation shorter later on:

 

Ontoelement(s): those elements of an ontology which need to be referenced through lexical information, that is, the objects of triples with ontolex:reference as their predicate. Note that there is some abuse of notation here: this “target ontology” could actually be a SKOS concept scheme and not an owl:Ontology. We do not assign any class here, as these elements could be properties, individuals, classes or concepts.

3-entity-pattern: the LexicalEntry -> LexicalSense -> OntoElement structure we (more or less) agreed on.

 

Ah, one note… this is not only an interminably long discussion: I also propose a model at the end :-D

 

Below, I put people’s names before each section, so that it is clear who said what and whom I’m replying to:

 

[Philipp]

I agree that in some sense the three-entity path seems an overkill for modelling WordNet. But I think that our goal should be to design a model that works for all cases and not tune the model to the particular case of WordNet. So I would prefer to use the same modelling (i.e. the three-entity path) across all specific resources.



[Armando]

I absolutely agree on our mandate to have something homogeneous and not hard-patched for some specific need. My proposed modelling for WordNet is in fact not about sprouting exceptions from our model to cover WordNet; it is (obviously, this is my opinion and I may be wrong) a more faithful replication of its structure, which I think is elegantly compatible with our model and matches it even better. Moreover, it fosters a better integration of WordNet when used to enrich an ontology.

However, my perspective is not totally incompatible with some modelling requirements (see my reply to John’s observations later), and, as you will see, some links can be drawn.

 

But, to argue this better (at least, I hope), I have to take a step back (and sorry, I’ll be going through things that all of you know very well, but I still need to mention them for the argument).

 

In WordNet we have words (terms, whatever...), and these words are bound into collections called synonym sets. To cite the most popular paper about WordNet [1], “…synonym sets (synsets) do not explain what the concepts are; they merely signify that the concepts exist”. So, OK, synonym sets are just “language extensional hints” to a concept. We don’t know intensionally what that concept is, but we understand that it exists and we know how to refer to it linguistically. In the same paper, just before the aforementioned sentence, we read: “The synonym sets, {board, plank} and {board, committee} can serve as unambiguous designators of these two meanings of board”. So the meanings of board, under an interpretative process, are designated by synsets.

From the very first lines (the abstract) of that same publication, we read:

“English nouns, verbs, and adjectives are organized into synonym sets, each representing one underlying lexical concept”.

 

OK, perfect: personally, I’ve found what I would suggest for the element-in-the-middle of the 3-entity-pattern. It is called LexicalConcept, and it fits remarkably well (even terminologically) as a subclass of skos:Concept. As I have said many times, I personally didn’t like LexicalSense because, perhaps biased by my knowledge of WordNet and by a bit of common sense, I would have used the word “sense” only for the relationship which holds between a LexicalEntry and a LexicalConcept. That is to say: a LexicalEntry may have many senses, and each of them is represented by a pointer (through the relation ontolex:sense) to a LexicalConcept, which in WordNet happens to be a synset (not my words, I’m citing their literature).

 

Thus, to recap, in my view the thing is simple. I’ll try to summarize it as Aldo did in his email, but from my modelling perspective; to me, the 3-entity-pattern (and its gluing properties) in our language would be:

 

Class(ontolex:LexicalEntry) –prop(ontolex:sense)–> Class(ontolex:LexicalConcept) –prop(ontolex:reference)–> An Ontoelement

 

So far, purely graph-matching this against what has already been said, it may seem that I just don’t like the LexicalSense name and have replaced it with LexicalConcept; but there is a real difference, precisely when we consider a case like WordNet.

Let’s take these two other triples:

 

wordnet:Synset rdfs:subClassOf ontolex:LexicalConcept

wordnet:syn_v_00076153 rdf:type wordnet:Synset

 

thus, here we have just two renamings:

-          a synset instance renaming: very personally, I think the synset code is the most “neutral way” of naming a synset, not biased by any one of the terms which are part of it (something that always gave me a headache); I think this is the same thing Piek was referring to when talking about the choice of word-sensenumber pairs as URIs for synsets in the existing RDF version of WordNet

-          my LexicalConcept class instead of LexicalSense

but, apart from them, I took those two triples exactly as they are from Aldo’s example.

 

Now, the focus of my opposition to the original WordNet example (or rather, to some implications of it which I saw confirmed in the emails) is that I see this class LexicalConcept as exactly the “vague lexical concept”, of which we precisely know a lexical extension, which can be put in between LexicalEntries and ontoelements in the 3-entity-pattern.

It is exactly, for instance, the bnode we put in the example at <http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping#Examples_using_DBpedia> when we write:

 

:team a ontolex:LexicalEntry ;
  ontolex:canonicalForm [ ontolex:writtenRep "team"@en ] ;
  ontolex:sense [ ontolex:reference <http://dbpedia.org/ontology/team> ] .

 

to link the :team LexicalEntry to the dbpedia:team resource.

 

Only… if we are using WordNet, someone has already prepared a set of these LexicalConcepts (seasoned with words!) for us, given them identifiers (so no bnodes are necessary), and provided a general class for them, called Synset :-)

This is really the central part of what I’m saying.

 

Thus, a very basic (but still compliant) modelling can be:

 

wordnet:syn_n_08225481 ontolex:reference <http://dbpedia.org/ontology/team> .

 

and we get for free all the LexicalEntries already attached to WordNet, modelled according to our vocabulary. Obviously, further work can enrich the lexical description of a WordNet synset (which in WordNet is just a set of words) thanks to our more fine-grained vocabulary, which allows for a richer characterization of LexicalEntries. Still, with just the one line above, we get a lot for free thanks to the mere existence of WordNet.
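As a rough illustration of the “for free” point, here is a minimal Python sketch that treats triples as plain tuples (no RDF library). The entry names :team and :squad are placeholders assumed for the example and have not been checked against the actual content of wordnet:syn_n_08225481; the helper entries_for is hypothetical as well.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
SENSE = "ontolex:sense"
REFERENCE = "ontolex:reference"

SYNSET = "wordnet:syn_n_08225481"
DBPEDIA_TEAM = "http://dbpedia.org/ontology/team"

triples = {
    # Assumed: entries already attached to the synset by the WordNet RDF data.
    # The entry names are illustrative placeholders, not checked against WordNet.
    (":team", SENSE, SYNSET),
    (":squad", SENSE, SYNSET),
    # The single statement added by whoever enriches the ontology.
    (SYNSET, REFERENCE, DBPEDIA_TEAM),
}


def entries_for(ontoelement, graph):
    """Entries reaching `ontoelement` via entry -sense-> concept -reference-> element."""
    concepts = {s for (s, p, o) in graph if p == REFERENCE and o == ontoelement}
    return sorted(s for (s, p, o) in graph if p == SENSE and o in concepts)


print(entries_for(DBPEDIA_TEAM, triples))  # prints [':squad', ':team']
```

Only the last triple in the set is new work; everything else is assumed to come with the WordNet data.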

 

[Philipp]

Assuming that WordNet contains a conceptualization, each synset indeed represents a skos:Concept (a unit of thought) and in that sense it seems reasonable to see a Synset as a reference.

 

[Armando]

I agree on the skos:Concept part, not on the rest. WordNet is a lexical database. Its domain (the set of its linguistic concepts, called synsets) is still linguistic, and the concepts of WordNet are thus, IMHO, the LexicalConcepts I’m advocating. If you somehow commit to WordNet, then you could (you should, in my opinion) commit to (and benefit from) using these synsets as the element-in-the-middle of our 3-entity-pattern.

I’m trying to put WordNet in its right place within our wider onto-linguistic modelling, and I see it as the linguistic part which needs to be attached to the conceptual part. I wouldn’t like to see WordNet as a (world-)domain concept scheme with attached labels that can potentially be mapped to our ontoelements. Admittedly, the name skos:Concept may be misleading (as “concept” could suggest that, in the onto-lex composition, it is the “onto” part), but I stress that this extension of skos:Concept should be our ontolex:LexicalConcept, and that ontolex:LexicalConcept is the right cap (superclass) for wordnet:Synset when considering WordNet as a specific instance of an OntoLex-modelable lexical resource. Finally, once more, this implies that Synsets should sit in between LexicalEntries and ontoelements in our 3-entity-pattern.

 

I’ll now try to explain the case against the example currently in the wiki. With the previous modelling, we get almost nothing back: we would have this “general world ontology” called WordNet, which has its lexical entries (mediated through the Sense entity), and we would have two distinct kinds of possible actions:

1)      we could map the resources of our domain ontology/conceptscheme to the synsets of WordNet, much the same way we map two general domain ontologies or concept schemes.

2)      we could relate specific wordsenses, such as: wordsense-vomit-verb-1, to resources in our ontology.



But note that in what I propose we could link a synset (syn_v_00076153) directly to ontoelements through ontolex:reference and use it, coherently with our model, to have all of that synset’s LexicalEntries bound to the intended ontoelement. In the current model, instead, by using WordNet senses, we would have to link each sense of each word to the ontoelements.

 

Thus we would have to state:

wordsense-vomit-verb-1  ontolex:reference    myont:vomit

wordsense-cat-verb-2       ontolex:reference   myont:vomit

 

But… isn’t that painful? We already had the synset as a common umbrella! Oh yes, surely we could decide on some entailment by which, if I link (somehow… how? through skos:exactMatch?) a synset to an element of my ontology, then all of its related word senses (that is, the senses through which certain words are bound to that synset) are bound to that ontoelement. But how do we state this entailment in the general ontolex vocabulary, since Synsets are outside it? (In fact, the wiki example does not hint at any general definition of wordnet:Synset under some ontolex umbrella: it is only the final resource pointed to by ontolex:reference, much like an ontoelement from any other ontology.)
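To make the cost of such an entailment concrete, here is a small Python sketch of the propagation it would require. Both ex:inSynset (word sense to synset) and ex:synsetLink (synset to ontoelement) are hypothetical placeholder properties, since the discussion leaves open how either link would be expressed; the pairing of the two word senses with syn_v_00076153 is also assumed for illustration.

```python
REFERENCE = "ontolex:reference"
IN_SYNSET = "ex:inSynset"      # hypothetical: word sense -> the synset it belongs to
SYNSET_LINK = "ex:synsetLink"  # hypothetical: synset -> ontoelement ("somehow... how?")


def expand_to_senses(graph):
    """Copy every synset-level link down to each word sense of that synset."""
    links = [(s, o) for (s, p, o) in graph if p == SYNSET_LINK]
    return {
        (sense, REFERENCE, target)
        for (synset, target) in links
        for (sense, p, o) in graph
        if p == IN_SYNSET and o == synset
    }


graph = {
    # Assumed membership of the two word senses in the same synset.
    ("wordsense-vomit-verb-1", IN_SYNSET, "wordnet:syn_v_00076153"),
    ("wordsense-cat-verb-2", IN_SYNSET, "wordnet:syn_v_00076153"),
    # One synset-level statement...
    ("wordnet:syn_v_00076153", SYNSET_LINK, "myont:vomit"),
}

# ...from which the per-sense reference statements are derived mechanically.
for triple in sorted(expand_to_senses(graph)):
    print(triple)
```

The sketch only works because the rule is hard-coded outside the vocabulary, which is precisely the gap described above.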

 

Taking a slightly different approach from Philipp and John, Aldo interestingly proposed both Synset and WordSense as subclasses of ontolex:LexicalSense. This would mean that Aldo would actually allow synsets to be used in the middle of our 3-entity-pattern:



                wordnet:WordSense rdfs:subClassOf ontolex:LexicalSense

                wordnet:Synset rdfs:subClassOf ontolex:LexicalSense

 

This seems at odds with what Philipp and John say. While I obviously agree with the second axiom (it’s basically the core of what I’m saying), personally I can’t see wordnet:WordSense as a subclass of ontolex:LexicalSense as well; actually, I can’t see how the two things (wordnet:WordSense and wordnet:Synset), which are clearly distinct, can be subclasses of the same class in any possible theory.

 

So (if I’m correct), in Philipp’s and John’s case, Synset seems to be left out of any convenient reuse, while in Aldo’s case I have this big problem with the double subclassing of both Synset and WordSense under LexicalSense. You may not agree with me, but it still seems that something is missing.



I then tried to play devil’s advocate and argue against myself: “what if I want to attach a given set of words to one of my ontoelements, but there is no synset in WordNet which properly covers it? That is, for each synset I would consider, there is a word in it that I don’t like.” This could be a good point in favour of attaching word senses, rather than synsets, to ontoelements. But actually it is not: reducing commitment always reduces constraints and problems, but it also offers fewer solutions and opportunities. The paper [1] (and, I suppose, much more literature before it :-D ) is clear that true synonyms may never exist and that synonymy depends on context. Still, the WordNet ontology (as all ontologies do) provides a discretization of a world model, where the “world” is the “generic use of language”. This discretization will work in most cases, but may fail where it does not correctly represent a given shade of meaning (i.e. no WordNet sense of a word perfectly fits the concept we want to express in our ontology, and thus its lexicalization).
But the truth is the same in all cases of commitment: you can decide to re-use what you have as much as you like, and get the benefits of the (shareable!) work of others up to a reasonable extent. If nothing in WordNet fits a specific ontoelement of yours, then put a blank node as the LexicalConcept in the 3-entity-pattern and go ahead with customizing your specific lexical characterization, while keeping the rest (probably 99% of your ontology) happily WordNet-decorated.

 

To recap so far, the moral behind all of this (beyond triples, names etc.) is that WordNet is a linguistic resource, and by treating it as a generic conceptualization we could miss the opportunity of using it for what it is.

 

Now, a final remark, because John (and here I want to reassure Piek as well about his concerns :-) ) is totally right in his email when he says:

[John]

“Firstly, I think an important point here is that WordNet does in fact have senses as a concept distinct from Synsets and Words”.

 

[Armando]

Surely this is the best argument for the fact that these senses shouldn’t go away if we want to fully support WordNet.

First, something I already expressed in my previous email: it may not be our priority to have all of WordNet inside OntoLex. We could cover 85% of the WordNet model through OntoLex, and leave some specific parts of it outside our generic vocabulary (with WordNet still having its own RDF modelling scheme, covering 100% of WordNet and mapped 85% to OntoLex). I’m not saying we shouldn’t cover it; I just want to stress that the focus of the discussion so far is not on covering 100% of WordNet, but on how to fit it into our model and how to use it to enrich an ontology. That said, let’s assume that we want to cover it 100% and go ahead.

 

All of us know that, when representing a domain through a given model, we may have to represent things we perceive as different through identical constructs. In RDF, we sometimes have to reify relationships into entities. Conversely, in relational modelling, all entities and relationships from an ER model become relations (e.g. tables in a DB). So it is certainly a fact that the traditional WordNet index-file-based DB has a sense index file, where bindings between Synsets and Words are expressed, because sometimes they need to be cited explicitly as first-class citizens.

Let us consider the case of lexical relations (that is, relations between words). In WordNet (since it was born merely “to be a theory of the Word Meaning box”, [1, p. 5]) there are no purely lexical relations: its lexical relations are actually stated between senses of words, that is, between word-synset pairs. For instance, in everyday speech we say that rise/fall are antonyms, but surely we are not thinking of “fall” as the US term for “autumn” as opposed to “rise”. WordNet accounts for that by specifying that two words are antonyms only with respect to some of their intended senses.

Another example is the tag count, again in WordNet, which tells how many times a specific word with a particular sense (tagged with a given synset) has appeared in a corpus (e.g. SemCor). Or the sense ordering already mentioned in other emails.

But is it any more important than a mere device (an escamotage) for adding statistical data, imposing some ordering, or better qualifying lexical relations? I think not. Synsets and Words are the VIPs. Sense (in WordNet) is just the reification of the <Word, Synset> combo.

 

So, this is the notion of “sense” in WordNet: a gluing object relating a Word to a Unit of Meaning (a lexical concept). The lexical concept is “hinted at” by the index (through the synset code) and linguistically expressed by means of a Synset’s lexical extension: its words. A Word has a Sense in that it points to a given Unit of Meaning. The Sense, as such, cannot have any definition, as it only reifies the link between Words and Units of Meaning. This, I think, is where the confusion has arisen so far: at times we had this more elaborate notion of Sense as a unit of meaning, while in WordNet all that is needed is the reification of a relation.

 

Thus, on the one hand, I would be tempted to say that “sense” is a relationship and that, in short, the property ontolex:sense captures it pretty well, though not for linking to a reified LexicalSense, but for linking to a Unit of Meaning/LexicalConcept. On the other hand, the fact is that we may need (see the examples above) a reification of that sense relationship. We have to keep the two things distinct. Here I would introduce ontolex:Sense exactly as this: not as a Unit of Meaning, but as a reification of the relation between a Word and a Unit of Meaning.

 

So far so good. It may seem that I have lengthened the path from plain literals to ontoelements instead of shortening it, but actually, if properly planned, we could have very useful properties which can be expanded into reified objects if and where appropriate. And, most of all, we would keep linguistic resources as something usable to enrich ontologies, and not as further ontologies to be mapped.

 

MODEL PROPOSAL:

 

I would propose then the following model:

 

NOTES: 

I left out all the characterization of LexicalEntries, which is obviously important, but separate from this discussion. 

For ease of reading, I’m using the empty prefix instead of ontolex: here.

 

CLASSES:

:LexicalConcept (or Unit of Meaning, but I’ll use LexicalConcept from now on)

:Sense

:LexicalEntry

 

PROPERTIES:

:sense                  domain: LexicalEntry                     range: LexicalConcept  (note the difference here)

:reference          domain: LexicalConcept               range: unspecified, but expected to “land” on ontoelements.

 

:lexEntry             range: LexicalEntry         merely a construct for the role of LexicalEntries in reifiedRelations, such as :Sense

:lexConcept       range: LexicalConcept   merely a construct for the role of LexicalConcepts in reifiedRelations, such as :Sense

 

 

A :Sense (capital letter) is the reification of the :sense property. Being a binary relation between LexicalEntries and their intended meanings (LexicalConcepts), :sense works well in most cases; but if we need a reification, we may have the following rule:

 

:Sense(y)                            :lexEntry                             :LexicalEntry(x)

:Sense(y)                            :lexConcept                       :LexicalConcept(z)

------------------- --->

:LexicalEntry(x)                :sense                                  :LexicalConcept(z)
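 

Read operationally, the rule above amounts to a simple forward-chaining step. The following Python sketch is one possible reading of it, using plain tuples as triples; the node ex:sense1 and the ex:tagCount property with its value are made up for the example, and the class checks on x, y and z are omitted for brevity.

```python
LEX_ENTRY = ":lexEntry"
LEX_CONCEPT = ":lexConcept"
SENSE = ":sense"


def derive_sense_links(graph):
    """For each node y with y :lexEntry x and y :lexConcept z, derive x :sense z."""
    entries = {}   # reified sense node -> set of lexical entries
    concepts = {}  # reified sense node -> set of lexical concepts
    for s, p, o in graph:
        if p == LEX_ENTRY:
            entries.setdefault(s, set()).add(o)
        elif p == LEX_CONCEPT:
            concepts.setdefault(s, set()).add(o)
    return {
        (x, SENSE, z)
        for y in entries.keys() & concepts.keys()
        for x in entries[y]
        for z in concepts[y]
    }


graph = {
    # Hypothetical reified sense carrying extra data (the tag count is made up).
    ("ex:sense1", LEX_ENTRY, ":vomit"),
    ("ex:sense1", LEX_CONCEPT, "wordnet:syn_v_00076153"),
    ("ex:sense1", "ex:tagCount", 3),
}

print(derive_sense_links(graph))
```

The extra data stays on the reified node, while the derived binary :sense triple is all that the 3-entity-pattern needs.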

 

 

Now, our 3-entity-pattern is, as I said initially:

 

InstOf(ontolex:LexicalEntry) –prop(ontolex:sense)–> InstOf(ontolex:LexicalConcept) –prop(ontolex:reference)–> An Ontoelement

 

Where InstOf(x) means: “an instance of x”

 

Now, WordNet. Given that:

 

wordnet:Synset              rdfs:subClassOf               :LexicalConcept

 

We may express things such as:

 

wordnet:syn_n_08225481          ontolex:reference          <http://dbpedia.org/ontology/team> .

 

thus bringing all of the LexicalEntries already defined in WordNet as synonyms in wordnet:syn_n_08225481, as valid LexicalEntries describing the ontology element dbpedia:team.

 

By no means, instead, does it hold that:

wordnet:Sense               rdfs:subClassOf               :LexicalConcept

as the former consists of constructs built from elements of the latter.

 

WordNet would thus have these reified senses, but a direct connection of the form:

InstOf(:LexicalEntry)                      :sense                  InstOf(wordnet:Synset)

is still possible, and hence welcome.

 

As you may see:

 

1)      I preserved the possibility to reify Senses (necessary in WordNet), but separated this Sense reification from the LexicalConcept (or Unit of Meaning) present in the current model. 

2)      I allowed for these LexicalConcepts to be used as elements-in-the-middle of our 3-entity-pattern.

 

The sense reification is very important in WordNet (as it may be in other resources) to keep track of very specific things such as word ordering, tag counts, or lexical relations. But while all of these play a very important role in the lexical resource, they are not needed for an OntoLex binding: the :sense binary relation is more than enough in that context.

Once more, there cannot be any further “semantic” characterization of :Sense. An instance of :Sense cannot have a description, as the description pertains to the LexicalConcept. :Sense, in short, is just a device in RDF for attaching additional data to word-synset pairs.

 

Really sorry for the…yes..erm… quite long email :-D

 

Cheers,

 

Armando

 

P.S.: As said, names might be improved (someone could insist that the pointer to a WordNet synset IS de facto a reference), but I would urge us not to let terminology affect our modelling, and instead to find the best names for things later, once we agree on them (rem tene… verba sequentur). My only concern is that I definitely felt something was not working with the previous modelling, and I think this “structure” serves our needs much better and properly exploits linguistic resources in the context of enriching conceptual knowledge.

 

[1] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. Introduction to WordNet: An On-line Lexical Database.
http://wordnetcode.princeton.edu/5papers.pdf

 

 

Received on Wednesday, 17 April 2013 19:28:45 UTC