Re: WordNet modelling in Lemon and SKOS from John McCrae on 2013-04-18 (public-ontolex@w3.org from April 2013)

From: John McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Thu, 18 Apr 2013 16:42:31 +0200
To: Armando Stellato <stellato@info.uniroma2.it>
Cc: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAC5njqoFrCNWaHrCWU8wdaKi6GkfAR1AX5+tHZAzOLw-6Jna5Q@mail.gmail.com>
Hi Armando, all,

I will try to synthesize a few other emails into this reply.

Firstly, I agree with much of what Armando says. Although lexical senses
may be a reification of the <Word,Synset> combo as Armando says, I feel
this understates the importance of their role. In fact, from my
understanding, lexical senses constitute an extension of words used with a
given meaning, by the same logic that a lexical entry (lexeme) consists of
an extension of words used in various inflected forms. Conversely, it
could be argued that the lexeme is just a reification of the
<Form,Concept> pair (in fact, this is approximately what a SKOS-XL label is).
The key point is that the lexical sense is useful in at least a significant
percentage of language resources; in this case, its use as the annotation
point for contexts (register, geographical usage), conditions (lexical
selection restrictions) and examples (as in WordNet, see screenshot) makes
it IMHO a clearly vital part of the model.

When defining *lemon*, we tried to be partly agnostic about the format of
the ontology... we assumed it would be OWL, but didn't rule out the case of
linking to F-Logic, FOL, etc. From this point of view, it is not
unreasonable to consider linking to a SKOS concept hierarchy as an
informal ontology.

Much of the issue in this thread concerns what happens if we then want to
link this synset/concept hierarchy to a (formal) ontology. In the following
document they propose two options:

http://www.w3.org/2006/07/SWD/SKOS/skos-and-owl/master.html

The two options are "overlay" and "transform". I suspect most members of
this list would reject the overlay option, so looking at the transform
option we see a model using *lemon*, OWL and SKOS (first part of the
attached image), which uses the (unfortunately) hypothetical skos:it
property to link the concept (synset) to the ontology entity.

In a previous email today I proposed a modelling based on Aldo's
semiotics.owl ontology (based on the understanding that lexical senses are
expressions, synsets are meanings and ontology entities are references). As
we can see, this is structurally identical.

Finally, I also looked at Armando's proposal, and it also seems very
similar in structure. In my opinion, it should be possible to move the
domain of Armando's sense link to the Sense class*, and this would leave us
agreeing on the structure, if not on the names of the labels!

Regards,
John

* Of course, if we take into account Philipp's proposed shortcut link
between Lexical Entries and Ontology Entities (see
http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping),
then this link would simply be that shortcut.


On Wed, Apr 17, 2013 at 9:28 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:

> Hi again,
>
> First of all, this is a reply to all three emails from Philipp, John and
> Aldo (plus a few points from other emails). Since the topic is the same
> and their emails overlap, I wrote a single reply. Also, a small glossary,
> to keep the argumentation shorter later on:
>
> Ontoelement(s): those elements of an ontology that need to be referenced
> through lexical information, that is, the objects of triples with
> ontolex:reference as their predicate. Note that there is some abuse of
> notation here: this “target ontology” could actually be a SKOS concept
> scheme rather than an owl:Ontology. We do not assign any Class here, as
> these elements could be properties, individuals, classes or concepts.
>
> 3-entity-pattern: the LexicalEntry -> LexicalSense -> OntoElement
> structure we (more or less) agreed on.
>
> Ah, one note: this is not only an interminably long discussion; I also
> propose a model at the end :-D
>
> Below, I put people’s names before each section, so that it is clear
> who said what and whom I’m replying to:
>
> [Philipp]
>
> I agree that in some sense the three-entity path seems overkill for
> modelling WordNet. But I think that our goal should be to design a model that
> works for all cases and not tune the model to the particular case of
> WordNet. So I would prefer to use the same modelling (i.e. the three-entity
> path) across all specific resources.
>
> [Armando]
>
> I absolutely agree on our mandate to have something homogeneous and not
> hard-patched to some specific need. My proposed modelling for WordNet
> does not sprout exceptions from our model to cover WordNet; rather, it is
> (obviously, this is my opinion and I may be wrong) a more faithful
> replication of its structure, which I think is elegantly compatible with
> our model and even matches it better. Moreover, it fosters a better
> integration of WordNet when WordNet is used to enrich an ontology.
>
> However, my perspective is not totally incompatible with some modelling
> requirements (see my reply to John’s observations later), and, as you
> will see, some links between the two views can be drawn.
>
> But, to argue better (at least, I hope), I have to take a step back
> (sorry, I’ll be going through things that all of you know very well, but
> I still need to mention them for the argument).
>
> In WordNet we have words (terms, whatever...), and these words are bound
> into collections called synonym sets. To cite the most popular paper [1]
> about WordNet, “…synonym sets (synsets) do not explain what the concepts
> are; they merely signify that the concepts exist”. So, synonym sets are
> just “language extensional hints” to a concept. We don’t know
> intensionally what that concept is, but we understand that it exists and
> we know linguistically how to refer to it. In a sentence of the same
> paper, just before the one quoted above, we read: “The synonym sets,
> {board, plank} and {board, committee} can serve as unambiguous designators
> of these two meanings of board”. So the meanings of board, under an
> interpretative process, are designated by synsets.
>
> In the very first lines (the abstract) of the same publication, we read:
>
> “English nouns, verbs, and adjectives are organized into synonym sets,
> each representing one underlying lexical concept”.
>
> OK, perfect: I’ve found what I would suggest for that element in the
> middle of the 3-entity path. It is called LexicalConcept, and it fits
> remarkably well (even terminologically) as a subclass of skos:Concept. As
> I said many times, I personally didn’t like LexicalSense because, maybe
> biased by my knowledge of WordNet and by a bit of common sense, I would
> have used the word “sense” only to represent the relationship that holds
> between a LexicalEntry and a LexicalConcept. That is to say: a
> LexicalEntry may have many senses, and each of them is represented
> through a pointer (the relation “ontolex:sense”) to a LexicalConcept,
> which in WordNet happens to be a synset (not my words, I’m citing their
> literature).
>
> So, in my view the thing is simple. I’ll recap it as Aldo did in his
> email, but from my modelling perspective; to me, the 3-entity pattern (and
> its gluing properties) in our language would be:
>
> Class(ontolex:LexicalEntry) -prop(ontolex:sense)->
> Class(ontolex:LexicalConcept) -prop(ontolex:reference)-> an Ontoelement
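The pattern above can be read as a two-hop walk over triples. The following is a minimal Python sketch of that walk, assuming triples are held as plain string tuples (no RDF library involved); `:teamConcept` is a hypothetical LexicalConcept standing in for the blank node used in the wiki example.

```python
# Minimal sketch: triples are plain (subject, predicate, object) string tuples.
# ":teamConcept" is a hypothetical LexicalConcept standing in for a bnode.
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

SENSE = "ontolex:sense"
REFERENCE = "ontolex:reference"


def objects(graph: List[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def ontoelements_of(graph: Iterable[Triple], entry: str) -> Set[str]:
    """Walk LexicalEntry -sense-> LexicalConcept -reference-> ontoelement."""
    triples = list(graph)  # materialize once so the graph can be scanned twice
    found: Set[str] = set()
    for concept in objects(triples, entry, SENSE):
        found |= objects(triples, concept, REFERENCE)
    return found


graph = [
    (":team", SENSE, ":teamConcept"),
    (":teamConcept", REFERENCE, "http://dbpedia.org/ontology/team"),
]
print(ontoelements_of(graph, ":team"))  # {'http://dbpedia.org/ontology/team'}
```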
>
> So far, comparing it purely as a graph with what has already been said,
> it seems I just dislike the name LexicalSense and replaced it with
> LexicalConcept; but there is a real difference precisely when we consider
> a case like WordNet.
>
> Let’s take these two other triples:
>
> wordnet:Synset rdfs:subClassOf ontolex:LexicalConcept .
>
> wordnet:syn_v_00076153 rdf:type wordnet:Synset .
>
> Here we have just two renamings:
>
> - a renaming of the synset instance: personally, I think the synset code
>   is the most “neutral” way of naming a synset, not biased toward one of
>   the terms that are part of it (which always gave me a headache); I think
>   this is the same thing Piek was referring to when talking about the
>   choice of word-sensenumber pairs as URIs for synsets in the existing RDF
>   version of WordNet
>
> - my LexicalConcept class instead of LexicalSense
>
> but, apart from these, I took those two triples exactly as they are from
> Aldo’s example.
>
> Now, the core of my objection to the original WordNet example (or
> rather, to some implications of it that I saw confirmed in the emails) is
> that I see this class LexicalConcept as exactly the “vague lexical
> concept” (of which we precisely know a lexical extension) that can be put
> between LexicalEntries and ontoelements in the 3-entity pattern.
>
> It is exactly, for instance, the bnode we put in the example at
> http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping#Examples_using_DBpedia
> when we write:
>
> :team a ontolex:LexicalEntry ;
>   ontolex:canonicalForm [ ontolex:writtenRep "team"@en ] ;
>   ontolex:sense [ ontolex:reference <http://dbpedia.org/ontology/team> ] .
>
> to link the :team LexicalEntry to the dbpedia:team resource.
>
> Only... if we are using WordNet, someone has already prepared a set of
> these LexicalConcepts for us (seasoned with words!), given them
> identifiers (so no bnodes are necessary), and defined a general class for
> them, called Synset :-)
>
> This is really the central part of what I’m saying.
>
> Thus, a very basic (but still compliant) modelling can be:
>
> wordnet:syn_n_08225481 ontolex:reference <http://dbpedia.org/ontology/team> .
>
> and we get for free all the LexicalEntries already attached to that
> synset in WordNet, modelled according to our vocabulary. Obviously,
> further work can enrich the lexical description of a WordNet synset (which
> in WordNet is just a set of words) thanks to our more fine-grained
> vocabulary for characterizing Lexical Entries. Still, with just the one
> line above, we get a lot for free thanks to the mere existence of WordNet.
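To make the "for free" claim concrete, here is a small Python sketch (plain tuples, no RDF library) that answers the inverse question: which LexicalEntries lexicalize a given ontoelement? It assumes a hypothetical OntoLex rendering of WordNet in which entries already point at the synset via `ontolex:sense`; the member entries `wn:entry-team` and `wn:entry-squad` are illustrative placeholders and are not checked against WordNet.

```python
# Sketch: one reference triple on a synset lexicalizes an ontoelement with
# every entry whose sense points at that synset.
# Entry names are hypothetical placeholders, not verified WordNet members.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
SENSE, REFERENCE = "ontolex:sense", "ontolex:reference"


def lexicalizations(graph: List[Triple], ontoelement: str) -> Set[str]:
    """Entries whose sense is any concept that references the ontoelement."""
    concepts = {s for s, p, o in graph if p == REFERENCE and o == ontoelement}
    return {s for s, p, o in graph if p == SENSE and o in concepts}


TEAM = "http://dbpedia.org/ontology/team"
graph: List[Triple] = [
    # Assumed to come with an OntoLex rendering of WordNet:
    ("wn:entry-team", SENSE, "wordnet:syn_n_08225481"),
    ("wn:entry-squad", SENSE, "wordnet:syn_n_08225481"),
    # The single triple added by the ontology engineer:
    ("wordnet:syn_n_08225481", REFERENCE, TEAM),
]
print(sorted(lexicalizations(graph, TEAM)))  # ['wn:entry-squad', 'wn:entry-team']
```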
>
> [Philipp]
>
> Assuming that WordNet contains a conceptualization, each synset indeed
> represents a skos:Concept (a unit of thought) and in that sense it seems
> reasonable to see a Synset as a reference.
>
> [Armando]
>
> I agree on the skos:Concept part, not on the rest. WordNet is a lexical
> database. Its domain (the set of its linguistic concepts, called synsets)
> is still linguistic, and the concepts of WordNet are thus IMHO the
> LexicalConcepts I’m advocating. If you commit to WordNet in some way, then
> you could (and, in my advice, should) commit to, and benefit from, using
> these synsets as the element in the middle of our 3-entity pattern.
>
> I’m trying to place WordNet correctly within our wider onto-linguistic
> modelling, and I see it as the linguistic part that needs to be attached
> to the conceptual part. I wouldn’t like to see WordNet as a domain (world
> domain) concept scheme with attached labels that can potentially be mapped
> to our ontoelements. Obviously, the name skos:Concept may be misleading
> (as “concept” could suggest that, in the onto-lex composition, it is the
> “onto” part), but I’m stressing that this extension of skos:Concept
> should be our ontolex:LexicalConcept, and that ontolex:LexicalConcept
> itself is the right cap (superclass) for wordnet:Synset when considering
> WordNet as a specific instance of an OntoLex-modellable lexical resource.
> Finally, once more, this implies that Synsets should sit between
> LexicalEntries and ontoelements in our 3-entity pattern.
>
> I’ll now try to explain the drawbacks of the example currently in the
> wiki. With that modelling, we get almost nothing back: we would have this
> “general world ontology” called WordNet, with its lexical entries
> (mediated through the Sense entity), and two distinct universes of
> possible actions:
>
> 1) we could map the resources of our domain ontology/concept scheme to
>    the synsets of WordNet, much as we map two general domain ontologies or
>    concept schemes.
>
> 2) we could relate specific word senses, such as wordsense-vomit-verb-1,
>    to resources in our ontology.
>
> But note that, in what I propose, we could link a synset
> (syn_v_00076153) through ontolex:reference directly to ontoelements and
> use it, coherently with our model, to have all of that synset’s lexical
> entries bound to the intended ontoelement. In the current model, instead,
> by using WordNet senses, we would have to link each sense of each word to
> the ontoelements.
>
> So we would have to state:
>
> wordsense-vomit-verb-1  ontolex:reference  myont:vomit .
> wordsense-cat-verb-2    ontolex:reference  myont:vomit .
>
> But... isn’t that painful? We already had the synset as a common
> umbrella! Oh yes, surely we could define some entailment whereby, if I
> link a synset (somehow... how? through skos:exactMatch?) to an element of
> my ontologies, then all of its related word senses (that is, the set of
> senses through which certain words are bound to that synset) are bound to
> the ontoelement. But how can we state this entailment in the general
> ontolex vocabulary, since Synsets are outside it? (In fact, the wiki
> example does not hint at any general definition of wordnet:Synset under
> some ontolex umbrella: the synset is only the last resource pointed to by
> ontolex:reference, much like an ontoelement from any other ontology.)
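The entailment described above is easy to write as a rule, but only with a WordNet-specific property linking a word sense to its synset, which is exactly what the general vocabulary lacks. A Python sketch, assuming a hypothetical property `wordnet:inSynset` and assuming (as the example above suggests) that both word senses belong to `syn_v_00076153`:

```python
# Sketch of the entailment discussed above: a synset's ontolex:reference is
# propagated to every word sense in that synset.
# "wordnet:inSynset" is a hypothetical property used only for illustration.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
REFERENCE, IN_SYNSET = "ontolex:reference", "wordnet:inSynset"


def propagate_references(graph: List[Triple]) -> Set[Triple]:
    """Infer (sense, ontolex:reference, x) from
    (sense, wordnet:inSynset, synset) and (synset, ontolex:reference, x)."""
    refs = [(s, o) for s, p, o in graph if p == REFERENCE]
    inferred: Set[Triple] = set()
    for sense, p, synset in graph:
        if p != IN_SYNSET:
            continue
        for subject, target in refs:
            if subject == synset:
                inferred.add((sense, REFERENCE, target))
    return inferred


graph: List[Triple] = [
    ("wordsense-vomit-verb-1", IN_SYNSET, "wordnet:syn_v_00076153"),
    ("wordsense-cat-verb-2", IN_SYNSET, "wordnet:syn_v_00076153"),
    ("wordnet:syn_v_00076153", REFERENCE, "myont:vomit"),
]
for triple in sorted(propagate_references(graph)):
    print(triple)
```

One reference triple on the synset yields both sense-level triples that would otherwise have to be stated by hand.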
>
> With a slightly different approach from Philipp and John, Aldo
> interestingly proposed both Synset and WordSense as subclasses of
> ontolex:LexicalSense. This would mean that Aldo would actually allow
> synsets to be used in the middle of our 3-entity path:
>
> wordnet:WordSense rdfs:subClassOf ontolex:LexicalSense .
>
> wordnet:Synset rdfs:subClassOf ontolex:LexicalSense .
>
> This seems at odds with what Philipp and John say. While I obviously
> agree with the second axiom (it’s basically the core of what I’m saying),
> I personally can’t see wordnet:WordSense also being a subclass of
> ontolex:LexicalSense; indeed, I can’t think how two clearly distinct
> things (wordnet:WordSense and wordnet:Synset) can be subclasses of the
> same class in any possible theory.
>
> So (if I’m correct), in Philipp’s and John’s case, Synset seems to be left
> out of any convenient reuse, while in Aldo’s case I have this big problem
> with the double subclassing of both Synset and WordSense under
> LexicalSense. You may not agree with me, but it still seems something is
> missing.
>
> I then tried to play devil’s advocate and argue against myself: “what if
> I want to attach a given set of words to one of my ontoelements, but there
> is no synset in WordNet that rightly embraces it? That is, for each synset
> I would consider, there is a word in it that I don’t like”. This could be
> a good point in favour of attaching word senses, rather than synsets, to
> ontoelements. But actually it is not: reducing commitment always reduces
> constraints and problems, but it also offers fewer solutions and
> opportunities. The paper [1] (and, I suppose, much earlier literature :-D)
> is clear that true synonyms may never exist and that synonymy depends on
> context; still, the WordNet ontology (as all ontologies do) provides a
> discretization of a world model, where the “world” is the “generic use of
> language”. This works in most cases, but may fail where the
> discretization does not correctly represent a given shade of meaning
> (i.e. there is no WordNet sense of a word that perfectly fits the concept
> we want to express in our ontology, and thus its lexicalization).
>
> But the truth is always the same in all cases of commitment: you can
> decide to reuse what you have as much as you like, and get the benefits
> of the (shareable!) work of others up to a reasonable extent. If nothing
> in WordNet fits a specific ontoelement of yours, then put a blank node as
> the LexicalConcept in the 3-entity pattern and go on customizing your
> specific lexical characterization, while still keeping the rest (probably
> 99% of your ontology) happily WordNet-decorated.
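The fallback above can be sketched as a single helper: reuse a synset as the LexicalConcept when one fits, otherwise mint a fresh blank node, and attach any custom entries to whichever concept was chosen. This is a Python sketch under stated assumptions: the helper name `lexicalize`, the `_:bN` labelling scheme and the `myont:`/`:myTerm` resources are all hypothetical.

```python
# Sketch: reuse a WordNet synset as the LexicalConcept when one fits;
# otherwise mint a fresh blank node. All names below are hypothetical.
import itertools
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
SENSE, REFERENCE = "ontolex:sense", "ontolex:reference"
_bnode_ids = itertools.count(1)  # source of fresh blank-node labels


def lexicalize(ontoelement: str, synset: Optional[str],
               extra_entries: List[str]) -> List[Triple]:
    """Link entries to an ontoelement through a LexicalConcept.

    If a synset fits, it is the LexicalConcept and its existing entries come
    along for free; otherwise a blank node takes its place.
    """
    concept = synset if synset is not None else f"_:b{next(_bnode_ids)}"
    triples = [(concept, REFERENCE, ontoelement)]
    # Custom entries are attached to whichever concept was chosen.
    triples += [(entry, SENSE, concept) for entry in extra_entries]
    return triples


print(lexicalize("myont:team", "wordnet:syn_n_08225481", []))
print(lexicalize("myont:niche", None, [":myTerm"]))
```

Either way the result has the same shape, so downstream consumers of the 3-entity pattern need not know whether the middle node came from WordNet.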
>
> To recap so far: the moral behind all of this (beyond triples, names,
> etc.) is that WordNet is a linguistic resource, and by treating it as a
> generic conceptualization we could miss the opportunity of using it for
> what it is.
>
> Now, a final remark, because John (and here I want to reassure Piek about
> his concerns as well :-) ) is totally right in his email when he says:
>
> [John]
>
> “Firstly, I think an important point here is that WordNet does in fact
> have senses as a concept distinct from Synsets and Words”.
>
> [Armando]
>
> This is surely the best argument for keeping these senses if we want to
> fully support WordNet.
>
> First, something I already expressed in my previous email: it may not be
> our priority to have all of WordNet inside OntoLex; we could cover 85% of
> the WordNet model through OntoLex, and leave some specific parts of it
> outside our generic vocabulary (with WordNet still having its own RDF
> modelling scheme, covering 100% of WordNet, of which 85% is mapped to
> OntoLex). I’m not saying we shouldn’t cover it; I just want to stress that
> the focus of the earlier discussions was not on covering 100% of WordNet,
> but on how to fit it into our model and how to use it to enrich an
> ontology. That said, let’s assume that we want to cover it 100% and go
> ahead.
>
> All of us know that, when representing a domain through a given model, we
> may have to represent things we perceive as different through identical
> constructs. In RDF, we sometimes have to reify relationships into
> entities. Conversely, in relational modelling, all entities and
> relationships from an ER model become relations (i.e. tables in a DB).
> So, the fact is that the traditional index-file-based WordNet DB has a
> sense index file where bindings between Synsets and Words are expressed,
> because sometimes they need to be cited explicitly as first-class
> citizens.
>
> Consider lexical relations (which, by their name, cover relations between
> words). In WordNet (since it was born merely “to be a theory of the Word
> Meaning box” [1, p. 5]), there are no purely lexical relations: its
> lexical relations are actually stated between senses of words, that is,
> between word-synset pairs. For instance, in everyday speech we say that
> rise/fall are antonyms, but we are surely not thinking of “fall” as the
> US term for “autumn” as opposed to “rise”: WordNet accounts for that by
> specifying that two words are antonyms only with respect to some of their
> senses.
>
> Another example is the tag count, again in WordNet, which tells how many
> times a specific word with a particular sense (tagged with a given synset)
> has appeared in a corpus (e.g. SemCor). Another is the sense ordering
> already mentioned in other emails.
>
> But is the sense anything more than a device for adding statistical
> data, imposing some ordering, or better qualifying lexical relations? I
> think not. Synsets and Words are the VIPs. Sense (in WordNet) is just the
> reification of the <Word, Synset> combo.
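The rise/fall example shows why a lexical relation needs the reified pair rather than the bare word. A Python sketch that models a sense as an immutable <Word, Synset> pair and states antonymy between such pairs; the synset identifiers here are hypothetical placeholders, not real WordNet IDs.

```python
# Sketch: WordNet-style lexical relations hold between senses, i.e. between
# reified <Word, Synset> pairs, not between bare words.
# The synset identifiers are hypothetical placeholders, not WordNet IDs.
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)  # frozen makes instances hashable, usable in sets
class Sense:
    word: str
    synset: str


RISE_UP = Sense("rise", "syn:move-upward")
FALL_DOWN = Sense("fall", "syn:move-downward")
FALL_AUTUMN = Sense("fall", "syn:autumn")

# Antonymy is asserted only between the two motion senses.
ANTONYMS: Set[Tuple[Sense, Sense]] = {(RISE_UP, FALL_DOWN)}


def antonymous(a: Sense, b: Sense) -> bool:
    """Symmetric lookup of the sense-level antonym relation."""
    return (a, b) in ANTONYMS or (b, a) in ANTONYMS


print(antonymous(RISE_UP, FALL_DOWN))    # True
print(antonymous(RISE_UP, FALL_AUTUMN))  # False
```

Per-pair data such as a tag count or a sense rank would be further fields of the same `Sense` object, which matches the role described above for the reified sense.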
>
> So this is the notion of “sense” in WordNet: a gluing object relating a
> Word to a Unit of Meaning (a lexical concept). The lexical concept is
> “hinted” by the index (through the synset code) and linguistically
> expressed by means of a Synset’s lexical extension: its words. A Word has
> a Sense in that it points to a given Unit of Meaning. The Sense, as such,
> cannot have any definition, as it only reifies the link between Words and
> Units of Meaning. This, I think, is where the confusion has arisen so
> far: sometimes we had this more elaborate concept of Sense as a unit of
> meaning, while in WordNet we needed a mere reification of a relation.
>
> So, on the one hand, I would be tempted to say that “sense” is a
> relationship, and, to keep it short, the property ontolex:sense captures
> it pretty well, though for linking to a Unit of Meaning/LexicalConcept
> rather than to a reified LexicalSense. On the other hand, we may need (see
> the examples above) a reification of that sense relationship. We have to
> keep the two things distinct. Here I would introduce ontolex:Sense exactly
> as this: not as a Unit of Meaning, but as a reification of the relation
> between a Word and a Unit of Meaning.
>
> So far so good. It may seem that I have widened the path from plain
> literals to ontoelements instead of shortening it, but actually, if
> properly planned, we could have very useful properties that can be
> expanded into reified objects if and where appropriate. And, most of all,
> we would keep Linguistic Resources as something usable to enrich
> ontologies, not as further ontologies to be mapped.
>
> MODEL PROPOSAL:
>
> I would then propose the following model:
>
> NOTES:
>
> *I left out the whole characterization of LexicalEntries, which is
> obviously important but separate from this discussion.*
>
> *For ease of reading, I’m using the empty prefix instead of ontolex:
> here.*
>
> CLASSES:
>
> :LexicalConcept (or Unit of Meaning, but I’ll use LexicalConcept from now
> on)
>
> :Sense
>
> :LexicalEntry
>
> PROPERTIES:
>
> :sense        domain: LexicalEntry     range: LexicalConcept
>               (note the difference here)
>
> :reference    domain: LexicalConcept   range: unspecified, though expected
>               to “land” on ontoelements
>
> :lexEntry     range: LexicalEntry      *merely a construct for the role of
>               LexicalEntries in reified relations, such as :Sense*
>
> :lexConcept   range: LexicalConcept    *merely a construct for the role of
>               LexicalConcepts in reified relations, such as :Sense*
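The property signatures above can be written down as data and used to flag triples that do not fit, for instance a :sense triple pointing at a reified :Sense instead of a LexicalConcept (the "difference" noted above). This Python sketch is a deliberate simplification: in RDFS, `rdfs:domain`/`rdfs:range` license type inferences rather than validation, whereas here they are checked closed-world, with one type per resource and no subclass reasoning. Resource names are hypothetical.

```python
# Sketch: the proposed property signatures used as a closed-world check.
# In RDFS, domain/range license type *inferences*; checking them like this
# is a simplification (one type per resource, no subclass reasoning).
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# property -> (domain, range); None means "not specified" in the proposal
SIGNATURES: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    ":sense": (":LexicalEntry", ":LexicalConcept"),
    ":reference": (":LexicalConcept", None),
    ":lexEntry": (None, ":LexicalEntry"),
    ":lexConcept": (None, ":LexicalConcept"),
}


def violations(triples: List[Triple], types: Dict[str, str]) -> List[Triple]:
    """Return triples whose subject or object type contradicts the signature."""
    bad = []
    for s, p, o in triples:
        domain, rng = SIGNATURES.get(p, (None, None))
        if domain is not None and types.get(s) != domain:
            bad.append((s, p, o))
        elif rng is not None and types.get(o) != rng:
            bad.append((s, p, o))
    return bad


types = {":team": ":LexicalEntry", ":teamConcept": ":LexicalConcept",
         ":teamSense": ":Sense"}
triples = [
    (":team", ":sense", ":teamConcept"),  # fits the proposed signature
    (":team", ":sense", ":teamSense"),    # range violation: a Sense, not a concept
]
print(violations(triples, types))  # [(':team', ':sense', ':teamSense')]
```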
>
> A :Sense (capital letter) is the reification of the :sense property.
> Being a binary relation between LexicalEntries and their intended meaning
> (LexicalConcept), ontolex:sense works well in most cases, but if we need a
> reification, we can have the following rule:
>
>   :Sense(y)         :lexEntry    :LexicalEntry(x)
>   :Sense(y)         :lexConcept  :LexicalConcept(z)
>   --------------------------------------------------->
>   :LexicalEntry(x)  :sense       :LexicalConcept(z)
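The rule above is a simple forward inference: every node typed :Sense that carries both role properties collapses into a direct :sense triple. A Python sketch over plain tuples; the resources `:s1` and `:vomit` and the extra property `:senseNumber` are hypothetical, the latter only showing where per-pair data would live.

```python
# Sketch of the rule above. Resource names are hypothetical; the rule fires
# for each node typed :Sense that has both :lexEntry and :lexConcept.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def derive_sense_links(triples: List[Triple]) -> Set[Triple]:
    """From y :lexEntry x and y :lexConcept z (y a :Sense), infer x :sense z."""
    senses = {s for s, p, o in triples if p == "rdf:type" and o == ":Sense"}
    inferred: Set[Triple] = set()
    for y in senses:
        entries = [o for s, p, o in triples if s == y and p == ":lexEntry"]
        concepts = [o for s, p, o in triples if s == y and p == ":lexConcept"]
        for x in entries:
            for z in concepts:
                inferred.add((x, ":sense", z))
    return inferred


triples = [
    (":s1", "rdf:type", ":Sense"),
    (":s1", ":lexEntry", ":vomit"),
    (":s1", ":lexConcept", "wordnet:syn_v_00076153"),
    # Hypothetical per-pair data lives on the reified node, not on the concept.
    (":s1", ":senseNumber", "1"),
]
print(derive_sense_links(triples))  # {(':vomit', ':sense', 'wordnet:syn_v_00076153')}
```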
>
> Now, our 3-entity pattern is, as I said initially:
>
> InstOf(ontolex:LexicalEntry) -prop(ontolex:sense)->
> InstOf(ontolex:LexicalConcept) -prop(ontolex:reference)-> an Ontoelement
>
> where InstOf(x) means “an instance of x”.
>
> Now, WordNet. Given that:
>
> wordnet:Synset rdfs:subClassOf :LexicalConcept .
>
> we may express things such as:
>
> wordnet:syn_n_08225481 ontolex:reference <http://dbpedia.org/ontology/team> .
>
> thus bringing in all of the LexicalEntries already defined in WordNet as
> synonyms in wordnet:syn_n_08225481 as valid LexicalEntries describing the
> ontology element dbpedia:team.
>
> By contrast, it by no means holds that:
>
> wordnet:Sense rdfs:subClassOf :LexicalConcept .
>
> as the former is built from elements of the latter.
>
> So WordNet would have these reified senses, but a direct connection of
> the form:
>
> InstOf(:LexicalEntry) :sense InstOf(wordnet:Synset)
>
> is still possible, and hence welcome.
>
> As you can see:
>
> 1) I preserved the possibility of reifying Senses (necessary in WordNet),
>    but separated this Sense reification from the LexicalConcept (or Unit
>    of Meaning) present in the current model.
>
> 2) I allowed these LexicalConcepts to be used as elements in the middle
>    of our 3-entity pattern.
>
> The sense reification is very important in WordNet (as it may be in other
> resources) to keep track of very specific things such as word ordering,
> tag counts, or lexical relations; but, while all of these play a very
> important role in the lexical resource, they are not needed for an
> OntoLex binding. The binary :sense relation is more than enough in that
> context.
>
> Once more, there cannot be any further “semantic” characterization of
> :Sense. An instance of :Sense cannot have a description, as the
> description pertains to the LexicalConcept. :Sense, in short, is just a
> device in RDF to further characterize word-synset pairs with additional
> data.
>
> Really sorry for the... yes... erm... quite long email :-D
>
> Cheers,
>
> Armando
>
> P.S.: As I said, names might be improved (someone could insist that the
> pointer to a WordNet synset IS de facto a reference), but I would urge us
> not to let terminology affect our modelling, and instead to find the best
> way to name things later, once we agree on them (rem tene, verba
> sequentur). My only concern is that I definitely felt something was not
> working with the previous modelling, and I think this “structure” renders
> our needs much better and properly exploits linguistic resources in the
> context of enriching conceptual knowledge.
>
> [1] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross,
> and Katherine Miller. Introduction to WordNet: An On-line Lexical
> Database. http://wordnetcode.princeton.edu/5papers.pdf
>
Attachments

image/png attachment: wordnet-screenshot.png
image/png attachment: OntoLexModels.png
Received on Thursday, 18 April 2013 14:43:06 UTC