Re: order of senses from John McCrae on 2013-04-16 (public-ontolex@w3.org from April 2013)

From: John McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Tue, 16 Apr 2013 12:42:20 +0200
To: Piek Vossen <piek.vossen@vu.nl>
Cc: Armando Stellato <stellato@info.uniroma2.it>, Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>, public-ontolex <public-ontolex@w3.org>, Jacco van Ossenbruggen <Jacco.van.Ossenbruggen@cwi.nl>
Message-ID: <CAC5njqpV6a=gMFeqaJ+4QkG52AN57wPGw3-DeWvMrYdd48FBFg@mail.gmail.com>
Hi,

I agree: WordNet uses specific sense indexes, distinct from the synset
identifiers, so I think we must have a named URI for both the
synset and the sense itself.

Regards,
John


On Tue, Apr 16, 2013 at 12:38 PM, Piek Vossen <piek.vossen@vu.nl> wrote:

> Dear all,
>
> I have been silent for a while because I have been too busy to keep track of
> all this. However, I feel the need to jump in now. Perhaps you have already
> discussed this and my comments are of no use. If so, sorry for raising it;
> please disregard.
>
> We have had many discussions in the GWA community about sense identifiers and
> synset identifiers. The consensus is that we need both. For ontologically
> minded people, synset ids for concepts are enough. However, not only is the
> order of the senses important (it often reflects frequency), but there
> are also many relations in various wordnets that hold only between lexical
> units (senses of words that belong to different synsets): derivational
> relations, metonymy, metaphor, specialization, generalization, etc. Another
> point is that in WSD approaches people use sense-groups (possibly based on
> the previous relations). Sense-groups consist of sense identifiers rather
> than synset identifiers.
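
A minimal sketch of a sense-level relation of the kind described above. The lemon namespace IRI is assumed, and the wordnet: and ex: namespaces, the entries, the synsets and the property name wordnet:derivationallyRelated are all hypothetical, chosen only for illustration:

```turtle
@prefix lemon:   <http://lemon-model.net/lemon#> .   # assumed lemon namespace
@prefix wordnet: <http://example.org/wordnet#> .     # hypothetical vocabulary
@prefix ex:      <http://example.org/data#> .        # hypothetical data

# Two senses of different words, each pointing to its own synset.
ex:destroy_v_1 a lemon:LexicalSense ;
    lemon:reference ex:VerbSynsetA .

ex:destruction_n_1 a lemon:LexicalSense ;
    lemon:reference ex:NounSynsetB ;
    # The relation holds between the two lexical units, not between the synsets.
    wordnet:derivationallyRelated ex:destroy_v_1 .
```

Because both senses have their own URIs, the relation can be stated without claiming anything about the other members of either synset.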
>
> In addition to the concept-to-concept relations, we thus need identifiers
> for sense relations. In the W3C RDF version of WordNet, they made the
> mistake of using only sense-keys to identify concepts. I hope that here you are
> not making the reverse mistake of using only synset ids.
>
> best wishes
>
> Piek
>
>
> On Apr 16, 2013, at 11:57 AM, Armando Stellato wrote:
>
> That was what I thought too, and actually, this would “give more sense to
> LexicalSense” (sorry for the pun :-) ), as at least the reification of
> senses would allow for an easy modelling of their ordering, by simply
> attaching it as a property.
> Still, I’m not convinced about the necessity of their existence, when
> modelling lexical resources. Or better, they are ok, but, in the case of
> WordNet, they are actually (IMHO) the synsets.
> I will add more in the reply to Philipp, as there are further examples
> there to comment.
> Cheers,
> Armando
>
> From: johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] On Behalf Of John
> McCrae
> Sent: Tuesday, 16 April 2013 10:19
> To: Philipp Cimiano
> Cc: public-ontolex
> Subject: Re: order of senses
>
> You are quite right, this is an important and explicit part of the WordNet
> data model and should be preserved.
>
> I believe including a senseNumber data property would cover this. Here is
> the reference to the original WordNet documentation on this:
> http://wordnet.princeton.edu/wordnet/man/wndb.5WN.html#toc4
> Example:
> <cat:v> a lemon:LexicalEntry ;
>      lemon:sense <cat::2:29:0::>, <cat::2:35:0::> .
> <cat::2:29:0::> a lemon:LexicalSense ;
>      wordnet:senseNumber "6"^^xsd:integer ;
>      lemon:reference <VerbSynset76400> .
> Regards,
> John
>
> On Tue, Apr 16, 2013 at 10:02 AM, Philipp Cimiano <
> cimiano@cit-ec.uni-bielefeld.de> wrote:
>
> Armando, all,
>
>   re point 2: on the order of senses...
>
> Yes, according to the modelling proposed right now, this would be lost.
> However, I do not think this is a major issue as we can add this
> information to the sense objects ;-) as they are unique for a particular
> word, i.e.
>
> forall w_1,w_2,s hasSense(w_1,s) & hasSense(w_2,s) -> w_1=w_2
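
One way to state this constraint in OWL, assuming hasSense corresponds to lemon:sense and assuming the lemon namespace IRI shown. This is a sketch of the axiom only, not a claim about how lemon itself declares the property:

```turtle
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix lemon: <http://lemon-model.net/lemon#> .   # assumed lemon namespace

# If two entries share the same sense, a reasoner concludes they are the same entry,
# i.e. hasSense(w_1,s) & hasSense(w_2,s) -> w_1 = w_2.
lemon:sense a owl:InverseFunctionalProperty .
```

With senses unique to one word, a per-word ordering such as a sense number can be attached to the sense object without ambiguity.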
>
> Is this something we could agree on?
>
> Philipp.
>
> On 15.04.13 19:41, Armando Stellato wrote:
>
> Hi all,
> First of all, thanks John for providing the example: through concrete
> examples it is easier to discuss!
>
> A few comments (the same “disclaimer” from Elena holds for me: hope I
> didn’t miss anything from other discussions, and in case, sorry in advance).
>
> 1)      First of all (sorry, a bit off topic), I would ask for a
> clarification, so that I can apply the policy to my examples too: I see the
> “lemon:” prefix being used in many examples, and Lemon is an outcome of
> the Monnet project. Is it also the definitive name (or a temporary name) we are
> giving to the model we are developing in this community group? I’ve been
> using “ontolex:” as a fictitious prefix in my examples, and just figured that
> “lemon” was being used by some of you because those of you working on
> Monnet started right from examples already built in the original
> lemon. Sorry for asking what seems to be trivial, but I never got any
> definitive statement on this, so, better to realign late than never :-D
> Btw, what is written in the last row of http://www.lemon-model.net/ seems
> to confirm my hypothesis.
>
> OK, back to the original topic. Consider that a few of these observations
> can actually be resolved by completing the example, and do not necessarily
> clash with it (or, at least, do not clash with what has already been
> written; I don’t know what was planned for the rest).
>
> 2)      With respect to WordNet (which has explicitly ordered senses per
> word, where I think this order originates, at least for some of the words,
> from frequencies in SemCor), the sense ordering is lost: the synsets are
> bound to the words solely by listing them as values, which in plain
> RDF is unordered.
>
> 3)      This is the most important observation: the use of lemon:sense.
> Together with lemon:reference, lemon:sense should realize the bridge from
> lexical entries to conceptual entities (of the domain ontology). Should we
> use it to reach the conceptual entities (e.g. synsets) of the lexical resource
> AS WELL? In terms of black-box compatibility, as we are modelling even
> conceptual info of lexical resources (e.g. synsets in WordNet) through
> some RDF language (e.g. SKOS), this is legal (the rdfs:range of
> lemon:sense, provided it is wide enough, is respected); still, I’m not sure
> we want that. In short, I’m not sure we want to apply exactly the same
> 3-entity approach we are using for the lexicon-ontology model to
> modelling a lexical resource alone.
> Let’s take an example. We have myont:, a domain ontology (where we
> have the entry myont:vomit) that we are enriching with lexical content, possibly
> from WordNet. We then need to represent a direct link
> between some lexical entries (which may or may not be in WordNet) and
> the domain entities of myont.
> We would thus have this example, which I derived from both the WordNet
> example and the generic OntoLex example for enriching an ontology with
> lexical content:
>
> <cat:v>
>                a lemon:LexicalEntry ;
>                lemon:sense <cat::2:29:0::>, <cat::2:35:0::> .
> <cat::2:29:0::>
>                a lemon:LexicalSense ;
>                lemon:reference <VerbSynset76400> ;
>                lemon:reference myont:vomit .
>
> Note that I’ve cut from the original example the triples which are
> not useful to the discussion.
>
> Actually, in writing this revised example, I’m not even sure whether the two
> lemon:references should be put under the same sense umbrella, or whether I should
> have used two different senses. This is mainly because I’m not sure about
> the concept of “sense” here and what it represents. I see potential for
> confusion even in the Elena/John emails, as she rightly asks
> about the use of skos:definition instead of lemon:definition. While I’m not
> addressing here the use of one property or the other, John’s answer,
> hinting that there could be two definitions, one for a sense
> and one for a synset (and consider that there could also be a definition for the
> element in the ontology), makes me wonder how many levels we should have!
> Without delving too much into the appropriateness of this indirection as far as
> the lexicon-ontology interface is concerned, and considering only the
> representation of WordNet (thus just the lexicon
> perspective), to me the path from the LexicalEntry to the Synset is too
> long. In WordNet we just say that a word is linked to a synset: period
> (modulo the addition of an ordering). In particular, “sense” is a relation
> which just tells me that synset X is the i-th sense of word Y (and there’s a
> many-to-many relation between words and synsets).
>
> …and this brings me back to our first discussions about the choice of the
> term sense, when referring to the path from lexical entries to ontology
> elements, and about the nature of “elements-in-the-middle”.
> In my view (to avoid terminological problems, I focus here on the path
> between entities and do not name the linking properties at all, so please
> consider that all the arrows here have properties behind them, in particular
> lemon:sense and lemon:reference), when considering a mapping between a
> lexical resource such as WordNet and an ontology, I would have expected such a
> path:
> LexicalEntry --> Synset --> OntologyResource
> where, without using WordNet, the path would have been:
> LexicalEntry --> [] --> OntologyResource
> with [] a blank node gluing them together.
> The second line is identical to what we have done until now and what has
> been written in the examples in the “Specification of
> Requirements/Lexicon-Ontology-Mapping”. In particular, the blank node is an
> instance of that element-in-the-middle (see: “Need for an object between
> Lexical Entry and Ontology”) which still has no name (and maybe does
> not need one, see point 4 below). The first line is thus my
> interpretation of how WordNet would fit into that general template
> (different from John’s example).
> So, my idea would be not to replicate the complex lexicon-ontology linking
> inside WordNet itself, but to have instead a direct link between lexical
> entries and Synsets, and THEN, outside of WordNet, a further link to
> an ontology element. If you look at the two rows above (and how the WordNet
> case fits the general case), this is pretty elegant, and does not introduce
> a further level of indirection which appears unnecessary. Plus, with this
> method, the link from synsets to ontology elements is a necessary step to
> instantiate the path above, while in the other case you would have to introduce
> it as an additional (and probably redundant) triple. You can in fact see this
> in the Turtle code above, which I modelled following both the general
> example in “Specification of Requirements/Lexicon-Ontology-Mapping” and
> John’s example on WordNet: there, VerbSynset is a separate entity from
> myont:vomit. Actually, in that view, WordNet would become a separate
> “ontology” which could then be mapped to a domain ontology, instead of
> taking full advantage of being seen as a lexical resource that can be
> used, seamlessly within our model, to enrich a domain ontology.
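
The two paths in point 3 could be written in Turtle roughly as follows. This is a sketch only: the message deliberately leaves the linking properties unnamed, so using lemon:sense and lemon:reference for the two arrows is an assumption, as are the base IRI, the myont: namespace, the lemon namespace IRI and the second entry <spew:v>:

```turtle
@base          <http://example.org/wordnet/> .     # hypothetical base IRI
@prefix lemon: <http://lemon-model.net/lemon#> .   # assumed lemon namespace
@prefix myont: <http://example.org/myont#> .       # hypothetical domain ontology

# With WordNet: LexicalEntry --> Synset --> OntologyResource
<cat:v> a lemon:LexicalEntry ;
    lemon:sense <VerbSynset76400> .
<VerbSynset76400> lemon:reference myont:vomit .

# Without WordNet: LexicalEntry --> [] --> OntologyResource
<spew:v> a lemon:LexicalEntry ;
    lemon:sense [ lemon:reference myont:vomit ] .
```

In the first form the synset itself plays the role of the element in the middle, so no separate sense node is needed between the entry and the synset.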
>
> 4)      IMHO, we should coin a specific vocabulary for each element of
> the lexicon model, and then inherit (where appropriate) from SKOS/SKOS-XL,
> to distinguish elements which belong only to a lexical resource from
> those of any generic KOS. In the wiki, John wonders whether what I called
> “SemanticIndex” is not a skos:Concept, and I reply: “yes it is, in fact my
> proposal is that our vocabulary for describing lexical resources can
> inherit from the SKOS/SKOS-XL one”. If you look at the example, even John
> did this, as the LexicalForm is nothing different from a skosxl:Label
> (where lemon:writtenRep could be replaced by skosxl:literalForm), though it
> may be worth creating a dedicated class. I would thus suggest:
> LexicalForm rdfs:subClassOf skosxl:Label
> but using skosxl:literalForm instead of lemon:writtenRep.
>
> Maybe, in this specific case, we could even avoid inventing a new name and simply
> reuse skosxl:Label, which after all is not so bad and fits
> our needs pretty well… (as it is already related to something specifically
> designed for language).
>
> On the contrary, for LLD, I would necessarily restrict the class
> skos:Concept to the class of elements which we expect to host things like
> the WordNet Synset class. You can see my sample extension point above in
> the wiki (“Examples of Modelling in RDF (Alternative approach)”), though by
> no means am I proposing <SemanticIndex> itself (that was a placeholder, taken from
> previous work); in any case I think “Sense” is not appropriate
> (lemon:sense evokes the sense relation well, while I don’t like to see a
> class of “Senses”; that is, to me being a sense is more a role in a given
> relationship than an intrinsic property of an object).
>
> a.       While I think that a more-specific-than-skos:Concept class would
> be welcome for Lexical Linked Data (such as WordNet), and thus be put in the
> middle of the LexicalEntry --> ??? --> OntologyResource template, I’m not
> sure that lemon:sense (the first arrow) should necessarily be restricted to
> it. John’s use of skos:Concept in the middle suggested to me that even a
> generic well-lexicalized KOS could be used to provide LexicalEntries and
> Senses to enrich an ontology. However, I’m still thinking about it…
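
A minimal sketch of the inheritance suggested in point 4. The ex: namespace and the individual ex:cat_form are hypothetical, and ex:SemanticIndex is only the placeholder name used in the message; the SKOS and SKOS-XL terms are the standard ones:

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ex:     <http://example.org/lexicon#> .   # hypothetical namespace

# Lexicon-specific classes that inherit from SKOS / SKOS-XL.
ex:LexicalForm   rdfs:subClassOf skosxl:Label .
ex:SemanticIndex rdfs:subClassOf skos:Concept .   # placeholder class name

# A form described with skosxl:literalForm rather than lemon:writtenRep.
ex:cat_form a ex:LexicalForm ;
    skosxl:literalForm "cat"@en .
```

Generic SKOS-XL tooling would then still see ex:cat_form as an ordinary label, while lexicon-aware tools can rely on the more specific class.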
>
> 5)      Another thing which comes to my mind, somewhat outside the WordNet
> example, but not without consequences for it... What should be, in general,
> the expected modelling behaviour when we have two terms which coincide, but
> whose syntactic use can follow different paths?
> E.g., suppose we have a term with three senses. For two of these
> senses (say 1 and 2), the term has exactly identical
> variations (declensions for nouns, pronouns and adjectives, and conjugations
> for verbs), and maybe other information in common (think about
> etymology!), while for the third sense it may show differences in the
> variations (e.g. a noun has a different plural form, or a verb has a
> different form in only one tense, when used with that sense). Should we
> model them as 3 different lexical units, or should we merge the two
> identical ones into one LexicalEntry and link it to senses 1 and 2?
> This seems not to be related to modelling WordNet specifically, because
> variations, declensions, etc. are outside WordNet. However, this may affect
> a model trying to reuse WordNet enriched with further information… Thus
> it’s important when we consider how a WordNet modelling could be ported
> into an extended framework with no risk of inconsistency.
>
> I just thought of a solution for this: if we allow skosxl:Labels to
> be directly attached to Synsets (or whatever the superclass for
> them is), and then we state the following rule:
> LexicalEntry -> lemon:canonicalForm -> skosxl:Label
> LexicalEntry -> lemon:sense -> <asynset>
> ------------------------------------
> skosxl:Label -> ???:sense (whatever it is called) -> <asynset>
>
> this would allow for the complex structure we expect in general, but also
> allow for a more neutral fit of WordNet. In fact, instead of having the
> third triple inferred, for WordNet we could just state the
> third one explicitly, and not add potentially compromising information (which, in
> any case, is outside WordNet, as noted by John in his reply to Elena).
> The “???:sense (whatever it is called)” could even be lemon:sense itself,
> provided that its domain is LexicalEntry+skosxl:Label.
> However, I still have to think more about that…
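
The rule above can be approximated with an OWL 2 property chain. In this sketch ex:labelSense is a hypothetical stand-in for the unnamed “???:sense” property, and the ex: namespace and lemon namespace IRI are assumptions:

```turtle
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix lemon: <http://lemon-model.net/lemon#> .   # assumed lemon namespace
@prefix ex:    <http://example.org/lexicon#> .     # hypothetical namespace

# ex:labelSense stands in for the "???:sense" property of the rule.
ex:labelSense a owl:ObjectProperty ;
    # label --(inverse of canonicalForm)--> entry --(sense)--> synset
    owl:propertyChainAxiom (
        [ owl:inverseOf lemon:canonicalForm ]
        lemon:sense
    ) .
```

For a full lexicon an OWL 2 reasoner would derive the label-to-synset triple from the two premises, whereas a WordNet conversion could assert that triple directly and omit the entry-level detail.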
>
> One more thing: the observation in point 2 above made me think once more that
> we should be clearer about our objectives.
> Fact: since we have to model ontology-lexicon interfaces, and there isn’t
> much out there for representing lexical info (limited to RDF, I mean), we
> thus have to provide a model for the linguistic part before “attaching” it
> to the ontology part. Now, the objective could be:
>
> 1)      We want to model lexical knowledge, and we give a model for this.
> WordNet may be (in part) more fine-grained than our model… no big trouble,
> WordNet is WordNet, and our model is our model… we’ll be missing those
> details.
>
> a.       A slightly different interpretation of the above: we want to
> model lexical knowledge, AND we decide WordNet IS the model (at least for
> the monolingual word-description needs; I leave FrameNet and the like out
> of this discussion). No big deal about alternative
> resources to WordNet.
>
> 2)      We want to model existing lexical resources. Thus WordNet, as
> well as other resources (maybe differently organized), are all important.
>
> Obviously, there are endless shades in between the above, as we
> could be in case 1 or 2 and still think WordNet is so important that it
> has to be fully covered (also because, in this way, Princeton could decide
> to natively output each new release of WordNet in RDF too, according to our
> model).
>
> Cheers,
>
> Armando
>
> P.S.: I’ve made a couple of small fixes to the page:
> http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping#Summary_on_Requirements_on_the_Lexicon-Ontology-Mapping_.28Synthesis_by_PC.29
>  which we already discussed 2 or 3 meetings ago.
>
> From: johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] On Behalf Of John McCrae
> Sent: Friday, 12 April 2013 16:10
> To: public-ontolex
> Subject: WordNet modelling in Lemon and SKOS
>
> Hi all,
>
> Here is the proposed modelling of WordNet as lemon and SKOS (using
> skos:Concept for synsets):
>
> http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Linked_Data#Example:_WordNet_as_lemon-SKOS
>
> Any comments?
>
> Regards,
> John
>
> -- 
> Prof. Dr. Philipp Cimiano
> Semantic Computing Group
> Excellence Cluster - Cognitive Interaction Technology (CITEC)
> University of Bielefeld
>
> Phone: +49 521 106 12249
> Fax: +49 521 106 12412
> Mail: cimiano@cit-ec.uni-bielefeld.de
>
> Room H-127
> Morgenbreede 39
> 33615 Bielefeld
>
Received on Tuesday, 16 April 2013 10:42:50 UTC