Re: lexicalization count from Philipp Cimiano on 2014-05-30 (public-ontolex@w3.org from May 2014)

From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: Fri, 30 May 2014 21:36:31 +0200
To: "John P. McCrae" <jmccrae@cit-ec.uni-bielefeld.de>
CC: public-ontolex <public-ontolex@w3.org>
Message-ID: <5388DDBF.3090104@cit-ec.uni-bielefeld.de>
Hi John, all,

  see my comments below....


On 30.05.14 02:23, John P. McCrae wrote:
> Hi,
>
> The point for discussion of these should be in the final 
> specification, the model files should reflect the agreement of the CG.
>
Yes, you are right. I should not have made these changes right away 
before discussing them. I promise to wait for consensus before making 
changes to the model.

In any case, the good thing is that the whole list can now follow the 
updates to the model "live" and monitor them. So the current workflow 
based on Git seems to me good and transparent enough.

> With regards to the particular points
>
> 1) For backwards compatibility we should stick with the Monnet Lemon 
> name of "definition". There's also a few other properties from lemon 
> that we should consider importing to assure model compatibility.
>
Yes, fine. Either "gloss" or "definition" is fine for me. So I propose to 
define the "definition" property in ontolex.

Any opposition to this? I think it definitely makes sense.

> The issue with BabelNet has to do with Monnet Lemon not having synsets 
> (hence the domain conflict with lemon:definition) and the fix is easy 
> (add LexicalConcept as a domain of definition).
>
> 2) I also dislike the name "contains", but there was significant 
> discussion here and "lexicalizes" was rejected previously. In 
> WordNet-RDF we used the property synset_member, so contains is 
> possible, but I think maybe we need to reopen this debate?

OK, as we all dislike "contains", we should reopen the discussion. 
"contains" suggests parthood / meronymy, which is too strict. I 
proposed "lexicalizes", but another proposal could be "realizes", i.e. a 
particular sense (linguistically) realizes a lexical concept. 
"expresses" might also be fine. But "lexicalizes" would be my preferred 
choice.

Any other opinions?
>
> 3) No. Seriously... We have discussed this so many times...
>
> A translation between senses has very different meaning to that 
> between entries, that is we cannot say that "bank" is always "Ufer" in 
> German, but we can say bank_en#2 is always ufer_de#1. We cannot 
> "overload" two things with different meanings! Furthermore, it is my 
> opinion that we should help people to model their resources well by 
> not supporting poor modelling decisions (like ambiguous translation 
> links).
>
Yes, I am afraid I do mean it seriously. Let me briefly explain why I am 
opening up this discussion again. I think reopening it is justified 
given the input we have received from (potential) users of the model. 
When Jorge and I visited UPF, we had a technical discussion with Mart 
and Nuria on this issue. They have a number of bilingual dictionaries in 
which the senses involved in a translation relation are not made 
explicit. Adopting lemon or ontolex would then force them to introduce a 
number of "artificial" senses, one for each translation pair. They found 
this very odd. They agree that translation is a relation at the sense 
level, but very often the actual sense(s) are underspecified, as is the 
case in their lexica.
Finally, we realized that potential adopters struggle with the Open World 
Assumption in that they find it awkward that these "artificial senses" 
do not correspond to actual senses a lexicographer would distinguish. 
So the introduction of translations at the sense level in this use case 
leads to an extreme proliferation of "senses" that are not "real" senses 
in the lexicographic sense. Of course, we might argue that several of 
these "artificial senses" might later be identified and thus merged into 
"coarser" senses, but still we have distinct "sense" objects floating 
around that look odd to real linguists.

I hope this clarifies a bit why I am reopening the issue. Apologies, but 
I think it is worth listening to the people who would potentially use 
the model in their work and taking their concerns seriously. We could make 
clear that "translation" is a relation between senses, but that it is 
possible to "underspecify" the senses by creating a direct link between 
lexical entries meaning something like: there are two non-specified 
senses of these lexical entries that are translations of each other.
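
To make this reading concrete, here is a minimal sketch in Python. It is 
only an illustration, not part of the model: the names sense_of, 
sense_translations, entry_translations and compatible_sense_pairs, as 
well as all identifiers in the toy data, are hypothetical. It follows 
the chain "sense o translation o sense^-1" from my earlier mail, so an 
entry-level link only asserts that some pair of senses of the two 
entries is in a translation relation.

```python
# Hypothetical toy data: each sense belongs to exactly one lexical entry.
sense_of = {
    "bank_en#1": "bank_en",   # e.g. the financial institution
    "bank_en#2": "bank_en",   # e.g. the river bank
    "ufer_de#1": "ufer_de",
    "bank_de#1": "bank_de",
}

# Sense-level translation links (the precise relation).
sense_translations = {
    ("bank_en#2", "ufer_de#1"),
    ("bank_en#1", "bank_de#1"),
}


def entry_translations(sense_links, sense_to_entry):
    """Derive entry-level links: two entries are linked if SOME pair
    of their senses is linked (existential reading)."""
    return {(sense_to_entry[s], sense_to_entry[t]) for s, t in sense_links}


def compatible_sense_pairs(entry_link, sense_to_entry):
    """List the sense pairs an underspecified entry-level link could stand for."""
    src, tgt = entry_link
    return sorted(
        (s, t)
        for s, e1 in sense_to_entry.items() if e1 == src
        for t, e2 in sense_to_entry.items() if e2 == tgt
    )


derived = entry_translations(sense_translations, sense_of)
candidates = compatible_sense_pairs(("bank_en", "ufer_de"), sense_of)
```

Under these assumptions a bilingual dictionary that only records the 
pair (bank_en, ufer_de) does not need an artificial sense: the link 
leaves open which of the candidate sense pairs actually holds.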

Best regards,

Philipp.

> Regards,
> John
>
>
>
>
>
> On Thu, May 29, 2014 at 10:37 PM, Philipp Cimiano 
> <cimiano@cit-ec.uni-bielefeld.de> wrote:
>
>     Dear John, all,
>
>      I wanted to propose a number of changes to the ontolex core and
>     vartrans model and I had introduced them already in the OWL files.
>     But John was very quick in noticing these changes and pointing out
>     to me that they are not in line with the current spec. Well,
>     I should first have discussed these proposed changes on the list,
>     which I am doing now:
>
>     1) I propose to introduce a property ontolex:gloss as a subproperty
>     of rdfs:comment to allow for adding definitions of senses. While
>     one could use rdfs:comment for sure, people will be looking for
>     such a property. The recent work by Roberto Navigli on
>     transforming BabelNet to lemon shows that people look for such a
>     property and, if not available, reinvent it themselves.
>
>     2) I propose to change the property contains (dom: Lexical
>     Concept, range: Lexical Sense) into a property called
>     "lexicalizedBy" and the inverse "lexicalizes". The reason is that,
>     while working with the model to transform some resources (e.g. TBX,
>     see forthcoming email on this), I realized that "contains" suggests
>     a meronymic relation that need not be there in a strict sense. It is
>     sort of there in WordNet-style resources where the Synset is
>     regarded as a set that *contains* senses. However, this treatment
>     seems to be too specific to WordNet-style resources. In general,
>     what I think this relation should say is that a certain
>     LexicalConcept is lexically expressed by a number of senses (in
>     different languages). Therefore, I favour the relation "lexicalizes".
>
>     3) I propose to redefine the translation relation so that it can
>     hold also between Lexical Entries instead of Lexical Senses. I
>     realized that in many cases, lexical resources abstract from the
>     particular senses that are translations of each other. This is the
>     case for many bilingual dictionaries. I propose thus to overload
>     the translation relation so that the following holds:
>
>     variantSource o trans o variantTarget -> translation
>
>     sense o translation o sense^-1 -> translation
>
>     where Translation \equiv exists trans.Self
>
>     Let me know your comments,
>
>     Philipp.
>
>     On 28.05.14 18:06, Armando Stellato wrote:
>>
>>     Dear Philipp,
>>
>>     thanks very much for your summary email.
>>
>>     I will reply to it in more detail asap; in the meantime, a
>>     short note about the “numberOfXXX” properties.
>>
>>     I would go for names which are consistent with similar VoID
>>     properties (void:entities, void:triples), and thus have
>>     something like:
>>
>>     lime:lexicalEntries
>>
>>     lime:lexicalizations
>>
>>     lime:senses
>>
>>     lime:references
>>
>>     (modulo ratios obviously :DDD ).
>>
>>     Cheers,
>>
>>     Armando
>>
>>     From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
>>     Sent: Wednesday, May 28, 2014 3:06 PM
>>     To: public-ontolex@w3.org
>>     Subject: Re: lexicalization count
>>
>>     Armando, all,
>>
>>      yes that would be ok from my point of view.
>>
>>     // counting properties (datatype properties, with domain
>>     (ontolex:Lexicon OR ontolex:Lexicalization OR void:Dataset OR
>>     lime:LanguageCoverage))
>>
>>     lime:numberOfLexicalEntries
>>     lime:numberOfSenses
>>     lime:numberOfLexicalizations (denotes triples)
>>     lime:numberOfReferences -> the number of distinct references used
>>
>>     We then need to discuss whether we should also include ratios etc.
>>
>>
>>     Then:
>>
>>     lime:language (unified with ontolex:language, extended here to
>>     domain lime:LanguageCoverage)
>>
>>     lime:linguisticModel: describing by which model/vocabulary
>>     information about lexicalization is attached; the domain is
>>     void:Dataset and the range is the URI of the vocabulary;
>>     lime:linguisticModel is thus a subproperty of void:vocabulary
>>
>>     Note that several linguisticModels can co-exist in principle in a
>>     dataset...
>>
>>     lime:type: providing a type for the resource in question, e.g.
>>     bilingual lexicon, lexicon, ..., domain is void:Dataset and range
>>     is not specified
>>
>>     lime:languageCoverage with domain void:Dataset and range
>>     lime:LanguageCoverage.
>>
>>     lime:LanguageCoverage has a language, a linguistic model and all
>>     the counting properties above are defined for it.
>>
>>     If this is a base model we can agree upon then I will update the
>>     wiki description and the ontology.
>>
>>     Let me know your comments on this.
>>
>>     Regards,
>>
>>     Philipp.
>>
>>     On 23.05.14 13:49, Armando Stellato wrote:
>>
>>         Hi all,
>>
>>         Just copied and pasted from our Ontolex-Lime proposal, an
>>         open discussion about the lexicalizations count (which is not
>>         about them being ratios or integers :P ).
>>
>>
>>             6. Lexicalization core triples: senses or what?
>>
>>         Senses act as reifications of the relationships between
>>         LexicalEntries and Conceptual Entities (be they
>>         LexicalConcepts or entities of the lexicalized ontology). In
>>         effect, a single sense is always 1-1 (it links a single
>>         Lexical Entry with a single Conceptual Entity).
>>
>>         The ontolex model has a shortcut for the relationship
>>         (mediated by senses) between LexicalEntries and
>>         LexicalConcept: ontolex:denotes.
>>
>>         We would propose to formally consider the number of “denotes
>>         triples” (triples with predicate == ontolex:denotes) to
>>         obtain the count. Obviously, this information may not always
>>         be available (neither explicit nor inferred), though the details
>>         of how to obtain it are just technicalities.
>>
>>         [added wrt the proposal] So, in short, we propose to
>>         formally count “lexicalizations” as the number of
>>         ontoresource <--> lexicalEntry links, and not as the number
>>         of (linked) senses.
>>
>>         To support our claim, please note the following case:
>>
>>         1. a lexicon exists (independently of an ontology), with sense
>>         descriptions for its lexical entries, and with one lexical
>>         entry having two very close senses (two smooth variations of
>>         a broad meaning)
>>
>>         2. the lexicon is used to lexicalize an ontology
>>
>>         3. the authors of the Lexicalization decide to collapse the
>>         two senses into the same ontology concept
>>
>>         4. the two triples connecting the two similar senses to the
>>         same ontology concept entail the same ontolex:denotes triple
>>
>>         5. for the purpose of counting the lexicalizations of that
>>         lexical concept, the single triple count on ontolex:denotes
>>         is more appropriate than counting the two senses of the same
>>         LexicalEntry linked to the same concept.
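
The counting rule in the five steps above can be sketched in Python as 
follows. This is only an illustration under stated assumptions: the 
triples are made-up toy data, the predicate names ontolex:sense and 
ontolex:reference are assumed here for the entry-to-sense and 
sense-to-concept links, and infer_denotes is a hypothetical helper, not 
something defined by the model.

```python
# Hypothetical toy triples (subject, predicate, object).
# "ontolex:sense" and "ontolex:reference" are assumed predicate names.
triples = [
    ("bank_en", "ontolex:sense", "bank_en#1"),
    ("bank_en", "ontolex:sense", "bank_en#2"),
    # Two close senses collapsed onto the same ontology concept (step 3).
    ("bank_en#1", "ontolex:reference", "onto:Bank"),
    ("bank_en#2", "ontolex:reference", "onto:Bank"),
    ("shore_en", "ontolex:sense", "shore_en#1"),
    ("shore_en#1", "ontolex:reference", "onto:Shore"),
]


def infer_denotes(data):
    """Join entry->sense with sense->reference and return the set of
    entailed (entry, ontolex:denotes, concept) triples."""
    references = {}
    for s, p, o in data:
        if p == "ontolex:reference":
            references.setdefault(s, set()).add(o)
    return {
        (entry, "ontolex:denotes", concept)
        for entry, p, sense in data
        if p == "ontolex:sense"
        for concept in references.get(sense, ())
    }


denotes = infer_denotes(triples)

# Count proposed in the quoted message: distinct denotes triples.
number_of_lexicalizations = len(denotes)

# Alternative count: senses that are linked to some concept.
number_of_linked_senses = len(
    {s for s, p, o in triples if p == "ontolex:reference"}
)
```

Because the result is a set, the two senses of bank_en that point to the 
same concept contribute a single denotes triple (step 4), so the two 
counts differ on this toy data.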
>>
>>         Would that be ok?
>>
>>         Cheers,
>>
>>         Armando
>>
>>
>>
>>
>>     -- 
>>       
>>     Prof. Dr. Philipp Cimiano
>>       
>>     Phone: +49 521 106 12249
>>     Fax: +49 521 106 12412
>>     Mail: cimiano@cit-ec.uni-bielefeld.de
>>       
>>     Forschungsbau Intelligente Systeme (FBIIS)
>>     Raum 2.307
>>     Universität Bielefeld
>>     Inspiration 1
>>     33619 Bielefeld
>
>
>
>


-- 

Prof. Dr. Philipp Cimiano

Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano@cit-ec.uni-bielefeld.de

Forschungsbau Intelligente Systeme (FBIIS)
Raum 2.307
Universität Bielefeld
Inspiration 1
33619 Bielefeld
Received on Friday, 30 May 2014 19:37:02 UTC