Re: lexicalization count from Philipp Cimiano on 2014-06-02 (public-ontolex@w3.org from June 2014)

From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: Mon, 02 Jun 2014 21:54:14 +0200
To: "John P. McCrae" <jmccrae@cit-ec.uni-bielefeld.de>
CC: public-ontolex <public-ontolex@w3.org>
Message-ID: <538CD666.7030305@cit-ec.uni-bielefeld.de>
Hi John,

     oh yes, principle R3. I vaguely remember it. Thanks for reminding 
me of this principle that I have vigorously defended many times and 
that now I tend to forget occasionally ;-)

Now seriously. There are a number of people that are using lemon and 
hopefully ontolex at some stage for use cases that the model was not 
originally meant to support.

The use of lemon and similar vocabularies to describe bilingual 
dictionaries was surely not the original purpose of the model. 
Nevertheless, there are people interested in using the model for this. 
So being very pragmatic here and in order to foster wide reuse of the 
model, we should have an interest in supporting this.

In the case of the UPF bilingual resources, there are no references 
because there is no ontology.
So the vanilla solution for classical ontolex models, i.e. simply using 
the lexical entry - sense - reference, does not work here. We could 
argue that this is not what the model is to be used for ... hmm ....

You are right in that translation is a specific example of synonymy 
across languages. Agreed.

Synonymy between senses and synonymy between entries is clearly 
different, I agree.
However, I am saying that we could facilitate the use of the model by 
allowing people to use the translation relation by underspecifying the 
actual senses that are translations of each other.

It is about defining the translation relation as follows:

\forall x, y: translation(x,y) <-> (Sense(x) \wedge Sense(y) \wedge 
interlingualSenseVariant(x,y)) \vee (LexicalEntry(x) \wedge 
LexicalEntry(y) \wedge \exists s_x, s_y: sense(x,s_x) \wedge sense(y,s_y) 
\wedge interlingualSenseVariant(s_x,s_y))

There is clearly no logical issue in defining such a property.

It boils down to a primitive property interlingualSenseVariant defined 
between senses.
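
As a rough sketch of how the definition above can be read operationally (Python, toy data; the identifiers and the symmetric treatment of interlingualSenseVariant are assumptions made for illustration only, and the check is closed-world, whereas in OWL an asserted entry-level link would merely imply that suitable senses exist):

```python
# Hypothetical toy lexicon; all identifiers are illustrative.
SENSES = {"bank_en#1", "bank_en#2", "ufer_de#1"}
ENTRIES = {"bank_en", "ufer_de", "haus_de"}

# sense(x, s): lexical entry x has sense s.
SENSE_OF = {
    "bank_en": {"bank_en#1", "bank_en#2"},
    "ufer_de": {"ufer_de#1"},
    "haus_de": set(),
}

# The primitive relation between senses. Storing pairs as frozensets
# makes it symmetric, which is an assumption of this sketch.
INTERLINGUAL_SENSE_VARIANT = {frozenset({"bank_en#2", "ufer_de#1"})}


def sense_variant(s1, s2):
    """True if the two senses are recorded as interlingual sense variants."""
    return frozenset({s1, s2}) in INTERLINGUAL_SENSE_VARIANT


def translation(x, y):
    """Polymorphic translation: holds between two senses or two entries."""
    if x in SENSES and y in SENSES:
        # First disjunct: both arguments are senses.
        return sense_variant(x, y)
    if x in ENTRIES and y in ENTRIES:
        # Second disjunct: some pair of senses of the entries is a variant pair.
        return any(
            sense_variant(sx, sy)
            for sx in SENSE_OF[x]
            for sy in SENSE_OF[y]
        )
    return False  # mixed sense/entry arguments are not covered
```

For example, translation("bank_en", "ufer_de") holds because one sense of each entry is linked, while translation("bank_en#1", "ufer_de#1") does not.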

So what I then propose is to introduce a basic property 
InterlingualSenseVariant for what we have so far been calling 
"translation" in the vartrans module, and what you call "translation" in 
your answer below; this would be  a subclass of SenseVariant and 
InterlingualVariant.

Translation could then be a polymorphic property that is defined 
between senses or lexical entries with respect to 
interlingualSenseVariant as base property as sketched above. I do not 
see any harm in this if it fosters use of the model.

To be maximally precise, we could define the property 
InterlingualSenseVariant that precisely corresponds to what you call a 
translation; and then have a somehow underspecified property or class 
translation or Translation.

So technically  translation would become a superproperty of the relation 
"interlingualSenseVariant", which I am not proposing to introduce 
explicitly, but only conceptually as the counterpart of the class 
"InterlingualSenseVariant".

With this, we give maximal precision of expression to those people who 
want it and a more "vague" concept of Translation for those who do not 
need this kind of precision.

Regards,

Philipp.


Am 31.05.14 17:43, schrieb John P. McCrae:
> Hi Philipp,
>
> OK, so we will rehash the translation debate again, I hope for not too 
> long.
>
> So, firstly, actually translations are not really necessary in 
> OntoLex, as we can infer that two entries in different languages are 
> translations because they have the same reference. So does this cover 
> all our use cases? Well nearly, there are a few issues, firstly there 
> are some cases where translations are paired. For example, "TIA" and 
> "transient ischemic attack" are both terms in English, and "TIA" and 
> "Transitorische ischaemische Attacke" are both terms in German, but 
> we wish to pair the acronyms and the full forms with each other. As 
> such we introduce an explicit translation link between senses. 
> Secondly, we may wish to know provenance, confidence, etc. of a given 
> translation procedure. In practice, this has proven primarily 
> necessary only for automatic sense alignment as standard MT has no 
> understanding of the semantics and cannot deal with the highly 
> ambiguous nature of ontology-lexica.
>
> The solution for UPF should be very easy: they should just have the 
> translations on the ontology level and assign a sense for each 
> ontology reference they have. (Of course if these resources don't have 
> an ontology or any similar semantic structure, you should remember the 
> name and goals of this community group ;))
>
> Finally, if you still insist on ignoring the significant prior 
> discussions and the founding principle R3 
> <http://www.w3.org/community/ontolex/wiki/Goals_and_Scope_of_Ontology-Lexica_Community_Group> 
> of this group I would make one last comment. 'Translation' is 
> essentially just a kind of 'synonymy', however synonymy between senses 
> (with a meaning) is fundamentally different from 'synonymy' between 
> entries, which only applies at some (underspecified) times. If you 
> wish to include semantic relationships such as synonymy and 
> translation in the lexical layer, I would plead that you at least give 
> it a different name to avoid confusion of the exact and underspecified 
> property in real applications.
>
> Regards,
> John
>
>
> On Fri, May 30, 2014 at 9:36 PM, Philipp Cimiano 
> <cimiano@cit-ec.uni-bielefeld.de 
> <mailto:cimiano@cit-ec.uni-bielefeld.de>> wrote:
>
>     Hi John, all,
>
>      see my comments below....
>
>
>     Am 30.05.14 02:23, schrieb John P. McCrae:
>>     Hi,
>>
>>     The point for discussion of these should be in the final
>>     specification, the model files should reflect the agreement of
>>     the CG.
>>
>     Yes, you are right. I should not have done these changes right
>     away before discussing them. I promise to wait for consensus
>     before doing changes to the model.
>
>     In any case, the good thing is that the whole list can now follow
>     the updates to the model "live" and monitor them. So the current
>     workflow based on GIT seems to me good and transparent enough.
>
>
>>     With regards to the particular points
>>
>>     1) For backwards compatibility we should stick with the Monnet
>>     Lemon name of "definition". There's also a few other properties
>>     from lemon that we should consider importing to assure model
>>     compatibility.
>>
>     Yes, fine. "gloss" or "definition" is fine for me. So I propose
>     then to define the "definition" property in ontolex.
>
>     Any opposition on this? I think it definitely makes sense.
>
>>     The issue with BabelNet has to do with Monnet Lemon not having
>>     synsets (hence the domain conflict with lemon:definition) and the
>>     fix is easy (add LexicalConcept as a domain of definition).
>>
>>     2) I also dislike the name "contains", but there was significant
>>     discussion here and "lexicalizes" was rejected previously. In
>>     WordNet-RDF we used the property synset_member, so contains is
>>     possible, but I think maybe we need to reopen this debate?
>
>     OK, as we all dislike "contains", then we should reopen the
>     discussion again. "contains" suggests parthood / meronymy, which is
>     too strict. I proposed "lexicalizes", but another proposal could
>     be "realizes", i.e. a particular sense (linguistically) realizes
>     a lexical concept. "expresses" might also be fine. But
>     "lexicalizes" would be my preferred choice.
>
>     Any other opinions?
>
>>
>>     3) No. Seriously... We have discussed this so many times...
>>
>>     A translation between senses has very different meaning to that
>>     between entries, that is we cannot say that "bank" is always
>>     "Ufer" in German, but we can say bank_en#2 is always ufer_de#1.
>>     We cannot "overload" two things with different meanings!
>>     Furthermore, it is my opinion that we should help people to model
>>     their resources well by not supporting poor modelling decisions
>>     (like ambiguous translation links).
>>
>     Yes, I am afraid, I mean it seriously. Let me briefly explain why
>     I am opening up this discussion again. The fact that I am
>     re-opening the discussion is justified I think given the input we
>     have received from (potential) users of the model. When Jorge and
>     I visited UPF, we had a technical discussion with Mart and Nuria
>     on this issue. They have a number of bilingual dictionaries in
>     which the senses involved in a translation relation are not made
>     explicit. Adopting lemon or ontolex would then force them to
>     introduce a number of "artificial" senses, one for each
>     translation pair. They found this very odd. They agree that
>     translation is a relation at the sense level, but very often the
>     actual sense(s) are underspecified as is the case of their lexica.
>     Finally, we realized that potential adopters struggle with the Open
>     World Assumption in that they find it awkward that the
>     "artificial senses" thus introduced do not correspond to
>     actual senses a lexicographer would distinguish. So the
>     introduction of translations at the sense level in this use case
>     leads to an extreme proliferation of "senses" that would not
>     correspond to "real" senses that a lexicographer would
>     distinguish. Of course, we might argue that several of these
>     "artificial senses" might later be identified and thus merged into
>     "coarser" senses, but still we have distinct "sense" objects
>     floating around that look odd to real linguists.
>
>     I hope this clarifies a bit why I am reopening the issue.
>     Apologies, but I think it is worth listening to the people that
>     would potentially use the model in their work and taking their
>     concerns seriously. We could make clear that "translation" is a
>     relation between senses, but that it is possible to "underspecify"
>     the senses by creating a direct link between lexical entries
>     meaning something like: there are two non-specified senses of
>     these lexical entries that are translations of each other.
>
>     Best regards,
>
>     Philipp.
>
>
>>     Regards,
>>     John
>>
>>
>>
>>
>>
>>     On Thu, May 29, 2014 at 10:37 PM, Philipp Cimiano
>>     <cimiano@cit-ec.uni-bielefeld.de
>>     <mailto:cimiano@cit-ec.uni-bielefeld.de>> wrote:
>>
>>         Dear John, all,
>>
>>          I wanted to propose a number of changes to the ontolex core and
>>         vartrans model and I had introduced them already in the OWL
>>         files. But John was very quick in noticing these changes and
>>         pointing me to the fact that they are not in line with the
>>         current spec. Well, I should first have discussed these
>>         proposed changes in the list, which I am doing now:
>>
>>         1) I propose to introduce a property ontolex:gloss as a
>>         subproperty of rdfs:comment to allow for adding definitions of
>>         senses. While one could use rdfs:comment for sure, people
>>         will be looking for such a property. The recent work by
>>         Roberto Navigli on transforming Babelnet to lemon shows that
>>         people look for such a property and, if not available,
>>         reinvent it themselves.
>>
>>         2) I propose to change the property contains (dom: Lexical
>>         Concept, range: Lexical Sense) into a property called
>>         "lexicalizedBy" and the inverse "lexicalizes". The reason is
>>         that working with the model to transform some resources (e.g.
>>         TBX, see forthcoming email on this), I realized that
>>         "contains" suggests a meronymic relation that need not be
>>         there in a strict sense. It is sort of there in WordNet-style
>>         resources where the Synset is regarded as a set that
>>         *contains* senses. However, this treatment seems to be too
>>         specific for WordNet style resources. In general, what I
>>         think this relation should say is that a certain
>>         LexicalConcept is lexically expressed by a number of senses
>>         (in different languages). Therefore, I favour the relation
>>         "lexicalizes".
>>
>>         3) I propose to redefine the translation relation so that it
>>         can hold also between Lexical Entries instead of Lexical
>>         Senses. I realized that in many cases, lexical resources
>>         abstract from the particular senses that are translations of
>>         each other. This is the case for many bilingual dictionaries.
>>         I propose thus to overload the translation relation so that
>>         the following holds:
>>
>>         variantSource o trans o variantTarget -> translation
>>
>>         sense o translation o sense^-1 -> translation
>>
>>         where Translation \equiv exists trans.Self
>>
>>         Let me know your comments,
>>
>>         Philipp.
>>
>>         Am 28.05.14 18:06, schrieb Armando Stellato:
>>>
>>>         Dear Philipp,
>>>
>>>         thanks very much for your resuming email.
>>>
>>>         I will reply to it more in details asap, in the meanwhile, a
>>>         short note about the “numberOfXXX” properties.
>>>
>>>         I would go for names which are homogeneous with VoID similar
>>>         properties (void:entities, void:triples), and thus, have
>>>         something like:
>>>
>>>         lime:lexicalEntries
>>>
>>>         lime:lexicalizations
>>>
>>>         lime:senses
>>>
>>>         lime:references
>>>
>>>         (modulo ratios obviously :DDD ).
>>>
>>>         Cheers,
>>>
>>>         Armando
>>>
>>>         *From:*Philipp Cimiano [mailto:cimiano@cit-ec.uni-bielefeld.de]
>>>         *Sent:* Wednesday, May 28, 2014 3:06 PM
>>>         *To:* public-ontolex@w3.org <mailto:public-ontolex@w3.org>
>>>         *Subject:* Re: lexicalization count
>>>
>>>         Armando, all,
>>>
>>>          yes that would be ok from my point of view.
>>>
>>>         // counting properties (datatype properties, with domain
>>>         (ontolex:Lexicon OR ontolex:Lexicalization OR void:Dataset
>>>         OR lime:LanguageCoverage)
>>>
>>>         lime:numberOfLexicalEntries
>>>         lime:numberOfSenses
>>>         lime:numberOfLexicalizations (denote-triples)
>>>         lime:numberOfReferences -> the number of distinct references
>>>         used
>>>
>>>         We then need to discuss whether we should also include
>>>         ratios etc.
>>>
>>>
>>>         Then:
>>>
>>>         lime:language (unified with ontolex:language, extended here
>>>         to domain lime:LanguageCoverage)
>>>
>>>         lime:linguisticModel: describing by which model/vocabulary
>>>         information about lexicalization is attached; the domain is
>>>         void:Dataset and the range is the URI of the vocabulary;
>>>         lime:linguisticModel is thus a subproperty of void:vocabulary
>>>
>>>         Note that several linguisticModels can co-exist in principle
>>>         in a dataset...
>>>
>>>         lime:type: providing a type for the resource in question,
>>>         e.g. bilingual lexicon, lexicon, ..., domain is void:Dataset
>>>         and range is not specified
>>>
>>>         lime:languageCoverage with domain void:Dataset and range
>>>         lime:LanguageCoverage.
>>>
>>>         lime:LanguageCoverage has a language, a linguistic Model and
>>>         all the counting properties above are defined for it.
>>>
>>>         If this is a base model we can agree upon then I will update
>>>         the wiki description and the ontology.
>>>
>>>         Let me know your comments on this.
>>>
>>>         Regards,
>>>
>>>         Philipp.
>>>
>>>         Am 23.05.14 13:49, schrieb Armando Stellato:
>>>
>>>             Hi all,
>>>
>>>             Just copied and pasted from our Ontolex-Lime proposal,
>>>             an open discussion about the lexicalizations count
>>>             (which is not about them being ratios or integers :P ).
>>>
>>>
>>>                 6. Lexicalization core triples: senses or what?
>>>
>>>             Senses act as reifications of the relationships between
>>>             LexicalEntries and Conceptual Entities (be they
>>>             LexicalConcepts or entities of the lexicalized
>>>             ontology). In effect, a single sense is always 1-1 (it
>>>             links a single Lexical Entry with a single Conceptual
>>>             Entity)
>>>
>>>             The ontolex model has a shortcut for the relationship
>>>             (mediated by senses) between LexicalEntries and
>>>             LexicalConcept: ontolex:denotes.
>>>
>>>             We would propose to formally consider the number of
>>>             “denotes triples” (triples with predicate ==
>>>             ontolex:denotes) to obtain the count. Obviously, this
>>>             information may not always be available (not explicit
>>>             nor inferred), though the detail of how to obtain this
>>>             are just technicalities.
>>>
>>>             [added wrt the proposal] So, in shorter words, we
>>>             propose to formally count “lexicalizations” as the
>>>             number of ontoresource <--> lexicalEntry links, and not
>>>             as the number of (linked) senses.
>>>
>>>             To support our claim, please note the following case:
>>>
>>>             1. a lexicon exists (independently of an ontology), with
>>>             sense descriptions for its lexical entries, and with one
>>>             lexical entry having two very close senses (two smooth
>>>             variations of a broad meaning)
>>>
>>>             2. the lexicon is used to lexicalize an ontology
>>>
>>>             3. the authors of the Lexicalization decide to collapse
>>>             the two senses into the same ontology concept
>>>
>>>             4. the two triples connecting the two similar senses to
>>>             the same ontology concept entail the same
>>>             ontolex:denotes triple
>>>
>>>             5. to the purpose of counting the lexicalizations of that
>>>             lexical concept, the single triple count on
>>>             ontolex:denotes is more appropriate than counting the
>>>             two senses of a same LexicalEntry linked to the same
>>>             concept.
>>>
>>>             Would that be ok?
>>>
>>>             Cheers,
>>>
>>>             Armando
>>>
>>>
>>>
>>>
>>>         -- 
>>>           
>>>         Prof. Dr. Philipp Cimiano
>>>           
>>>         Phone: +49 521 106 12249
>>>         Fax: +49 521 106 12412
>>>         Mail: cimiano@cit-ec.uni-bielefeld.de
>>>           
>>>         Forschungsbau Intelligente Systeme (FBIIS)
>>>         Raum 2.307
>>>         Universität Bielefeld
>>>         Inspiration 1
>>>         33619 Bielefeld
>>
>>
>
>


-- 

Prof. Dr. Philipp Cimiano

Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano@cit-ec.uni-bielefeld.de

Forschungsbau Intelligente Systeme (FBIIS)
Raum 2.307
Universität Bielefeld
Inspiration 1
33619 Bielefeld
Received on Monday, 2 June 2014 19:54:47 UTC