Re: lexicalization count from John P. McCrae on 2014-05-31 (public-ontolex@w3.org from May 2014)

From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Sat, 31 May 2014 17:43:31 +0200
To: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Cc: public-ontolex <public-ontolex@w3.org>
Message-ID: <CAC5njqry2GarVgccUMKTLi+igFtecDsxwSqcfAYeZL0GSQj5_w@mail.gmail.com>
Hi Philipp,

OK, so we will rehash the translation debate again, though I hope not for too long.

So, firstly, translations are not strictly necessary in OntoLex, as
we can infer that two entries in different languages are translations
because they have the same reference. Does this cover all our use cases?
Nearly, but there are a few issues. Firstly, there are some cases where
translations are paired. For example, "TIA" and "transient ischemic attack"
are both terms in English, and "TIA" and "Transitorische ischaemische
Attacke" are both terms in German, but we wish to pair the acronyms with
each other and the full forms with each other. For this we introduce an
explicit translation link between senses. Secondly, we may wish to record
the provenance, confidence, etc. of a given translation procedure. In
practice, this has proven necessary primarily for automatic sense
alignment, as standard MT has no understanding of the semantics and cannot
deal with the highly ambiguous nature of ontology-lexica.
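
To make the first point concrete, here is a minimal Python sketch of
inferring translations from shared references. It assumes a toy list of
(entry, language, reference) triples; the ex: identifier and the function
name are made up for illustration and are not part of OntoLex.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (lexical entry, language, ontology reference).
SENSES = [
    ("transient ischemic attack", "en", "ex:TIA_Concept"),
    ("TIA", "en", "ex:TIA_Concept"),
    ("Transitorische ischaemische Attacke", "de", "ex:TIA_Concept"),
    ("TIA", "de", "ex:TIA_Concept"),
]


def inferred_translations(senses):
    """Pair entries of different languages that share the same reference."""
    by_reference = defaultdict(list)
    for entry, lang, ref in senses:
        by_reference[ref].append((entry, lang))

    pairs = set()
    for entries in by_reference.values():
        for (e1, l1), (e2, l2) in combinations(entries, 2):
            if l1 != l2:  # same-language pairs are synonyms, not translations
                pairs.add(frozenset([(e1, l1), (e2, l2)]))
    return pairs


pairs = inferred_translations(SENSES)
```

With this data the inference yields every cross-language pair, including
the English acronym with the German full form. That is exactly the case
where an explicit sense-level link is needed to say which pairs belong
together.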

The solution for UPF should be very easy: they should just have the
translations on the ontology level and assign a sense for each ontology
reference they have. (Of course if these resources don't have an ontology
or any similar semantic structure, you should remember the name and goals
of this community group ;))

Finally, if you still insist on ignoring the significant prior
discussions and the founding principle R3
<http://www.w3.org/community/ontolex/wiki/Goals_and_Scope_of_Ontology-Lexica_Community_Group>
of this group, I would make one last comment. 'Translation' is essentially
just a kind of 'synonymy'; however, synonymy between senses (with a meaning)
is fundamentally different from 'synonymy' between entries, which only
applies at some (underspecified) times. If you wish to include semantic
relationships such as synonymy and translation in the lexical layer, I
would plead that you at least give them a different name to avoid confusion
between the exact and the underspecified property in real applications.

Regards,
John


On Fri, May 30, 2014 at 9:36 PM, Philipp Cimiano <
cimiano@cit-ec.uni-bielefeld.de> wrote:

>  Hi John, all,
>
>  see my comments below....
>
>
> Am 30.05.14 02:23, schrieb John P. McCrae:
>
>     Hi,
>
>  The point for discussion of these should be in the final specification,
> the model files should reflect the agreement of the CG.
>
>      Yes, you are right. I should not have made these changes right away
> before discussing them. I promise to wait for consensus before making
> changes to the model.
>
> In any case, the good thing is that the whole list can now follow the
> updates to the model "live" and monitor them. So the current workflow based
> on Git seems to me good and transparent enough.
>
>
>     With regards to the particular points
>
>  1) For backwards compatibility we should stick with the Monnet Lemon name
> of "definition". There are also a few other properties from lemon that we
> should consider importing to ensure model compatibility.
>
>     Yes, fine. "gloss" or "definition" is fine for me. So I propose then
> to define the "definition" property in ontolex.
>
> Any opposition on this? I think it definitely makes sense.
>
>    The issue with BabelNet has to do with Monnet Lemon not having synsets
> (hence the domain conflict with lemon:definition) and the fix is easy (add
> LexicalConcept as a domain of definition).
>
>  2) I also dislike the name "contains", but there was significant
> discussion here and "lexicalizes" was rejected previously. In WordNet-RDF
> we used the property synset_member, so contains is possible, but I think
> maybe we need to reopen this debate?
>
>
> OK, as we all dislike "contains", we should reopen the discussion.
> "contains" suggests parthood / meronymy, which is too strict. I
> proposed "lexicalizes", but another proposal could be "realizes", i.e. a
> particular sense (linguistically) realizes a lexical concept. "expresses"
> might also be fine. But "lexicalizes" would be my preferred choice.
>
> Any other opinions?
>
>
>  3) No. Seriously... We have discussed this so many times...
>
>  A translation between senses has very different meaning to that between
> entries, that is we cannot say that "bank" is always "Ufer" in German, but
> we can say bank_en#2 is always ufer_de#1. We cannot "overload" two things
> with different meanings! Furthermore, it is my opinion that we should help
> people to model their resources well by not supporting poor modelling
> decisions (like ambiguous translation links).
>
>   Yes, I am afraid I mean it seriously. Let me briefly explain why I am
> opening up this discussion again. I think re-opening the discussion is
> justified given the input we have received from (potential) users of the
> model. When Jorge and I visited UPF, we had a technical discussion with
> Mart and Nuria on this issue. They have a number of bilingual dictionaries
> in which the senses involved in a translation relation are not made
> explicit. Adopting lemon or ontolex would then force them to introduce a
> number of "artificial" senses, one for each translation pair. They found
> this very odd. They agree that translation is a relation at the sense
> level, but very often the actual sense(s) are underspecified, as is the
> case in their lexica.
> Finally, we realized that potential adopters struggle with the Open World
> Assumption in that they find it awkward that these introduced "artificial
> senses" do not correspond to actual senses a lexicographer would
> distinguish. So the introduction of translations at the sense level in
> this use case leads to an extreme proliferation of "senses" that would not
> correspond to "real" senses that a lexicographer would distinguish. Of
> course, we might argue that several of these "artificial senses" might
> later be identified and thus merged into "coarser" senses, but still we
> have distinct "sense" objects floating around that look odd to real
> linguists.
>
> I hope this clarifies a bit why I am reopening the issue. Apologies, but I
> think it is worth listening to the people who would potentially use the
> model in their work and taking their concerns seriously. We could make clear
> that "translation" is a relation between senses, but that it is possible to
> "underspecify" the senses by creating a direct link between lexical entries
> meaning something like: there are two non-specified senses of these lexical
> entries that are translations of each other.
>
> Best regards,
>
> Philipp.
>
>
>  Regards,
> John
>
>
>
>
>
> On Thu, May 29, 2014 at 10:37 PM, Philipp Cimiano <
> cimiano@cit-ec.uni-bielefeld.de> wrote:
>
>>  Dear John, all,
>>
>>  I wanted to propose a number of changes to the ontolex core and vartrans
>> model and I had introduced them already in the OWL files. But John was very
>> quick in noticing these changes and pointing out to me that they are
>> not in line with the current spec. Well, I should first have discussed
>> these proposed changes on the list, which I am doing now:
>>
>> 1) I propose to introduce a property ontolex:gloss as a subproperty of
>> rdfs:comment to allow for adding definitions of senses. While one could use
>> rdfs:comment for sure, people will be looking for such a property. The
>> recent work by Roberto Navigli on transforming BabelNet to lemon shows that
>> people look for such a property and, if not available, reinvent it
>> themselves.
>>
>> 2) I propose to change the property contains (dom: Lexical Concept,
>> range: Lexical Sense) into a property called "lexicalizedBy" and the
>> inverse "lexicalizes". The reason is that, while working with the model to
>> transform some resources (e.g. TBX, see forthcoming email on this), I
>> realized that "contains" suggests a meronymic relation that need not be
>> there in a strict sense. It is sort of there in WordNet-style resources
>> where the Synset is regarded as a set that *contains* senses. However, this
>> treatment seems to be too specific to WordNet-style resources. In general,
>> what I think this relation should say is that a certain LexicalConcept is
>> lexically expressed by a number of senses (in different languages).
>> Therefore, I favour the relation "lexicalizes".
>>
>> 3) I propose to redefine the translation relation so that it can hold
>> also between Lexical Entries instead of Lexical Senses. I realized that in
>> many cases, lexical resources abstract from the particular senses that are
>> translations of each other. This is the case for many bilingual
>> dictionaries. I propose thus to overload the translation relation so that
>> the following holds:
>>
>> variantSource o trans o variantTarget -> translation
>>
>> sense o translation o sense^-1 -> translation
>>
>> where Translation \equiv exists trans.Self
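
As a rough illustration of the second chain above (sense o translation o
sense^-1 -> translation), relations can be modelled as sets of pairs and
composed directly. This is only a sketch: it assumes that `sense` points
from a lexical entry to its senses, and the bank/Ufer identifiers are
hypothetical toy data.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def inverse(r):
    """Swap the direction of every pair in the relation."""
    return {(b, a) for (a, b) in r}


# Assumed direction: lexical entry -> sense (hypothetical identifiers).
sense = {
    ("bank_en", "bank_en#1"),
    ("bank_en", "bank_en#2"),
    ("ufer_de", "ufer_de#1"),
}

# Sense-level translation: the exact, unambiguous statement.
translation = {("bank_en#2", "ufer_de#1")}

# entry -sense-> s1 -translation-> s2 -sense^-1-> entry
entry_translation = compose(compose(sense, translation), inverse(sense))
```

The derived entry-level pair no longer records which sense of bank_en was
involved, which is the underspecification under discussion.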
>>
>> Let me know your comments,
>>
>> Philipp.
>>
>> Am 28.05.14 18:06, schrieb Armando Stellato:
>>
>>  Dear Philipp,
>>
>>
>>
>> thanks very much for your summary email.
>>
>>
>>
>> I will reply to it in more detail asap; in the meantime, a short note
>> about the “numberOfXXX” properties.
>>
>>
>>
>> I would go for names which are consistent with similar VoID properties
>> (void:entities, void:triples), and thus have something like:
>>
>>
>>
>> lime:lexicalEntries
>>
>> lime:lexicalizations
>>
>> lime:senses
>>
>> lime:references
>>
>>
>>
>> (modulo ratios obviously :DDD ).
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Armando
>>
>>
>>
>>
>>
>> *From:* Philipp Cimiano [mailto:cimiano@cit-ec.uni-bielefeld.de
>> <cimiano@cit-ec.uni-bielefeld.de>]
>> *Sent:* Wednesday, May 28, 2014 3:06 PM
>> *To:* public-ontolex@w3.org
>> *Subject:* Re: lexicalization count
>>
>>
>>
>> Armando, all,
>>
>>  yes that would be ok from my point of view.
>>
>> // counting properties (datatype properties, with domain (ontolex:Lexicon
>> OR ontolex:Lexicalization OR void:Dataset OR lime:LanguageCoverage))
>>
>> lime:numberOfLexicalEntries
>> lime:numberOfSenses
>> lime:numberOfLexicalizations (denotes triples)
>> lime:numberOfReferences -> the number of distinct references used
>>
>> We then need to discuss whether we should also include ratios etc.
>>
>>
>> Then:
>>
>> lime:language (unified with ontolex:language, extended here to domain
>> lime:LanguageCoverage)
>>
>> lime:linguisticModel: describing by which model/vocabulary information
>> about lexicalization is attached; the domain is void:Dataset and the range
>> is the URI of the vocabulary; lime:linguisticModel is thus a subproperty of
>> void:vocabulary
>>
>> Note that several linguisticModels can co-exist in principle in a
>> dataset...
>>
>> lime:type: providing a type for the resource in question, e.g. bilingual
>> lexicon, lexicon, ..., domain is void:Dataset and range is not specified
>>
>> lime:languageCoverage with domain void:Dataset and range
>> lime:LanguageCoverage.
>>
>> lime:LanguageCoverage has a language, a linguistic model and all the
>> counting properties above are defined for it.
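
One possible reading of the LanguageCoverage description above, sketched
as a plain Python record. The field names simply mirror the proposed lime
properties; the model URI and the numbers are hypothetical placeholders,
not values from any real dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageCoverage:
    """Per-language statistics, mirroring the proposed lime:LanguageCoverage."""

    language: str          # lime:language, e.g. a language tag
    linguistic_model: str  # lime:linguisticModel, the URI of a vocabulary
    number_of_lexical_entries: Optional[int] = None
    number_of_senses: Optional[int] = None
    number_of_lexicalizations: Optional[int] = None  # count of denotes triples
    number_of_references: Optional[int] = None       # distinct references used


# Hypothetical values, for illustration only.
coverage = LanguageCoverage(
    language="de",
    linguistic_model="http://example.org/some-lexicon-model#",
    number_of_lexical_entries=120,
    number_of_senses=150,
    number_of_lexicalizations=140,
    number_of_references=90,
)
```

The counts are optional here because, as noted in this thread, the
information may not always be available for a given dataset.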
>>
>> If this is a base model we can agree upon then I will update the wiki
>> description and the ontology.
>>
>> Let me know your comments on this.
>>
>> Regards,
>>
>> Philipp.
>>
>> Am 23.05.14 13:49, schrieb Armando Stellato:
>>
>> Hi all,
>>
>>
>>
>> Just copied and pasted from our Ontolex-Lime proposal, an open
>> discussion about the lexicalization count (which is not about whether
>> they should be ratios or integers :P ).
>>
>>
>> 6. Lexicalization core triples: senses or what?
>>
>>
>>
>> Senses act as reifications of the relationships between LexicalEntries
>> and Conceptual Entities (be they LexicalConcepts or entities of the
>> lexicalized ontology). In effect, a single sense is always 1-1 (it links a
>> single Lexical Entry with a single Conceptual Entity).
>>
>> The ontolex model has a shortcut for the relationship (mediated by
>> senses) between LexicalEntries and LexicalConcept: ontolex:denotes.
>>
>>
>>
>> We would propose to formally consider the number of “denotes triples”
>> (triples with predicate == ontolex:denotes) to obtain the count. Obviously,
>> this information may not always be available (neither explicit nor
>> inferred), though the details of how to obtain it are just technicalities.
>>
>>
>>
>> [added wrt the proposal] So, in short, we propose to formally
>> count “lexicalizations” as the number of ontoresource <--> lexicalEntry
>> links, and not as the number of (linked) senses.
>>
>>
>>
>> To support our claim, please note the following case:
>>
>> 1.      a lexicon exists (independently of an ontology), with sense
>> descriptions for its lexical entries, and with one lexical entry having
>> two very close senses (two smooth variations of a broad meaning)
>>
>> 2.      the lexicon is used to lexicalize an ontology
>>
>> 3.      the authors of the Lexicalization decide to collapse the two
>> senses into the same ontology concept
>>
>> 4.      the two triples connecting the two similar senses to the same
>> ontology concept entail the same ontolex:denotes triple
>>
>> 5.      for the purpose of counting the lexicalizations of that lexical
>> concept, the single triple count on ontolex:denotes is more appropriate
>> than counting the two senses of the same LexicalEntry linked to the same
>> concept.
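
The five-step case above can be sketched in a few lines of Python. The
sense, entry and concept identifiers are hypothetical; the point is only
that counting distinct (entry, concept) pairs, i.e. the entailed denotes
triples, differs from counting senses.

```python
# Hypothetical senses: (sense id, lexical entry, ontology concept).
# entry:A has two close senses collapsed onto the same concept (step 3).
SENSES = [
    ("sense:A1", "entry:A", "onto:C1"),
    ("sense:A2", "entry:A", "onto:C1"),
    ("sense:B1", "entry:B", "onto:C1"),
]


def count_senses(senses):
    """Naive count: one lexicalization per sense."""
    return len(senses)


def count_lexicalizations(senses):
    """Proposed count: distinct entry-to-concept links (denotes triples)."""
    return len({(entry, concept) for _sense, entry, concept in senses})


n_senses = count_senses(SENSES)
n_lexicalizations = count_lexicalizations(SENSES)
```

Here the two senses of entry:A entail the same denotes triple, so the
concept has two lexicalizations although three senses point to it.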
>>
>>
>>
>> Would that be ok?
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Armando
>>
>>
>>
>>
>>
>>
>>  --
>>
>>
>>
>> Prof. Dr. Philipp Cimiano
>>
>>
>>
>> Phone: +49 521 106 12249
>>
>> Fax: +49 521 106 12412
>>
>> Mail: cimiano@cit-ec.uni-bielefeld.de
>>
>>
>>
>> Forschungsbau Intelligente Systeme (FBIIS)
>>
>> Raum 2.307
>>
>> Universität Bielefeld
>>
>> Inspiration 1
>>
>> 33619 Bielefeld
>>
>>
>>
>>
>>
>
>
>
>
Received on Saturday, 31 May 2014 15:43:59 UTC