Re: some additional comments on the current LIME specification

On Wed, Sep 2, 2015 at 4:01 PM, Armando Stellato <stellato@info.uniroma2.it>
wrote:

> Dear all,
>
>
>
> this email follows the one in reply to John’s “A final set of issues with
> the specification”. We kept separate our clarification from some further
> comments that we put in this email.
>
>
>
> 1)      Name of avgAmbiguity: in the second-to-last call, John/Philipp asked
> to change avgPolysemy to avgAmbiguity, providing a more general name which
> would also embrace the case of homonymy. We agreed (and still agree, no
> worries ;-) ). But now we have a doubt about how to compute this: in
> Lemon, homonymous entries are handled as… different entries. So now, if we
> want to represent this very general ambiguity, which would be the gluing
> element in the model?
> E.g. suppose we have these 3 (simplified here) definitions for the word
> bank:
>
> bank-s-1: financial institution
> bank-s-2: bank as the building of the above bank-s-1
> bank-s-3: the bank of a river
>
> now, bank-s-1 and bank-s-2 would be bound to the same lexical entry (as
> the word is polysemous in bank-s-1 and bank-s-2), let’s say bank-w-1, while
> bank-s-3, if we are not wrong, would be bound to a different lexical entry,
> bank-w-2, which is homonymous wrt the previous one.
>
Yes this is correct

>
> if we want to represent avgAmbiguity, we need to compute the ambiguity
> of each single entry. Now, let’s think about the ambiguity of “bank”: how
> should this be computed wrt the available entries?
>
> Our current intended concept of ambiguity: ambiguity(“bank”)
> = 3
>
I think the concept of ambiguity should follow that of the definition of
entry.

For example... let's take the Italian word 'asse', which has two meanings:
the masculine 'gli assi' are the axes of a graph and the feminine 'le assi'
are wooden boards. Would you consider these ambiguous? Similarly, are 'sei'
(a form of 'essere') and 'sei' (six) ambiguous, e.g., 'tu sei qui alle sei'?

I argue that as long as there is a reasonable linguistic distinction
(gender, inflected forms, part-of-speech,...), the *entries* are
distinct and not ambiguous. What is more unusual is that we have decided to
count etymology as a criterion for distinction, thus the two forms of 'bank'
are distinguishable in English and ergo not ambiguous.
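
To make the two readings concrete, here is a minimal Python sketch. It assumes a toy lexicon in which each entry carries a written form and a list of senses; the identifiers (bank-w-1, bank-s-1, ...) come from the quoted 'bank' example, while the dictionary layout and the function names are hypothetical and not part of Lemon or LIME.

```python
from collections import defaultdict

# Toy lexicon (assumed layout): entry id -> (written form, sense ids).
LEXICON = {
    "bank-w-1": ("bank", ["bank-s-1", "bank-s-2"]),  # polysemous entry
    "bank-w-2": ("bank", ["bank-s-3"]),              # homonymous entry
}

def ambiguity_per_entry(lexicon):
    """Count senses per lexical entry (homonyms stay separate)."""
    return {entry: len(senses) for entry, (_, senses) in lexicon.items()}

def ambiguity_per_form(lexicon):
    """Count senses per written form (homonyms are merged)."""
    counts = defaultdict(int)
    for form, senses in lexicon.values():
        counts[form] += len(senses)
    return dict(counts)

def average(values):
    """Arithmetic mean, 0.0 for an empty collection."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

per_entry = ambiguity_per_entry(LEXICON)
per_form = ambiguity_per_form(LEXICON)

print(per_entry)                    # {'bank-w-1': 2, 'bank-w-2': 1}
print(per_form)                     # {'bank': 3}
print(average(per_entry.values()))  # 1.5
print(average(per_form.values()))   # 3.0
```

Under the entry-based reading the average is taken over two entries, under the form-based reading over one written form, so the same data yields different values depending on which definition the specification adopts.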

>
>
> but there is no single entry “bank” in the lexicon! (which, indeed, would
> be necessary, unless we delegate this aspect to indexing systems and totally
> forget it in our model, which would be bad)
>
>
>
> so, wrt available entries, we should have:
>
>
>
> ambiguity(bank-w-1) = 2
>
> ambiguity(bank-w-2) = 1
>
>
>
> which:
>
>
>
> a.       is odd, as it really manages the two banks as two different
> things, and provides no indication about the fact that one might still find
> a large ambiguity if searching the word “bank” (‘cause if I search for
> “bank” I don’t care about much theory and differences among polysemy and
> homonymy, I just want to know that there are 3 results)
>
> b.      in the end…I’m getting the polysemy, because the homonymy is
> shadowed by the distinct entries…
>
>
>
> obviously, our intent would be to keep the overall ambiguity, so this is
> not a request to revert back to the name avgPolysemy, but a question on how
> this is reflected in the core model
>
>
>
>
>
> 2)      interpretation for avgNumOfLexicalizations: we return to
> something which has never been discussed (but only presented by us), as
> there were more urgent matters. Now that the model is stable, we limit
> ourselves to presenting again the possibility of changing it, as it wouldn’t
> scramble the whole model. Actually, we are not even pushing for one
> interpretation or the other, but thought it was worth mentioning.
>
> according to the formula in:
> https://www.w3.org/community/ontolex/wiki/images/9/90/Formula_avgNumOfLexicalizations-v1.png,
> the denominator comprises all the elements in the ontology. Since we
> already have statistics about overall covered elements (lime:percentage) we
> could consider applying a different version of avgNumOfLexicalizations,
> which considers only the elements actually participating in at
> least one lexicalization. In this case, this value would be more independent
> of the other (and thus more descriptive).
>
> E.g. I have an ontology O lexicalized by lexicon L for only 10% of its
> concepts. However, for those 10% of concepts, the average number of
> lexicalizations is 4. This means that the lexicon covers the ontology
> badly, but, within that extent, it describes the covered references
> really well. If we considered the avg over all references (including the
> non-lexicalized ones), I would get 0.4, which does not tell us much. Getting
> 10% for percentage and 4 for avgNumOfLexicalizations represents the
> lexicalization much better.
>
Surely avgNumOfLexicalizations is more descriptive if it is allowed to
include entries that have zero lexicalizations. I don't see the advantage
of this change: either value can be obtained by multiplying or dividing the
other by 'percentage'.
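
A short Python sketch of that relationship, using the figures from the quoted example (10% coverage, 4 lexicalizations per covered reference). The absolute counts (100 references, 10 covered, 40 lexicalizations) and the function names are assumptions chosen only to reproduce those figures.

```python
import math

def avg_over_all(total_lexicalizations, num_references):
    """Average over every reference, including non-lexicalized ones."""
    return total_lexicalizations / num_references

def avg_over_covered(total_lexicalizations, num_covered):
    """Average over references with at least one lexicalization."""
    return total_lexicalizations / num_covered

# Assumed counts matching the 10% / 4 example in the quoted text.
num_references = 100
num_covered = 10
total_lexicalizations = 40

percentage = num_covered / num_references                          # 0.1
a_all = avg_over_all(total_lexicalizations, num_references)        # 0.4
a_cov = avg_over_covered(total_lexicalizations, num_covered)       # 4.0

# Either average is recoverable from the other via 'percentage'.
assert math.isclose(a_all, a_cov * percentage)
assert math.isclose(a_cov, a_all / percentage)
```

So, given 'percentage', publishing either variant carries the same information; the choice only affects which number a reader sees without doing the division.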

>
>
> 3)      Name of percentage: this has been raised by John in his summary
> email as well. Actually, we initially called it coverage, but it had a
> different structure. With the reification of the property using partitions
> it was later changed to percentage. Since we changed the structure again,
> maybe coverage makes much more sense. Percentage was good in the
> context of a reified object expressing the coverage, where the percentage was
> limited to the mere number. Now it could make sense to go back to the
> original name.
>
 'coverage' or even 'ontologyCoverage' would be preferable to 'percentage'

>
>
> 4)      The formula in:
> https://www.w3.org/community/ontolex/wiki/File:Percentage_formula.gif is
> a bit confusing. The predicate lexicalizes(entity,entry) is never used
> formally in the specification. In any case, probably
> lexicalizes(entry,entity) would make more sense as usually, when giving
> verbs as names of predicates, the action should go from the first argument
> to the second one. Also, reference should be referenceDataset or ontology,
> as used in other cases. If that is ok with you, we can change it.
>
>
>
I would be in favour of introducing a clear, consistent formula for every
ratio in this section. From Manuel's answer to my final list of points it
seems that I could not figure out how to calculate all the values.

> 5)      Definition of avgNumOfLinks: this property indicates the average
> number of links to a concept for each ontology element in the reference
> dataset.
>
>
>
Erm... what is the issue?

> 6)      we would avoid "to a concept", as it suggests that a single
> concept is linked many times by the same reference. Could we restate
> as: “this property indicates the average number of links to lexical
> concepts for each ontology element in the reference dataset” ?
>
I don't see why not

Regards,
John

>
>
>
>
> Cheers,
>
>
>
> Armando and Manuel

Received on Friday, 4 September 2015 09:22:52 UTC