some additional comments on the current LIME specification

Dear all,

 

this email follows the one in reply to John's "A final set of issues with
the specification". We kept our clarifications separate from the further
comments that we put in this email.

 

1)      Name of avgAmbiguity: in the second-to-last call, John/Philipp asked to
change avgPolysemy to avgAmbiguity, providing a more general name which
would also embrace the case of homonymy. We agreed (and still agree, no
worries ;-) ). But now we have a doubt about how to compute this: in lemon,
homonymous entries are handled as... different entries. So, if we want to
represent this very general ambiguity, what would be the gluing element in
the model?
E.g. suppose we have these 3 (simplified here) definitions for the word
"bank":

bank-s-1: financial institution
bank-s-2: the building of the financial institution in bank-s-1
bank-s-3: the bank of a river

now, bank-s-1 and bank-s-2 would be bound to the same lexical entry (as the
word is polysemous across bank-s-1 and bank-s-2), let's say bank-w-1, while
bank-s-3, if we are not wrong, would be bound to a different lexical entry,
bank-w-2, which is homonymous wrt the previous one.

If we want to represent avgAmbiguity, we need to compute the ambiguity
of each single entry. Now, let's think about the ambiguity of "bank": how
should this be computed wrt the available entries?

Our currently intended notion of ambiguity: ambiguity("bank") = 3

 

But there is no single entry "bank" in the lexicon! (which, indeed, would be
necessary, unless we delegate this aspect to indexing systems and forget
it entirely in our model, which would be bad)

 

so, wrt available entries, we should have:

 

ambiguity(bank-w-1) = 2

ambiguity(bank-w-2) = 1

 

which:

 

a.       is odd, as it really manages the two banks as two different things,
and provides no indication of the fact that one might still find a large
ambiguity when searching for the word "bank" ('cause if I search for "bank" I
don't care much about theory and the differences between polysemy and
homonymy, I just want to know that there are 3 results)

b.      in the end, I'm getting the polysemy, because the homonymy is
shadowed by the distinct entries.

 

Obviously, our intent would be to keep the overall ambiguity, so this is not
a request to revert to the name avgPolysemy, but a question about how this
is reflected in the core model.
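To make the two computations concrete, here is a minimal sketch in Python. The identifiers and data structures are purely hypothetical toy stand-ins for the lemon entries and senses discussed above, not an actual lemon/LIME API:

```python
from collections import defaultdict

# Toy data (hypothetical identifiers): senses grouped under the
# lexical entries they are bound to in the lemon model.
entries = {
    "bank-w-1": ["bank-s-1", "bank-s-2"],  # polysemous entry
    "bank-w-2": ["bank-s-3"],              # homonymous entry, same written form
}
written_form = {"bank-w-1": "bank", "bank-w-2": "bank"}

# Per-entry ambiguity: only captures polysemy.
per_entry = {entry: len(senses) for entry, senses in entries.items()}

# Word-level ambiguity: grouping entries by written form, so that
# homonymy is counted as well.
by_form = defaultdict(int)
for entry, senses in entries.items():
    by_form[written_form[entry]] += len(senses)

print(per_entry)       # {'bank-w-1': 2, 'bank-w-2': 1}
print(dict(by_form))   # {'bank': 3}
```

The first computation is what the current model supports (and it only reflects polysemy); the second is the "overall ambiguity" we would like avgAmbiguity to capture, and it needs some element (here the written form) to glue homonymous entries together.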

 

 

2)      Interpretation of avgNumOfLexicalizations: we are just returning to
something which has never been discussed (but only presented by us), as there
were more urgent matters. Now that the model is stable, we limit ourselves to
presenting again the possibility of changing it, as it wouldn't scramble the
whole model. Actually, we are not even pushing for one interpretation or the
other, but thought it was worth mentioning.

According to the formula in
https://www.w3.org/community/ontolex/wiki/images/9/90/Formula_avgNumOfLexicalizations-v1.png,
the denominator comprises all the elements in the ontology. Since we already
have statistics about the overall covered elements (lime:percentage), we
could consider applying a different version of avgNumOfLexicalizations,
which considers only the elements effectively participating in at least one
lexicalization. In this case, this value would be more independent from the
other (and thus more descriptive).

E.g. I have an ontology O lexicalized by a lexicon L for only 10% of its
concepts. However, for those 10% of concepts, the average number of
lexicalizations is 4. This means that the lexicon covers the ontology badly
but, to its extent, it really describes the covered references well. If we
considered the average over all references (including the non-lexicalized
ones), we would get 0.4, which does not tell us much. Reporting a percentage
of 10% and an avgNumOfLexicalizations of 4 represents the lexicalization
much better.
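The arithmetic of the two interpretations can be sketched as follows; the concrete counts (100 references, 10 lexicalized, 40 lexicalizations) are hypothetical numbers chosen to match the example above:

```python
# Hypothetical numbers matching the example: an ontology with 100
# references, 10 of which are lexicalized, and 40 lexicalizations in total.
num_references = 100
num_lexicalized = 10
total_lexicalizations = 40

# lime:percentage: fraction of references that are lexicalized at all.
percentage = num_lexicalized / num_references

# Current formula: average over ALL references in the ontology.
avg_over_all = total_lexicalizations / num_references

# Proposed variant: average over lexicalized references only.
avg_over_lexicalized = total_lexicalizations / num_lexicalized

print(percentage, avg_over_all, avg_over_lexicalized)  # 0.1 0.4 4.0
```

Note that avg_over_all is simply percentage * avg_over_lexicalized, which is why reporting the pair (10%, 4) carries strictly more information than the single value 0.4.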

 

3)      Name of percentage: this has been raised by John in his summary
email as well. Actually, we initially called it coverage, but it had a
different structure. With the reification of the property using partitions,
it was later changed to percentage. Since we have changed the structure
again, maybe coverage makes much more sense. Percentage was good in the
context of a reified object expressing the coverage, where the percentage
was limited to the mere number. Now it could make sense to go back to the
original name.

 

4)      The formula in
https://www.w3.org/community/ontolex/wiki/File:Percentage_formula.gif is a
bit confusing. The predicate lexicalizes(entity, entry) is never used
formally in the specification. In any case, lexicalizes(entry, entity) would
probably make more sense, as usually, when verbs are used as predicate
names, the action should go from the first argument to the second one. Also,
reference should be referenceDataset or ontology, as used in other cases. If
that's ok with you, we can change it.
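A small sketch of the percentage computation under the proposed argument order lexicalizes(entry, entity); the entry/entity pairs and IRIs below are hypothetical toy data, not taken from any actual dataset:

```python
# Hypothetical lexicalizes(entry, entity) pairs: the lexical entry comes
# first, the lexicalized ontology entity second.
lexicalizations = {
    ("bank-w-1", "ex:Bank"),
    ("bank-w-2", "ex:RiverBank"),
}
# Hypothetical reference dataset (ontology) with four entities.
reference_dataset = {"ex:Bank", "ex:RiverBank", "ex:Money", "ex:Loan"}

# Percentage: distinct lexicalized entities over all entities in the
# reference dataset.
lexicalized_entities = {entity for _entry, entity in lexicalizations}
percentage = len(lexicalized_entities & reference_dataset) / len(reference_dataset)
print(percentage)  # 0.5
```

Counting distinct entities in the second argument position (rather than pairs) is what keeps the value bounded by 1 even when an entity has several lexicalizations.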

 

5)      Definition of avgNumOfLinks: the current definition reads "this
property indicates the average number of links to a concept for each
ontology element in the reference dataset".

We don't like "to a concept", as it seems to imply that a single concept is
linked many times by the same reference. Could we restate it as: "this
property indicates the average number of links to lexical concepts for each
ontology element in the reference dataset"?

 

 

Cheers,

 

Armando and Manuel

 

 

Received on Wednesday, 2 September 2015 15:06:29 UTC