- From: Armando Stellato <stellato@info.uniroma2.it>
- Date: Wed, 2 Sep 2015 17:01:45 +0200
- To: "'public-ontolex'" <public-ontolex@w3.org>
- Message-ID: <DUB408-EAS167BF2066FAF0BF7002A02AA0690@phx.gbl>
Dear all,

this email follows our reply to John's "A final set of issues with the specification". We kept our clarifications separate from some further comments, which we put in this email.

1) Name of avgAmbiguity: in the second-to-last call, John/Philipp asked to change avgPolysemy to avgAmbiguity, a more general name that would also embrace the case of homonymy. We agreed (and still agree, no worries ;-) ). But now we have a doubt about how to compute it: in lemon, homonymous entries are handled as different entries. So, if we want to represent this very general ambiguity, what would be the gluing element in the model?

E.g., suppose we have these 3 (simplified) definitions for the word "bank":

bank-s-1: financial institution
bank-s-2: bank as the building of the above bank-s-1
bank-s-3: the bank of a river

Now, bank-s-1 and bank-s-2 would be bound to the same lexical entry, say bank-w-1 (as the word is polysemous across bank-s-1 and bank-s-2), while bank-s-3, if we are not wrong, would be bound to a different lexical entry, bank-w-2, which is homonymous with the previous one. To represent avgAmbiguity, we need to compute the ambiguity of each single entry. So how should the ambiguity of "bank" be computed with respect to the available entries? What we currently intend by ambiguity is:

ambiguity("bank") = 3

but there is no single entry "bank" in the lexicon! (Such an entry would indeed be necessary, unless we delegate this aspect to indexing systems and forget it entirely in our model, which would be bad.) So, with respect to the available entries, we would have:

ambiguity(bank-w-1) = 2
ambiguity(bank-w-2) = 1

which:
a. is odd, as it really treats the two banks as two different things and gives no indication that one might still find a large ambiguity when searching for the word "bank" (if I search for "bank", I don't care much about theory or the differences between polysemy and homonymy; I just want to know that there are 3 results);
b. in the end gives me only polysemy, because homonymy is shadowed by the distinct entries.

Obviously, our intent is to keep the overall ambiguity, so this is not a request to revert to the name avgPolysemy, but a question about how this is reflected in the core model.

2) Interpretation of avgNumOfLexicalizations: we are returning to something that has never been discussed (only presented by us), as there were more urgent matters. Now that the model is stable, we simply present again the possibility of changing it, since doing so would not disrupt the rest of the model. We are not pushing for one interpretation or the other, but thought it was worth mentioning.

According to the formula in: https://www.w3.org/community/ontolex/wiki/images/9/90/Formula_avgNumOfLexicalizations-v1.png, the denominator comprises all the elements in the ontology. Since we already have statistics about the overall covered elements (lime:percentage), we could consider a different version of avgNumOfLexicalizations that counts only the elements actually participating in at least one lexicalization. This way, the value would be more independent of the other one, and thus more descriptive.

E.g., suppose an ontology O is lexicalized by a lexicon L for only 10% of its concepts, but for those 10% the average number of lexicalizations is 4. This means that the lexicon covers the ontology poorly but, within its extent, describes the covered references well. If we computed the average over all references (including the non-lexicalized ones), we would get 0.4, which does not tell us much.
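As a rough illustration of points 1 and 2, here is a small Python sketch that computes both notions of ambiguity for the simplified "bank" example and both readings of avgNumOfLexicalizations (plus the covered fraction, cf. lime:percentage) for the 10%/4 example. All names (SENSES, WRITTEN_FORM, ambiguity_per_entry, ambiguity_per_form, coverage, avg_lexicalizations) are hypothetical simplifications made up for this example, not part of the lemon/LIME vocabulary; real data would be RDF resources rather than Python dicts, and using the written form as the "glue" between homonymous entries is only an assumption.

```python
from collections import Counter
from statistics import mean

# Hypothetical, simplified data for the "bank" example of point 1:
# each sense points to the lexical entry it belongs to...
SENSES = {
    "bank-s-1": "bank-w-1",  # financial institution
    "bank-s-2": "bank-w-1",  # the building of bank-s-1 (polysemy)
    "bank-s-3": "bank-w-2",  # the bank of a river (homonymy)
}
# ...and each entry has a written form (assumed here to act as the "glue").
WRITTEN_FORM = {"bank-w-1": "bank", "bank-w-2": "bank"}


def ambiguity_per_entry(senses):
    """Number of senses of each lexical entry (homonyms stay separate)."""
    return dict(Counter(senses.values()))


def ambiguity_per_form(senses, written_form):
    """Number of senses reachable from each written form, across entries."""
    return dict(Counter(written_form[entry] for entry in senses.values()))


def coverage(lexicalizations, entities):
    """Fraction of entities with at least one lexicalization (cf. lime:percentage)."""
    if not entities:
        return 0.0
    covered = sum(1 for e in entities if lexicalizations.get(e, 0) > 0)
    return covered / len(entities)


def avg_lexicalizations(lexicalizations, entities, only_lexicalized):
    """Average number of lexicalizations per entity.

    only_lexicalized=False: denominator is every entity (current formula).
    only_lexicalized=True: denominator is only the lexicalized entities.
    """
    total = sum(lexicalizations.get(e, 0) for e in entities)
    if only_lexicalized:
        denominator = sum(1 for e in entities if lexicalizations.get(e, 0) > 0)
    else:
        denominator = len(entities)
    return total / denominator if denominator else 0.0


# Point 2 example: 100 concepts, only 10 of them lexicalized, 4 times each.
entities = [f"c{i}" for i in range(100)]
lex = {f"c{i}": 4 for i in range(10)}

print(ambiguity_per_entry(SENSES))                 # {'bank-w-1': 2, 'bank-w-2': 1}
print(mean(ambiguity_per_entry(SENSES).values()))  # 1.5
print(ambiguity_per_form(SENSES, WRITTEN_FORM))    # {'bank': 3}
print(coverage(lex, entities))                     # 0.1
print(avg_lexicalizations(lex, entities, only_lexicalized=False))  # 0.4
print(avg_lexicalizations(lex, entities, only_lexicalized=True))   # 4.0
```

Grouping by entry averages to 1.5, while grouping by written form gives 3 for "bank": only the latter reflects homonymy, and here it works only because the assumed WRITTEN_FORM mapping plays the role of the gluing element asked about in point 1.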
Getting a percentage of 10% and an avgNumOfLexicalizations of 4 represents the lexicalization much better.

3) Name of percentage: John raised this in his summary email as well. We originally called it coverage, but at the time it had a different structure; when the property was reified using partitions, it was renamed percentage. Since we have changed the structure again, coverage may now make more sense: percentage was a good name in the context of a reified object expressing the coverage, where the percentage was limited to the mere number. Now it could make sense to go back to the original name.

4) The formula in: https://www.w3.org/community/ontolex/wiki/File:Percentage_formula.gif is a bit confusing. The predicate lexicalizes(entity,entry) is never used formally in the specification. In any case, lexicalizes(entry,entity) would probably make more sense since, when verbs are used as predicate names, the action usually goes from the first argument to the second. Also, reference should be referenceDataset or ontology, as used elsewhere. If it is OK for you, we can change it.

5) Definition of avgNumOfLinks: the current definition reads "this property indicates the average number of links to a concept for each ontology element in the reference dataset". We don't link "to a concept", as that wording suggests that a single concept is linked many times by the same reference. Could we restate it as: "this property indicates the average number of links to lexical concepts for each ontology element in the reference dataset"?

Cheers,
Armando and Manuel
Received on Wednesday, 2 September 2015 15:06:29 UTC