Re: A final set of issues with the specification

Hi John, All

Thank you for your valuable review of the specification.

Let me start by addressing some of the "more controversial" points you have
raised.

2015-07-24 13:37 GMT+02:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>:

> 1. We do not given the abbreviation of "lexicon model for ontologies" as
> "lemon" although the term lemon is used at several points in the document.
> Do we agree that the model is called "lexicon model for ontologies" and
> abbreviated as "OntoLex-Lemon"?
>

I like the name *Lemon*, so I am inclined to agree with this name. I am not
sure the hyphen is required, though.


> 4. Lime defines a number of properties that are of the form "the number of
> links from X to Y divided by the total number of X" for example
> lime:avgNumOfLexicalizations is "the number of links from references to
> lexical entries divided by the total number of references". This can be put
> into a table as follows:
>
> X/Y         References   Entries                  Concepts
> References  -            avgNumOfLexicalizations  avgNumOfLinks
> Entries     percentage   -                        avgAmbiguity
> Concepts    ?            avgSynonymy              -
>
>
> The table reveals a few inconsistencies in that we have a missing property
> and the percentage property should perhaps be named something like
> avgPolysemy
>

The various statistics have been defined considering that we have two sets
*A* and *B* and a set *Pairs* of pairs (a,b) ∈ A × B.

We have various integer counts for:

   - the total number of pairs = |Pairs|
   - the a's that occur in at least one pair = |{a ∈ A | ∃ b ∈ B . (a,b) ∈
   Pairs}|
   - the b's that occur in at least one pair = |{b ∈ B | ∃ a ∈ A . (a,b) ∈
   Pairs}|

(I used the symbol "|" to express the cardinality of each set, while in the
spec we sometimes use the fragment #)

For the "ratios", we have chosen a "preferential direction" (maybe this is
not the right expression, and the directions could even be stated the
opposite way round), say from A to B:

   - from the Ontology to the Lexicon in the case of a LexicalizationSet
   - from the Ontology to the ConceptSet in the case of a LexicalLinkset
   - from the Lexicon to the ConceptSet in the case of a
   ConceptualizationSet

Given these viewpoints, we gave the following ratios:

   - percentage = ratio of elements in A that participate in at least one
   pair (in other words, that have been associated with at least one b in B)
   - avgNumOfXXX = average number of b in B associated with each a in A
   (XXX is Lexicalizations, Links, Conceptualizations)
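As a minimal sketch of these two ratios, in Python: the function name
`ratios` and the toy data are made up for illustration and do not come from
the spec, and I assume here that percentage is expressed as a fraction
between 0 and 1.

```python
def ratios(A, B, pairs):
    """Return (percentage, avg_num) for the preferential direction A -> B.

    Both ratios use |A| as the denominator, as described above.
    """
    if not A:
        raise ValueError("A must not be empty")
    # Keep only well-formed pairs (a, b) with a in A and b in B.
    pairs = {(a, b) for (a, b) in pairs if a in A and b in B}
    # The a's that occur in at least one pair.
    covered = {a for (a, _) in pairs}
    percentage = len(covered) / len(A)
    avg_num = len(pairs) / len(A)
    return percentage, avg_num


# Hypothetical LexicalizationSet: A = ontology references, B = lexical entries.
references = {"r1", "r2", "r3", "r4"}
entries = {"e1", "e2", "e3"}
pairs = {("r1", "e1"), ("r1", "e2"), ("r2", "e2")}

percentage, avg_num = ratios(references, entries, pairs)
print(percentage)  # 2 of 4 references are lexicalized -> 0.5
print(avg_num)     # 3 pairs over 4 references -> 0.75
```

Note that r3 and r4 lower both values: they count in the denominator even
though they take part in no pair.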

For the ConceptualizationSet we followed a slightly different approach:

   - dropped percentage;
   - renamed avgNumOfConceptualizations into avgAmbiguity;
   - and added avgSynonymy, which plays the role of avgNumOfXXX if we
   assume the opposite point of view (i.e. counting how many lexical entries
   are associated with each lexical concept)
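To make the two points of view concrete, here is a small Python sketch for a
ConceptualizationSet. The entry and concept names are invented for
illustration; the only thing taken from the discussion above is which set
goes in the denominator.

```python
# Hypothetical ConceptualizationSet: pairs (lexical entry, lexical concept).
entries = {"bank_n", "shore_n", "river_n"}
concepts = {"c_financial_institution", "c_riverside"}
pairs = {
    ("bank_n", "c_financial_institution"),
    ("bank_n", "c_riverside"),
    ("shore_n", "c_riverside"),
}

# Lexicon -> ConceptSet: average number of concepts per lexical entry.
avg_ambiguity = len(pairs) / len(entries)

# ConceptSet -> Lexicon: average number of lexical entries per concept.
avg_synonymy = len(pairs) / len(concepts)

print(avg_ambiguity)  # 3 pairs over 3 entries -> 1.0
print(avg_synonymy)   # 3 pairs over 2 concepts -> 1.5
```

The numerator is the same in both cases; only the denominator changes with
the direction.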

Answering your questions:

   - percentage is not the same as avgPolysemy, avgAmbiguity, or avgSynonymy:
   it measures how many elements of A participate in at least one pair, not
   an average number of associations
   - except for ConceptualizationSet, we would still need the ratios in the
   direction opposite to the one we assumed
   - in fact, we could also consider adding a property analogous to
   percentage that gives the ratio of participants in B
The problem with introducing avgNumOfXXX in the opposite direction is that
the current properties avgNumOfLexicalizations and avgNumOfLinks are in fact
ambiguous, and their interpretation has been arbitrarily fixed by putting
the ontology entities in the denominator. Therefore, I suspect that
introducing the missing properties would force us to rename the existing
ones. It is no coincidence, in my view, that in the end we decided to use
avgPolysemy and avgSynonymy for the conceptualization set, dropping
avgNumOfConceptualizations altogether.

I really like avgPolysemy and avgSynonymy, which could be applied to
LexicalizationSets as well, but I think they cannot be applied to
LexicalLinkset (or at least, their interpretation would not be immediately
clear, because we are relating two "semantic resources").

In general, I remember that we agreed not to use the term "*polysemy*"
because it has a precise meaning in linguistics, and we don't want to deal
with the polysemy/homonymy issue at this level.

Now let me address some of the "not-so-important points":



> 23. Some examples use "dbonto" and some "dbpedia"... inconsistent. (JPM)
>

In DBpedia there are (at least) two namespaces that should be associated
with two distinct prefixes:

http://dbpedia.org/resource/ --> eg. http://dbpedia.org/resource/Rome

http://dbpedia.org/ontology/ --> eg. http://dbpedia.org/ontology/birthPlace

DBpedia uses the prefixes *dbpedia* and *dbpedia-owl*, respectively (though
I don't completely like the latter).
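As a small illustration of why the two prefixes must stay distinct, here is a
Python sketch that expands prefixed names into full IRIs. The `expand` helper
is hypothetical; only the two namespace IRIs and the two prefixes are taken
from DBpedia as described above.

```python
# The two DBpedia namespaces and the prefixes DBpedia associates with them.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'dbpedia:Rome' into a full IRI."""
    prefix, _, local_name = prefixed_name.partition(":")
    return PREFIXES[prefix] + local_name


print(expand("dbpedia:Rome"))            # http://dbpedia.org/resource/Rome
print(expand("dbpedia-owl:birthPlace"))  # http://dbpedia.org/ontology/birthPlace
```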

I have not verified whether the specification uses resources from both
namespaces (in which case two prefixes are necessary) or only from one
namespace (in which case a single prefix should be used).

-- 
Manuel Fiorelli

Received on Friday, 24 July 2015 15:24:36 UTC