Re: A final set of issues with the specification from John McCrae on 2015-09-04 (public-ontolex@w3.org from September 2015)

From: John McCrae <johnmccrae@gmail.com>
Date: Fri, 4 Sep 2015 10:41:00 +0100
To: Armando Stellato <stellato@info.uniroma2.it>
Cc: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAC5njqoPAmw7G1CmaDmzFEhLReVBWnbeudshTMrRYy+21Hmn3A@mail.gmail.com>
On Wed, Sep 2, 2015 at 3:19 PM, Armando Stellato <stellato@info.uniroma2.it>
wrote:

> Hi all,
>
>
>
> a few replies about 3 LIME points:
>
>
>
> 4. Lime defines a number of properties that are of the form "the number of
> links from X to Y divided by the total number of X" for example
> lime:avgNumOfLexicalizations is "the number of links from references to
> lexical entries divided by the total number of references". This can be
> put into a table as follows:
>
>
>
> X/Y         | References | Entries                 | Concepts
> ------------+------------+-------------------------+--------------
> References  | -          | avgNumOfLexicalizations | avgNumOfLinks
> Entries     | percentage | -                       | avgAmbiguity
> Concepts    | ?          | avgSynonymy             | -
>
>
>
> The table reveals a few inconsistencies in that we have a missing property
> and the percentage property should perhaps be named something like
> avgPolysemy
>
>
>
>
>
> There is a bit of confusion in the table above, as it considers only one
> dimension given by a single property associated with pairs of sets, and
> wrongly (if we interpreted it correctly) treats some properties as being
> the “inverse” of one another (not in the sense of owl:inverseOf, but as
> providing inverted ratios), while they are all truly different.
>
>
>
> Manuel replied on 24 and 25 July. Here we provide a summary of both
> replies and expand a bit more to give a complete overview. Sorry in advance
> for the length, but in the end it is fairly short considering that it is an
> overall summary.
>
>
>
> The various statistics have been defined considering that we have two sets
> *A* and *B* and a set *Pairs* of pairs (a,b) ∈ A×B. The bindings between
> these sets are sets themselves; specifically, we have:
>
> ·         LexicalizationSet binding elements from a Lexicon to Ontology
> references
>
> ·         ConceptualizationSet binding Lexical Concepts from a ConceptSet
> to entries in a Lexicon (i.e. providing a conceptual backbone for it)
>
> ·         LexicalLinkset  linking Lexical Concepts from a ConceptSet to
> Ontology references
>
> In general, for each combination of type of sets we have various integer
> counts:
>
> ·         the total number of pairs = |Pairs|
>
> ·         the a's that occur in at least one pair = |{a ∈ A | ∃ b ∈ B . (a,b)
> ∈ Pairs}|
>
> ·         the b's that occur in at least one pair = |{b ∈ B | ∃ a ∈ A . (a,b)
> ∈ Pairs}|
>
> (we use the notation "|<set>|" to express the cardinality of each set,
> while in the spec we sometimes use the fragment #)
>
> And then we have some "ratios", providing (we use here general names):
>
> ·         <A-coverage>: ratio of elements in A that participate in at
> least one pair (in other words, that have been associated with at least one
> b in B)
>
> ·         <avgNumOf_B-in-A> = average number of b in B associated with
> each a in A (XXX is Lexicalizations, Links, Conceptualizations)
>
> For these ratios we have chosen a "preferential direction" (not strict,
> just for naming here), say from A to B:
>
> ·         from the Ontology to the Lexicon in the case of a
> LexicalizationSet
>
> ·         from the Lexicon to the ConceptSet in the case of a
> ConceptualizationSet
>
> ·         from the Ontology to the ConceptSet in the case of a
> LexicalLinkset
>
> That is, we keep on the left the set that is being “enriched” by elements
> from the set on the right. So we have only one direction for the property,
> namely the one that makes more sense given the kind of binding.
>
> Given the above, this is the full table grounding the above general
> properties on the three specific Binding Sets:
>
> <Binding Set>                              | Total number of pairs   | Occurring "a"s      | Occurring "b"s      | A-coverage (at least one B in A) | avgNumOf_B-in-A         | ! avgNumOf_A-in-B !
> -------------------------------------------+-------------------------+---------------------+---------------------+----------------------------------+-------------------------+--------------------
> LexicalizationSet <Ontology, Lexicon>      | lime:lexicalizations    | lime:references     | lime:lexicalEntries | percentage                       | avgNumOfLexicalizations | ---- N/A ----
> ConceptualizationSet <Lexicon, ConceptSet> | lime:conceptualizations | lime:lexicalEntries | lime:concepts       | ---- N/A ----                    | avgAmbiguity            | avgSynonymy
> LexicalLinkset <Ontology, ConceptSet>      | lime:links              | lime:references     | lime:concepts       | percentage                       | avgNumOfLinks           | ---- N/A ----
>
>
> Note: we have rechecked the current specification and all the properties
> are correctly assigned with respect to what is asserted in this table.
>
>
>
> So the table is fairly complete. We provide here explanations for the
> ---- N/A ---- entries:
>
> 1)      *no equivalents for avgSynonymy*: Initially, there was no
> intention of having an “avgNumOf_A-in-B” property. However, while
> LexicalizationSets and LexicalLinksets represent a-posteriori links between
> potentially pre-existing entities, ConceptualizationSets are often defined
> in the context of complete lexical/semantic resources, such as WordNet.
> An example: if I use WordNet as a Lexicon to lexicalize a small ontology
> (the case of a LexicalizationSet), what would be the purpose of stating that
> there are on average 0.0000001... ontology references for each lexical entry
> in WordNet?
> The same holds if I use WordNet synsets to “tag” ontology references.
> Addressing instead the level of synonymy in the ConceptualizationSet which
> is part of WordNet itself may make sense; that is why this property was
> introduced in one of our recent calls.
>
> 2)      *no A-coverage for ConceptualizationSet*: as above, the idea (which
> we mostly agreed upon) was that most ConceptualizationSets are defined inside
> already existing lexico-semantic resources. In this sense, the coverage
> would always be 1, as the ConceptualizationSet is defined over a ConceptSet
> defined specifically for the Lexicon (thus covering all of it). Even if we
> consider cases of reuse of ConceptSets (such as the set of synsets in
> WordNet) to create other lexicons (e.g. other language-specific WordNets),
> and the possible addition of further synsets to cover language-specific
> expressions, we would still have A-coverage = 1.
>
> So far, so good. We have some additional comments, but to avoid confusion
> it is better to keep this as an explanation for the question by John, and
> then we will address further issues in another email.
>
Hmmm... for future proofing wouldn't it be better just to have this table
complete for all slots?

Also this table must go into the Final Spec.
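
As a minimal sketch of what a complete table would contain, the following
Python computes all six slots for a single binding set. It assumes, following
the definition quoted in point 4, that each average divides the number of
pairs by the total size of A (or of B). The names BindingStats and
binding_stats and the toy data are hypothetical and are not part of lime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingStats:
    total_pairs: int    # |Pairs|
    occurring_a: int    # a's that occur in at least one pair
    occurring_b: int    # b's that occur in at least one pair
    a_coverage: float   # occurring_a / |A|
    avg_b_per_a: float  # |Pairs| / |A|  (avgNumOf_B-in-A)
    avg_a_per_b: float  # |Pairs| / |B|  (avgNumOf_A-in-B)


def _ratio(numerator, denominator):
    # Assumption: an empty set yields 0.0 rather than an error.
    return numerator / denominator if denominator else 0.0


def binding_stats(a_set, b_set, pairs):
    """Compute the counts and ratios for one binding set over A x B."""
    # Keep only pairs whose elements really belong to A and B.
    valid = {(a, b) for a, b in pairs if a in a_set and b in b_set}
    occ_a = {a for a, _ in valid}
    occ_b = {b for _, b in valid}
    return BindingStats(
        total_pairs=len(valid),
        occurring_a=len(occ_a),
        occurring_b=len(occ_b),
        a_coverage=_ratio(len(occ_a), len(a_set)),
        avg_b_per_a=_ratio(len(valid), len(a_set)),
        avg_a_per_b=_ratio(len(valid), len(b_set)),
    )


# Toy LexicalizationSet: A = ontology references, B = lexical entries.
references = {"ex:Cat", "ex:Dog", "ex:Fish"}
entries = {"cat", "dog", "hound"}
lexicalizations = {("ex:Cat", "cat"), ("ex:Dog", "dog"), ("ex:Dog", "hound")}

stats = binding_stats(references, entries, lexicalizations)
print(stats)
```

For a ConceptualizationSet, with A the Lexicon and B the ConceptSet,
avg_b_per_a would play the role of avgAmbiguity and avg_a_per_b that of
avgSynonymy.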

>
>
> 34. The diagram for lime metadata needs to be updated. (JPM)
>
>
>
>
>
>
>
> The diagram was updated by Manuel on the 20th of July. Maybe you needed to
> force a refresh.
>
Manuel> Can you send me the source of this diagram, it seems to have some
stylistic differences to the rest of the diagrams?

Regards,
John

>
>
>
>
> 35. lime/example2 "jnp" => "jpn" (JPM)
>
>
>
>
>
> OK, we have just changed them.
>
>
>
>
>
>
>
>
>                                Cheers,
>
>
>
>                                Armando and Manuel
>
Received on Friday, 4 September 2015 09:41:29 UTC