- From: John McCrae <johnmccrae@gmail.com>
- Date: Fri, 4 Sep 2015 10:41:00 +0100
- To: Armando Stellato <stellato@info.uniroma2.it>
- Cc: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>, public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAC5njqoPAmw7G1CmaDmzFEhLReVBWnbeudshTMrRYy+21Hmn3A@mail.gmail.com>
On Wed, Sep 2, 2015 at 3:19 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:
> Hi all,
>
> a few replies about 3 LIME points:
>
>> 4. Lime defines a number of properties that are of the form "the number of
>> links from X to Y divided by the total number of X"; for example,
>> lime:avgNumOfLexicalizations is "the number of links from references to
>> lexical entries divided by the total number of references". This can be
>> put into a table as follows:
>>
>> X/Y        | References | Entries                 | Concepts
>> -----------+------------+-------------------------+--------------
>> References | -          | avgNumOfLexicalizations | avgNumOfLinks
>> Entries    | percentage | -                       | avgAmbiguity
>> Concepts   | ?          | avgSynonymy             | -
>>
>> The table reveals a few inconsistencies, in that we have a missing property
>> and the percentage property should perhaps be named something like
>> avgPolysemy.
>
> There is a bit of confusion in the table above: it considers only one
> dimension, given by a single property associated with each pair of sets,
> and (if we have interpreted it correctly) it wrongly treats some properties
> as the "inverse" of others (not in the sense of owl:inverseOf, but as
> providing inverted ratios), while they are all truly different.
>
> Manuel replied on 24 and 25 July. Here we provide a summary of both
> replies and expand a bit more to give a complete overview. Sorry in advance
> for the length, but in the end it is pretty short considering that it is a
> complete summary.
>
> The various statistics have been defined considering that we have two sets
> A and B and a set Pairs of pairs (a,b) ∈ A×B. The bindings between these
> sets are sets themselves; specifically, we have:
>
> · LexicalizationSet, binding elements from a Lexicon to Ontology references
> · ConceptualizationSet, binding Lexical Concepts from a ConceptSet to
>   entries in a Lexicon (i.e. providing a conceptual backbone for it)
> · LexicalLinkset, linking Lexical Concepts from a ConceptSet to Ontology
>   references
>
> In general, for each combination of set types we have various integer
> counts:
>
> · the total number of pairs = |Pairs|
> · the a's that occur in at least one pair = |{a ∈ A | ∃ b ∈ B . (a,b) ∈ Pairs}|
> · the b's that occur in at least one pair = |{b ∈ B | ∃ a ∈ A . (a,b) ∈ Pairs}|
>
> (we use the notation "|<set>|" to express the cardinality of each set,
> while in the spec we sometimes use the symbol #)
>
> And then we have some "ratios" (we use general names here):
>
> · <A-coverage>: the ratio of elements in A that participate in at least one
>   pair (in other words, that have been associated with at least one b in B)
> · <avgNumOf_B-in-A>: the average number of b in B associated with each a
>   in A (i.e. avgNumOfXXX, where XXX is Lexicalizations, Links or
>   Conceptualizations)
>
> For these ratios we have chosen a "preferential direction" (not strict,
> just for naming here), say from A to B:
>
> · from the Ontology to the Lexicon in the case of a LexicalizationSet
> · from the Lexicon to the ConceptSet in the case of a ConceptualizationSet
> · from the Ontology to the ConceptSet in the case of a LexicalLinkset
>
> That is, we keep on the left the set that is being "enriched" by elements
> from the set on the right. So each property has only one direction: the one
> that makes more sense given the kind of binding.
>
> Given the above, this is the full table grounding the above general
> properties on the three specific binding sets:
>
> Binding Set                                | Total number of pairs   | Occurring a's       | Occurring b's       | A-coverage | avgNumOf_B-in-A         | avgNumOf_A-in-B
> -------------------------------------------+-------------------------+---------------------+---------------------+------------+-------------------------+----------------
> LexicalizationSet <Ontology, Lexicon>      | lime:lexicalizations    | lime:references     | lime:lexicalEntries | percentage | avgNumOfLexicalizations | N/A
> ConceptualizationSet <Lexicon, ConceptSet> | lime:conceptualizations | lime:lexicalEntries | lime:concepts       | N/A        | avgAmbiguity            | avgSynonymy
> LexicalLinkset <Ontology, ConceptSet>      | lime:links              | lime:references     | lime:concepts       | percentage | avgNumOfLinks           | N/A
>
> Note: we have rechecked the current specification, and all the properties
> are correctly assigned with respect to what is asserted in this table.
>
> So the table is pretty complete; here are explanations for the N/A entries:
>
> 1) *No equivalents of avgSynonymy*: initially, there was no intention of
> having an "avgNumOf_A-in-B" property. However, while lexicalization sets
> and lexical linksets represent a-posteriori links between potentially
> pre-existing entities, conceptualization sets are often defined in the
> context of complete lexical/semantic resources, such as WordNet.
> An example: if I use WordNet as a Lexicon to lexicalize a small ontology
> (the case of a LexicalizationSet), what would be the purpose of stating that
> there are on average 0.0000001... ontology references for each lexical
> entry in WordNet? The same holds if I use WordNet synsets to "tag" ontology
> references. Instead, addressing the level of synonymy in the
> ConceptualizationSet that is part of WordNet itself may make sense; that is
> why this property was introduced in one of our recent calls.
>
> 2) *No A-coverage for ConceptualizationSet*: as above, the idea (which we
> mostly agreed upon) was that most ConceptualizationSets are defined inside
> already existing lexico-semantic resources. In this sense, the coverage
> would always be 1, as the ConceptualizationSet is defined over a ConceptSet
> defined specifically for the Lexicon (thus covering all of it).
> Even if we consider cases of reuse of concept sets (such as the set of
> synsets in WordNet) to create other lexicons (e.g. other language-specific
> WordNets), and the possible addition of further synsets to cover
> language-specific expressions, we would still have A-coverage = 1.
>
> So far, so good. We have some additional comments, but to avoid confusion
> it is better to keep this as an answer to John's question; we will address
> further issues in another email.

Hmmm... for future-proofing, wouldn't it be better to just have this table
complete for all slots? Also, this table must go into the Final Spec.

>> 34. The diagram for lime metadata needs to be updated. (JPM)
>
> The diagram was updated by Manuel on the 20th of July. Maybe you need to
> force a refresh.

Manuel: can you send me the source of this diagram? It seems to have some
stylistic differences from the rest of the diagrams.

Regards,
John

>> 35. lime/example2 "jnp" => "jpn" (JPM)
>
> OK, we have just changed them.
>
> Cheers,
>
> Armando and Manuel
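As a hedged illustration of the general counts and ratios described in the quoted message (and of John's suggestion to fill every slot of the table, including the N/A ones), here is a minimal Python sketch. It assumes a binding set is given as explicit sets A and B plus a set of (a, b) pairs, and, following the wording of point 4, that the avgNumOf ratios divide by the total size of A or B rather than by the number of occurring elements. The function `binding_stats`, the `BindingStats` field names and the toy data are hypothetical and not part of LIME.

```python
from collections.abc import Hashable, Iterable
from dataclasses import dataclass

# A pair (a, b) in A x B, e.g. (ontology reference, lexical entry) for a LexicalizationSet.
Pair = tuple[Hashable, Hashable]


@dataclass(frozen=True)
class BindingStats:
    """Every slot of the table for one binding set (field names are hypothetical)."""
    num_pairs: int      # |Pairs|, e.g. lime:lexicalizations
    occurring_a: int    # |{a in A | exists b in B . (a, b) in Pairs}|
    occurring_b: int    # |{b in B | exists a in A . (a, b) in Pairs}|
    a_coverage: float   # occurring_a / |A|, e.g. "percentage"
    b_coverage: float   # occurring_b / |B| (no LIME property for this slot)
    avg_b_per_a: float  # |Pairs| / |A|, e.g. avgNumOfLexicalizations, avgAmbiguity
    avg_a_per_b: float  # |Pairs| / |B|, e.g. avgSynonymy


def binding_stats(a_set: set, b_set: set, pairs: Iterable[Pair]) -> BindingStats:
    """Compute counts and ratios for a binding set given A, B and Pairs (subset of A x B).

    Assumption: the avgNumOf ratios divide by the total size of A (or B),
    following the wording "divided by the total number of X".
    """
    if not a_set or not b_set:
        raise ValueError("A and B must be non-empty")
    pair_set = set(pairs)  # each distinct (a, b) pair is counted once
    for a, b in pair_set:
        if a not in a_set or b not in b_set:
            raise ValueError(f"pair {(a, b)!r} is not in A x B")
    occurring_a = {a for a, _ in pair_set}
    occurring_b = {b for _, b in pair_set}
    n = len(pair_set)
    return BindingStats(
        num_pairs=n,
        occurring_a=len(occurring_a),
        occurring_b=len(occurring_b),
        a_coverage=len(occurring_a) / len(a_set),
        b_coverage=len(occurring_b) / len(b_set),
        avg_b_per_a=n / len(a_set),
        avg_a_per_b=n / len(b_set),
    )


# Toy ConceptualizationSet <Lexicon, ConceptSet>; the data is invented for illustration.
lexicon = {"bank", "shore", "cash"}
concepts = {"c_riverside", "c_institution", "c_money", "c_unused"}
pairs = {
    ("bank", "c_riverside"),
    ("bank", "c_institution"),
    ("shore", "c_riverside"),
    ("cash", "c_money"),
}
print(binding_stats(lexicon, concepts, pairs))
```

For this toy data the avgAmbiguity slot (`avg_b_per_a`) is 4/3 while the avgSynonymy slot (`avg_a_per_b`) is 1.0, because one concept is unused; the two averages are distinct properties, not reciprocals of each other, which is Armando's point about the "inverted ratios". Applied to a LexicalizationSet whose B is all of WordNet, the same `avg_a_per_b` slot would yield the near-zero value that point 1) argues is not worth publishing.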
Received on Friday, 4 September 2015 09:41:29 UTC