RE: A final set of issues with the specification

Hi all,

A few replies about three LIME points:

4. LIME defines a number of properties of the form "the number of links from X to Y divided by the total number of X". For example, lime:avgNumOfLexicalizations is "the number of links from references to lexical entries divided by the total number of references". This can be put into a table as follows:


X/Y        | References | Entries                 | Concepts
-----------+------------+-------------------------+--------------
References | -          | avgNumOfLexicalizations | avgNumOfLinks
Entries    | percentage | -                       | avgAmbiguity
Concepts   | ?          | avgSynonymy             | -

The table reveals a few inconsistencies: there is a missing property, and the percentage property should perhaps be named something like avgPolysemy.

There is a bit of confusion in the table above: it considers only one dimension, given by a single property associated with each pair of sets, and (if we have interpreted it correctly) it wrongly treats some properties as the “inverse” of others (not in the sense of owl:inverseOf, but as providing inverted ratios), while they are all genuinely different.

Manuel replied on 24 and 25 July. Here we summarize both replies and expand on them a bit to give a complete overview. Sorry in advance for the length, though it is fairly short considering everything it covers.

The various statistics have been defined assuming two sets A and B and a set Pairs of pairs (a,b) ∈ A×B. The bindings between these sets are themselves sets; specifically, we have:

*         LexicalizationSet binding elements from a Lexicon to Ontology references

*         ConceptualizationSet binding Lexical Concepts from a ConceptSet to entries in a Lexicon (i.e. providing a conceptual backbone for it)

*         LexicalLinkset linking Lexical Concepts from a ConceptSet to Ontology references

In general, for each combination of set types we have various integer counts:

*         the total number of pairs = |Pairs|

*         the number of a's that occur in at least one pair = |{a ∈ A | ∃ b ∈ B . (a,b) ∈ Pairs}|

*         the number of b's that occur in at least one pair = |{b ∈ B | ∃ a ∈ A . (a,b) ∈ Pairs}|

(we use the notation "|<set>|" for the cardinality of a set, while in the spec we sometimes use the # symbol)
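To make the three counts concrete, here is a minimal Python sketch. It assumes A, B and Pairs are plain Python sets of hashable identifiers; the function name binding_counts and the toy data are hypothetical and not part of LIME.

```python
def binding_counts(A, B, pairs):
    """Return (|Pairs|, occurring a's, occurring b's) for a binding Pairs ⊆ A×B.

    Hypothetical helper: A, B and pairs are plain Python sets.
    """
    # Every pair must relate an element of A to an element of B.
    if not all(a in A and b in B for a, b in pairs):
        raise ValueError("pairs must be a subset of A x B")
    total_pairs = len(pairs)                   # |Pairs|
    occurring_a = len({a for a, _ in pairs})   # a's in at least one pair
    occurring_b = len({b for _, b in pairs})   # b's in at least one pair
    return total_pairs, occurring_a, occurring_b


A = {"a1", "a2", "a3"}
B = {"b1", "b2"}
pairs = {("a1", "b1"), ("a1", "b2"), ("a2", "b1")}
print(binding_counts(A, B, pairs))  # (3, 2, 2)
```

Note that a3 occurs in no pair, so it contributes to |A| but not to the count of occurring a's.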

We then have some ratios (here we use general names):

*         <A-coverage>: ratio of elements in A that participate in at least one pair (in other words, that have been associated with at least one b in B)

*         <avgNumOf_B-in-A>: average number of b in B associated with each a in A (the XXX in the concrete names avgNumOfXXX being Lexicalizations, Links or Conceptualizations)
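The two ratios can be sketched the same way. This assumes, following the wording above, that both are divided by |A| (all elements of A, including those in no pair) and that A is non-empty; the function names are hypothetical.

```python
def a_coverage(A, pairs):
    """<A-coverage>: share of elements of A that occur in at least one pair."""
    occurring_a = {a for a, _ in pairs if a in A}
    return len(occurring_a) / len(A)


def avg_num_of_b_in_a(A, pairs):
    """<avgNumOf_B-in-A>: pairs per element of A, unlinked elements included."""
    return sum(1 for a, _ in pairs if a in A) / len(A)


A = {"a1", "a2", "a3", "a4"}
pairs = {("a1", "b1"), ("a1", "b2"), ("a2", "b1")}
print(a_coverage(A, pairs))         # 0.5  (a1 and a2 out of 4)
print(avg_num_of_b_in_a(A, pairs))  # 0.75 (3 pairs over 4 elements)
```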

For these ratios we have chosen a "preferential direction" (not strict, just for naming purposes here), say from A to B:

*         from the Ontology to the Lexicon in the case of a LexicalizationSet

*         from the Lexicon to the ConceptSet in the case of a ConceptualizationSet

*         from the Ontology to the ConceptSet in the case of a LexicalLinkset

That is, we keep on the left the set that is being “enriched” by elements of the set on the right. So each property has only one direction, the one that makes more sense given the kind of binding.

Given the above, this is the full table grounding the general properties above in the three specific binding sets:


Binding Set                                | Total number of pairs   | Occurring a's       | Occurring b's       | A-coverage (at least one B in A) | avgNumOf_B-in-A         | avgNumOf_A-in-B
-------------------------------------------+-------------------------+---------------------+---------------------+----------------------------------+-------------------------+----------------
LexicalizationSet <Ontology, Lexicon>      | lime:lexicalizations    | lime:references     | lime:lexicalEntries | percentage                       | avgNumOfLexicalizations | N/A
ConceptualizationSet <Lexicon, ConceptSet> | lime:conceptualizations | lime:lexicalEntries | lime:concepts       | N/A                              | avgAmbiguity            | avgSynonymy
LexicalLinkset <Ontology, ConceptSet>      | lime:links              | lime:references     | lime:concepts       | percentage                       | avgNumOfLinks           | N/A

Note: we have rechecked the current specification, and all the properties are assigned consistently with this table.
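To illustrate why avgAmbiguity and avgSynonymy are different statistics rather than inverted ratios of each other, here is a toy ConceptualizationSet in Python. The entries, concept IDs and pairs are made up, and both averages are assumed to divide |Pairs| by the full size of the respective set (|A| and |B|), as in the general definitions.

```python
# Hypothetical toy ConceptualizationSet <Lexicon, ConceptSet>.
lexicon = {"bank", "shore", "bench", "seat"}               # A: lexical entries
concept_set = {"c_riverbank", "c_institution", "c_seat"}   # B: lexical concepts
conceptualizations = {                                     # Pairs ⊆ A×B
    ("bank", "c_riverbank"),
    ("bank", "c_institution"),
    ("shore", "c_riverbank"),
    ("bench", "c_seat"),
    ("seat", "c_seat"),
}

# avgNumOf_B-in-A: concepts per entry (avgAmbiguity) = 5 / 4
avg_ambiguity = len(conceptualizations) / len(lexicon)
# avgNumOf_A-in-B: entries per concept (avgSynonymy) = 5 / 3
avg_synonymy = len(conceptualizations) / len(concept_set)

print(avg_ambiguity)           # 1.25
print(round(avg_synonymy, 3))  # 1.667
print(1 / avg_ambiguity)       # 0.8, which is not avgSynonymy
```

Both ratios share the numerator |Pairs| but have different denominators, so neither can be derived from the other without also knowing |A| and |B|.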

 

So the table is fairly complete; here we explain the N/A entries:

1) No equivalent of avgSynonymy for the LexicalizationSet and the LexicalLinkset: initially, there was no intention of having an “avgNumOf_A-in-B” property at all. However, while lexicalization sets and lexical linksets represent a-posteriori links between potentially pre-existing entities, conceptualization sets are often defined within complete lexico-semantic resources such as WordNet.
An example: if I use WordNet as a Lexicon to lexicalize a small ontology (the case of a LexicalizationSet), what would be the point of stating that there are on average 0.0000001... ontology references for each lexical entry in WordNet?
The same holds if I use WordNet synsets to “tag” ontology references (the case of a LexicalLinkset).
Measuring the level of synonymy in the ConceptualizationSet that is part of WordNet itself, instead, may make sense; that is why this property was introduced in one of our recent calls.
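A quick back-of-the-envelope computation shows the imbalance in the WordNet example. The numbers are purely hypothetical: a small ontology with 50 references, each lexicalized by two entries, and a lexicon of 150,000 entries (roughly the order of magnitude of WordNet).

```python
# Hypothetical sizes for a LexicalizationSet <Ontology, Lexicon>.
num_references = 50                        # |A|: ontology references
num_lexical_entries = 150_000              # |B|: entries in a WordNet-sized lexicon
num_lexicalizations = 2 * num_references   # |Pairs|: two lexicalizations each

# Preferred direction (avgNumOfLexicalizations): lexicalizations per reference.
avg_b_in_a = num_lexicalizations / num_references
# Reverse direction (the avgNumOf_A-in-B argued above to be uninformative here).
avg_a_in_b = num_lexicalizations / num_lexical_entries

print(avg_b_in_a)           # 2.0
print(f"{avg_a_in_b:.6f}")  # 0.000667
```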



2) No A-coverage for the ConceptualizationSet: as above, the idea (which we mostly agreed on) was that most ConceptualizationSets are defined inside existing lexico-semantic resources. In this sense, the coverage would always be 1, since the ConceptualizationSet is defined over a ConceptSet created specifically for the Lexicon (and thus covering all of it). Even if we consider reusing a ConceptSet (such as the set of synsets in WordNet) to create other lexicons (e.g. other language-specific WordNets), possibly adding further synsets to cover language-specific expressions, we would still have A-coverage = 1.
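The argument for A-coverage = 1 can be sketched as follows, under an assumption made here only for illustration: in a wordnet-style resource, the lexicon consists exactly of the words that are members of some synset. The Italian entries and concept IDs below are hypothetical, including one concept added for a language-specific expression.

```python
def a_coverage(A, pairs):
    """Share of elements of A that occur in at least one pair."""
    return len({a for a, _ in pairs if a in A}) / len(A)


# Concepts reused from an existing ConceptSet, plus one language-specific addition.
reused_concepts = {"s_dog", "s_house"}
concept_set = reused_concepts | {"s_it_specific"}

conceptualizations = {
    ("cane", "s_dog"),
    ("casa", "s_house"),
    ("abbiocco", "s_it_specific"),  # covered by the newly added concept
}
assert all(c in concept_set for _, c in conceptualizations)

# Assumption: entries enter the lexicon only as members of some concept,
# so the lexicon is derived from the pairs themselves.
lexicon = {entry for entry, _ in conceptualizations}
print(a_coverage(lexicon, conceptualizations))  # 1.0
```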



So far, so good. We have some additional comments, but to avoid confusion it is better to keep this email as an explanation for John's question; we will address further issues in another email.

34. The diagram for lime metadata needs to be updated. (JPM)

                               The diagram was updated by Manuel on 20 July. You may need to force a refresh.

35. lime/example2 "jnp" => "jpn" (JPM)

                               OK, we have just changed them.

                               Cheers,

                               Armando and Manuel

Received on Wednesday, 2 September 2015 14:24:04 UTC