Re: lime module

Hi Manuel,

thanks, see below ...

On 13.07.15 at 18:22, Manuel Fiorelli wrote:
> Dear Philipp, All
>
> Following our discussion on the LIME module during the last telco, 
> here are some updates on the specification:
>
> https://www.w3.org/community/ontolex/wiki/index.php?title=Final_Model_Specification&diff=2289&oldid=2250
>
> The spec has been modified to address some of the issues I have raised 
> in previous emails (see details below within the quoted text).
>
> The diagram on Draw.io has been updated to reflect the current 
> state of the Lime metadata vocabulary. Further modifications may be 
> required once you decide what to do with the properties to be renamed 
> or split.
>
> Some examples were added to the end of the metadata module, but we 
> will revise them in the coming days. We modified some definitions, but 
> left others unchanged because they might be split or renamed. 
> Specifically, here are some definitions (or axioms) that need to be 
> modified:
>
> *lime:lexicalEntries*
>
> - The domain of this property should be Lexicon or LexicalizationSet 
> or Conceptualization and the definition should be changed accordingly, 
> unless we want to split this property into two or more properties.
>

I changed the property definition to also include ConceptualizationSet 
in the domain.

You mean ConceptualizationSet, right?

> *lime:referenceDataset*
>
> - the definition should be reviewed
>

The definition looks fine to me; what exactly should be reviewed?
>
>
> *lime:lexicalizationModel*
>
> - the domain should not include ontolex:Lexicon (this is probably a 
> leftover from before the introduction of lime:linguisticModel)
>
>
OK, fixed...

> *lime:references*
>
> - Not sure if this will be split or renamed
>
>
See my other email on this. For the sake of clarity, and to avoid 
overloading, I propose that we keep this property as denoting the number 
of distinct ?o in triples (?s, reference, ?o).
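
A minimal Python sketch of this reading, assuming the relevant triples are 
available as plain (subject, predicate, object) tuples and that the 
property is written as the prefixed name ontolex:reference; all sense and 
entity names below are hypothetical. Because the count is over distinct 
objects, two senses referring to the same entity contribute only once.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

# Assumption: the reference property is identified by this prefixed name.
REFERENCE = "ontolex:reference"


def count_references(triples: Iterable[Triple]) -> int:
    """Number of distinct ?o such that (?s, ontolex:reference, ?o) holds."""
    # A set collapses senses that point to the same ontology entity.
    return len({o for _, p, o in triples if p == REFERENCE})


# Hypothetical data: two of the three senses share the same reference.
triples = [
    (":bank_sense1", REFERENCE, "dbr:Bank"),
    (":bank_sense2", REFERENCE, "dbr:Bank_(geography)"),
    (":financialInstitution_sense", REFERENCE, "dbr:Bank"),
]
print(count_references(triples))  # 2
```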

> *lime:percentages*
>
> - in the definition, we should add a mention of lexical linksets
>

I changed this as follows:

The '''percentage''' property expresses the percentage of entities in 
the reference dataset which have at least one lexicalization in a 
lexicalization set or are linked to a lexical concept in a lexical linkset.

Fine?
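
Under this definition the percentage is the share of reference-dataset 
entities covered at least once. A minimal Python sketch, assuming the 
lexicalization set or lexical linkset has been flattened into hypothetical 
(lexical entry or lexical concept, ontology entity) pairs, and assuming the 
value is expressed as a fraction between 0 and 1 rather than out of 100 
(the definition above does not settle this):

```python
from typing import Iterable, Set, Tuple


def percentage(reference_entities: Set[str],
               pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of reference-dataset entities covered by at least one pair.

    Each pair is (lexical entry or lexical concept, ontology entity), taken
    from a lexicalization set or a lexical linkset.
    """
    if not reference_entities:
        return 0.0
    # Keep only entities of the reference dataset, each counted once.
    covered = {entity for _, entity in pairs if entity in reference_entities}
    return len(covered) / len(reference_entities)


# Hypothetical data: 4 ontology entities, 2 of which are lexicalized.
reference = {"ex:A", "ex:B", "ex:C", "ex:D"}
lexicalizations = [("ex:entry1", "ex:A"), ("ex:entry2", "ex:A"),
                   ("ex:entry3", "ex:B")]
print(percentage(reference, lexicalizations))  # 0.5
```

With the figures of Example 2 (20 of the 80 ontology entities lexicalized), 
the same computation gives 20/80 = 0.25, i.e. the 25% mentioned further 
down in this thread.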

>
> *lime:partition*
>
> - the definition of partition is wrong, as it only refers to 
> lexicalization sets
>
>
> *lime:resourceType*
>
> - as before, it only mentions lexicalization sets
>
OK, thanks. I changed the definitions. Are they fine now?

> *lime:concepts*
>
> the introduction to the definition of lime:concepts first mentions 
> its use in a concept set, although we are in the section about 
> lexical linksets
>
>
OK, I introduced a pointer to the definition of ConceptSet in ontolex.

Fine?

> *lime:avgNumOfLinks*
>
> - the definition is wrong. This property should give the average 
> number of links per ontology entity
>
I changed the definition to:

The '''average number of links''' property indicates the average number 
of links to a concept for each ontology element in the reference dataset.
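
One possible reading of this definition as a computation, sketched in 
Python under the assumption that every entity of the reference dataset 
counts in the denominator, including entities with no link; the entity and 
concept identifiers are hypothetical.

```python
from typing import Iterable, Set, Tuple


def avg_num_of_links(reference_entities: Set[str],
                     links: Iterable[Tuple[str, str]]) -> float:
    """Average number of links to a lexical concept per reference entity.

    `links` holds (ontology entity, lexical concept) pairs from a lexical
    linkset. Entities without any link still count in the denominator.
    """
    if not reference_entities:
        return 0.0
    # Count each distinct link once, ignoring entities outside the dataset.
    distinct = {(e, c) for e, c in links if e in reference_entities}
    return len(distinct) / len(reference_entities)


reference = {"ex:A", "ex:B", "ex:C", "ex:D"}
links = [("ex:A", "wn:c1"), ("ex:A", "wn:c2"), ("ex:B", "wn:c3")]
print(avg_num_of_links(reference, links))  # 0.75
```

Averaging only over entities that have at least one link would give 
3/2 = 1.5 for the same data, so the definition may need to state 
explicitly which denominator is intended.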


>
>
> 2015-07-10 15:27 GMT+02:00 Manuel Fiorelli <manuel.fiorelli@gmail.com 
> <mailto:manuel.fiorelli@gmail.com>>:
>
>
>     *Section "lexicon metadata"*
>
>     Just before the definition box of /linguistic model/:
>
>     "We may also specify the linguistic (annotation model) used in a
>     lexicon with the linguistic model property"
>
>     I think that the word "model" should go outside the parentheses.
>     Additionally, I would make it clearer that we are talking about
>     things such as part of speech, number, gender, and so on, maybe
>     also by pointing to the section of the specification where we
>     state this explicitly.
>
>
> DONE
>
>
>     *Section "Lexicalization Set"*
>
>     "In RDF, a lexicalization is expressed via the property rdfs:label."
>
>     It should be "In RDFS" (note I added an S).
>
>
> DONE
>
>
>     *Section "Partitions"*
>
>     "many cases, we want to provide descriptive metadata about a
>     subset of a lexicallization"
>
>     it should be "of a lexicalization set"
>
>
> Still TODO. Actually, the paragraph and the definition of the property 
> should also be extended to cover both lexical linksets and 
> lexicalization sets.
>
>
>     *Section "Publication Strategies"*
>
>
>     "For example, this allows lexicalizing lexical concepts from an
>     existing wordnet in a different natural language than the one for
>     which the resource was initially conceived"
>
>     I am unsure that it is appropriate to use the word "lexicalizing"
>     in association with lexical concepts, because we insisted that the
>     nature of a "conceptualization set" is different from that of a
>     "lexicalization set"
>
>
> DONE during the telco.
>
>
>     2015-07-07 15:55 GMT+02:00 Manuel Fiorelli
>     <manuel.fiorelli@gmail.com <mailto:manuel.fiorelli@gmail.com>>:
>
>         Dear Philipp, All
>
>         here are my preliminary comments. Most of them are minor
>         typos, while others may spark further discussion.
>
>         -----
>
>         In the introduction to example 1, the spec says:
>
>         "As an example we may describe a simple lexicon using this
>         property as well as properties from Dublin Core and VoID: "
>
>         The example then also contains the actual lexical entries
>         that constitute the lexicon. This is good as far as the
>         self-explanatory nature of the example is concerned. However,
>         we should make clear that, in general, the metadata only deals
>         with the description of the lexicon as a whole, while the
>         representation of its actual content is in the scope of other
>         modules. This is particularly relevant to "lexicon catalogs",
>         which may only be interested in indexing lexicons without the
>         need to also host the actual content.
>
>
> WON'T FIX. We decided that the example is fine, and that further 
> examples (concerned only with metadata) may be added later in the 
> spec.
>
>         -----
>
>         In the definition of LexicalizationSet, the classes Lexicon
>         and Dataset need, respectively, the prefix ontolex and void.
>
>         -----
>
>
> DONE.
>
>
>         I am not sure about this statement:
>
>         "The lexicalization set object should be unique for a given
>         lexicon-ontology pair"
>
>         Indeed, the statement above implies that there cannot be two
>         different lexicalization sets for FOAF using the WordNet RDF
>         lexicon. I think that this conclusion is false, so the
>         previous statement should be retracted.
>
>
> TODO. I think that we agreed on removing that sentence, but I leave 
> the honor to the editors.
>
>         -----
>
>         In the definition of lexicalizationModel, the disjunction is
>         spelled OR, whereas in other cases it is spelled in lowercase.
>
>         -----
>
>
> TODO. Actually, the misspelling has been corrected, but I think that 
> we should remove ontolex:Lexicon entirely, because that property only 
> applies to lexicalization sets. Concerning lexicons, we should use the 
> related property lime:linguisticModel.
>
>
>         The definition of lime:references does not mention the fact
>         that in a lexical linkset an ontology reference can be
>         associated with a lexical concept.
>
>
> TODO. Actually, you mentioned the possibility that properties such as 
> references and concepts could be split.
>
>         -----
>
>         Concerning Example2:
>         - we should add the language "ja" to the lexicalizationSet
>         resource
>         - we may say that the ontology is an instance of
>         voaf:Vocabulary, which is a subclass of void:Dataset to
>         represent vocabularies (both RDFS Schemas and OWL Ontologies)
>
>
> DONE the addition of the language to the lexicalization set as well as 
> the addition of the lexicalization model.
>
>         - I would extend the introduction to the example. This is my
>         attempt:
>
>         <cite>
>         In the following example, we describe a lexicalization set
>         expressing how elements of an ontology can be verbalized in
>         Japanese by means of entries from a supplied lexicon. The
>         metadata clearly tells which ontology and lexicon are involved
>         in the lexicalization sets, as well as the relevant natural
>         language. The knowledge of these facts about the
>         lexicalization set allows us to assess the usefulness of a
>         lexicalization set for a given task as well to discover
>         relevant lexicalization sets, when we are constrained by the
>         choice of an ontology, lexicon or natural language.
>
>         We model the ontology as an instance of the class
>         voaf:Vocabulary that is a kind of void:Dataset representing
>         vocabularies (both RDFS Schemas and OWL Ontologies). We benefit
>         from the more specific distinctions made by VOAF, by breaking
>         down the total number of entities in the ontology (held by the
>         property void:entities) into separate counts for the classes
>         and properties (held by voaf:classNumber and
>         voaf:propertyNumber, respectively).
>
>         Similarly, we use terms from the Lime vocabulary to represent
>         statistics about the linguistic content of the lexicon and the
>         lexicalization set. Overall, the ontology defines 80 entities
>         and the lexicon 100 lexical entries; however, only 20 entities
>         from the target ontology have been associated with a total
>         of 50 lexical entries.
>         </cite>
>
>
> TODO. I prefer that this addition is applied by the editors.
>
>         -----
>
>         In the definition of avgNumOfLexicalizations, the word
>         "define" occurs where it should be "defines".
>
>
> DONE
>
>         -----
>
>         I would postpone example 3 to the end of the section, and I would
>         modify it as follows:
>         - reuse the same data as in example 2, and make this clear in
>         the introduction to the example
>         - then, use the properties lexicalizations,
>         avgNumOfLexicalizations and percentage to "analyze" the
>         scenario depicted in example 2. For instance, it is now
>         possible to tell explicitly that only 25% of the reference
>         ontology has been lexicalized.
>
>         We can make the example more interesting by playing with
>         polysemy so that the ratios are not "obvious".
>
>
> TODO
>
>         -----
>
>         In the definition of LexicalLinkset, the class dataset needs
>         the prefix void.
>
>
> DONE
>
>         -----
>
>         I would propose the following example for
>         lime:ConceptualizationSet
>
>         :WnConceptualizationSet a lime:ConceptualizationSet ;
>           lime:conceptualDataset :WnConceptSet ;
>           lime:lexiconDataset :WnLexicon ;
>           lime:lexicalEntries 155287 ;
>           lime:concepts 117659 ;
>           lime:conceptualizations 206941 ;
>           lime:avgPolisemy 1.33
>           .
>
>         For the statistics, I referred to this page:
>         https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html
>
>
> DONE
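
As a quick consistency check on the figures above: assuming 
lime:avgPolisemy is intended as the number of conceptualizations divided by 
the number of lexical entries (my reading, not a settled definition), the 
value 1.33 follows from the numbers in the example. The function name below 
is hypothetical.

```python
def avg_polysemy(conceptualizations: int, lexical_entries: int) -> float:
    """Average number of lexical concepts per lexical entry (assumed definition)."""
    if lexical_entries == 0:
        return 0.0
    return conceptualizations / lexical_entries


# Figures taken from the :WnConceptualizationSet example.
value = avg_polysemy(conceptualizations=206941, lexical_entries=155287)
print(round(value, 2))  # 1.33
```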
>
>         We should discuss whether and how:
>
>           * to represent monosemous words
>           * to break down the statistics with respect to different
>             part-of-speech tags
>
>
> WON'T DO. Actually, we thought that this use case could be supported 
> by partitions of the conceptualization sets, but various technical 
> difficulties made us desist :-D
>
>
> Regards,
>
> Manuel Fiorelli

-- 
--
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld

Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de

Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany

Received on Wednesday, 15 July 2015 06:29:37 UTC