Re: lime module from Philipp Cimiano on 2015-07-15 (public-ontolex@w3.org from July 2015)

From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: Wed, 15 Jul 2015 08:06:58 +0200
To: Manuel Fiorelli <manuel.fiorelli@gmail.com>
CC: "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <55A5F882.6080104@cit-ec.uni-bielefeld.de>

Hi Manuel,

I am replying to this as one of my to-dos from last Friday.

On 07.07.15 at 15:55, Manuel Fiorelli wrote:
> Dear Philipp, All
>
> here are my preliminary comments. Most of them are minor typos, while
> others may seed further discussion.
>
> -----
>
> In the introduction to example 1, the spec says:
>
> "As an example we may describe a simple lexicon using this property as 
> well as properties from Dublin Core and VoID: "
>
> The example then also contains the actual lexical entries that
> constitute the lexicon. This is good in that it keeps the example
> self-explanatory. However, we should make clear
> that in general the metadata only deals with the description of the 
> lexicon as a whole, while the representation of its actual content is 
> in the scope of other modules. This is particularly relevant to 
> "lexicon catalogs", which may only be interested in indexing lexicons 
> without the need to also host the actual content.
>
I kept the example as is but added a sentence making clear that the
metadata describes the lexicon as a whole, as you suggested.

> -----
>
> In the definition of LexicalizationSet, the classes Lexicon and 
> Dataset need, respectively, the prefix ontolex and void.

Fixed
>
> -----
>
> I am not sure about this statement:
>
> "The lexicalization set object should be unique for a given 
> lexicon-ontology pair"
>
> Indeed, the statement above implies that there cannot be two different
> lexicalization sets for FOAF using the WordNet RDF lexicon. I think 
> that this conclusion is false, so the previous statement should be 
> retracted.
>
This has been removed.

> -----
>
> In the definition of lexicalizationModel, the disjunction is spelled 
> OR, whereas in other cases it is spelled in lowercase.

This has been fixed by you, I guess. Thanks.
>
> -----
>
> The definition of lime:references does not mention the fact that in a 
> lexical linkset an ontology reference can be associated with a lexical 
> concept.

To avoid overloading, I would prefer to keep "references" as
referring to the number of distinct resources ?o, that is:

# of different ?o such that (?s,reference,?o)
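
To make the intended count concrete, here is a minimal Python sketch, assuming the data is available as plain (subject, predicate, object) tuples. The function name count_references, the predicate string "ontolex:reference", and the ex: sample resources are hypothetical illustrations, not taken from the spec.

```python
def count_references(triples, predicate="ontolex:reference"):
    """Count the distinct objects ?o in triples of the form (?s, predicate, ?o)."""
    return len({o for (_s, p, o) in triples if p == predicate})


# Hypothetical sample data: two senses point to the same ontology entity,
# so that entity is counted only once.
sample = [
    ("ex:sense1", "ontolex:reference", "ex:Cat"),
    ("ex:sense2", "ontolex:reference", "ex:Cat"),
    ("ex:sense3", "ontolex:reference", "ex:Dog"),
    ("ex:entry1", "ontolex:sense", "ex:sense1"),
]

print(count_references(sample))  # 2
```

Using a set means repeated references to the same resource do not inflate the count, which is the point of counting distinct ?o rather than triples.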

>
> -----
>
> Concerning Example2:
> - we should add the language "ja" to the lexicalizationSet resource
> - we may say that the ontology is an instance of voaf:Vocabulary, 
> which is a subclass of void:Dataset to represent vocabularies (both 
> RDFS Schemas and OWL Ontologies)
> - I would extend the introduction to the example. This is my attempt:
>
> <cite>
> In the following example, we describe a lexicalization set expressing 
> how elements of an ontology can be verbalized in Japanese by means of 
> entries from a supplied lexicon. The metadata clearly tells which 
> ontology and lexicon are involved in the lexicalization sets, as well 
> as the relevant natural language. The knowledge of these facts about 
> the lexicalization set allows us to assess the usefulness of a 
> lexicalization set for a given task as well as to discover relevant
> lexicalization sets, when we are constrained by the choice of an 
> ontology, lexicon or natural language.
>
> We model the ontology as an instance of the class voaf:Vocabulary that 
> is a kind of void:Dataset representing vocabularies (both RDFS Schemas
> and OWL Ontologies). We benefit from the more specific distinctions 
> made by VOAF, by breaking down the total number of entities in the 
> ontology (held by the property void:entities) into separate counts for 
> the classes and properties (held by voaf:classNumber and 
> voaf:propertyNumber, respectively).
>
> Similarly, we use terms from the Lime vocabulary to represent 
> statistics about the linguistic content of the lexicon and the 
> lexicalization set. Overall, the ontology defines 80 entities and the 
> lexicon 100 lexical entries; however, only 20 entities from the target 
> ontology have been associated with a total of 50 lexical entries.
> </cite>
>
> -----
Great, I have added your text to the example.

>
> In the definition of avgNumOfLexicalizations, the word "define" occurs
> where it should be "defines".

I can not find this, sorry.

But this brings me to another issue. The formula for
avgNumOfLexicalizations could be improved to make it clearer as follows:

avgNumOfLexicalizations = # lexicalizations / # ontology entities in the 
reference dataset

What do you think? Can you possibly update the formula? That would be 
great. Thanks.
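
As a rough illustration of the proposed formula, here is a small Python sketch. The function name and the input counts are hypothetical and chosen only to show the arithmetic; they are not statistics from the spec.

```python
def avg_num_of_lexicalizations(lexicalizations, ontology_entities):
    """Ratio of lexicalizations to ontology entities in the reference dataset."""
    if ontology_entities <= 0:
        raise ValueError("the reference dataset must contain at least one entity")
    return lexicalizations / ontology_entities


# Hypothetical counts: 50 lexicalizations over 80 ontology entities.
print(avg_num_of_lexicalizations(50, 80))  # 0.625
```

Note that the denominator is the total number of entities in the reference dataset, so entities without any lexicalization pull the average down.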

>
> -----
>
> I would postpone example 3 to the end of the section, and I would modify
> it as follows:
> - reuse the same data as in example 2, and make this clear in the 
> introduction to the example
> - then, use the properties lexicalizations, avgNumOfLexicalizations 
> and percentage to "analyze" the scenario depicted in example 2. For 
> instance, it is now possible to tell explicitly that only 25% of the 
> reference ontology has been lexicalized.
>
> We can make the example more interesting by playing with polysemy so that
> the ratios are not "obvious".

Actually, I think that example 3 definitely makes sense here. The ratios
are rather obvious, true, but this is good as a simple and clear example.
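
For the record, the 25% figure you mention follows directly from the numbers in your proposed text for example 2 (20 lexicalized entities out of 80). A minimal Python sketch of that calculation, with a hypothetical function name:

```python
def lexicalized_fraction(lexicalized_entities, total_entities):
    """Fraction of ontology entities that have at least one lexicalization."""
    if total_entities <= 0:
        raise ValueError("the ontology must contain at least one entity")
    return lexicalized_entities / total_entities


# Numbers from the text proposed for example 2: 20 of 80 entities are lexicalized.
fraction = lexicalized_fraction(20, 80)
print(f"{fraction:.0%}")  # 25%
```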
>
> -----
>
> In the definition of LexicalLinkset, the class dataset needs the 
> prefix void.
>
> -----
>
OK, this has been fixed as far as I can see.

> I would propose the following example for lime:ConceptualizationSet
>
> :WnConceptualizationSet a lime:ConceptualizationSet ;
>   lime:conceptualDataset :WnConceptSet ;
>   lime:lexiconDataset :WnLexicon ;
>   lime:lexicalEntries 155287 ;
>   lime:concepts 117659 ;
>   lime:conceptualizations 206941 ;
>   lime:avgPolisemy 1.33
>   .
>
> For the statistics, I referred to this page: 
> https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html
>
> We should discuss whether and how:
>
>   * to represent monosemous words
>   * to break down the statistics with respect to different
>     part-of-speech tags
>
> Regards
>
> Manuel
>
>
> 2015-07-07 15:02 GMT+02:00 Philipp Cimiano
> <cimiano@cit-ec.uni-bielefeld.de>:
>
>     Dear all,
>
>      I went through the lime module today, streamlining the
>     definitions etc. to make them more conformant to the rest of the
>     modules. I also updated the ontology. I will go through all
>     sections asking for comments on Friday.
>
>     Please send me any comments you deem important by Friday.
>
>     I still need to work through the examples both in the wiki and the
>     git repo. It seems to me that we need a few additional examples in
>     this section.
>
>     Kind regards,
>
>     Philipp.
>
>     -- 
>     --
>     Prof. Dr. Philipp Cimiano
>     AG Semantic Computing
>     Exzellenzcluster für Cognitive Interaction Technology (CITEC)
>     Universität Bielefeld
>
>     Tel: +49 521 106 12249
>     Fax: +49 521 106 6560
>     Mail: cimiano@cit-ec.uni-bielefeld.de
>
>     Office CITEC-2.307
>     Universitätsstr. 21-25
>     33615 Bielefeld, NRW
>     Germany
>
>
>
>
>
> -- 
> Manuel Fiorelli

-- 
--
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld

Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de

Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany
Received on Wednesday, 15 July 2015 06:07:30 UTC