Re: lime module from Manuel Fiorelli on 2015-07-07 (public-ontolex@w3.org from July 2015)

From: Manuel Fiorelli <manuel.fiorelli@gmail.com>
Date: Tue, 7 Jul 2015 15:55:50 +0200
To: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Cc: "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <CAGDmdGjc0vN5s=Aa4HrL0vp34+Go-DNKDW0o2-NVzXvmTY00Vw@mail.gmail.com>
Dear Philipp, All

here are my preliminary comments. Most of them concern minor typos, while
others may seed further discussion.

-----

In the introduction to example 1, the spec says:

"As an example we may describe a simple lexicon using this property as well
as properties from Dublin Core and VoID: "

The example also contains the actual lexical entries that constitute
the lexicon. This helps to make the example self-explanatory.
However, we should make clear that in general the metadata
only deals with the description of the lexicon as a whole, while the
representation of its actual content is in the scope of other modules. This
is particularly relevant to "lexicon catalogs", which may only be
interested in indexing lexicons without the need to also host the actual
content.

-----

In the definition of LexicalizationSet, the classes Lexicon and Dataset need,
respectively, the prefix ontolex and void.

-----

I am not sure about this statement:

"The lexicalization set object should be unique for a given
lexicon-ontology pair"

Indeed, the statement above implies that there cannot be two different
lexicalization sets for FOAF using the WordNet RDF lexicon. I think that
this conclusion is false, so the statement should be retracted.
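
To make the objection concrete, here is a minimal Python sketch (not part of
the spec). The class, its field names and the catalog entries are all
hypothetical; in particular, the idea that the two FOAF sets differ in how
they were produced is only an assumption for illustration. The sketch groups
lexicalization sets by their lexicon-ontology pair and reports the pairs
described by more than one set, which the quoted statement would rule out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalizationSet:
    """Hypothetical stand-in for the metadata of one lexicalization set."""
    name: str
    lexicon: str
    ontology: str


def sets_sharing_a_pair(catalog):
    """Return {(lexicon, ontology): [set names]} for pairs with more than one set."""
    by_pair = defaultdict(list)
    for lex_set in catalog:
        by_pair[(lex_set.lexicon, lex_set.ontology)].append(lex_set.name)
    return {pair: names for pair, names in by_pair.items() if len(names) > 1}


# Illustrative catalog: two independent lexicalizations of FOAF with WordNet.
catalog = [
    LexicalizationSet("foaf-wn-manual", "WordNet RDF", "FOAF"),
    LexicalizationSet("foaf-wn-automatic", "WordNet RDF", "FOAF"),
    LexicalizationSet("skos-wn", "WordNet RDF", "SKOS"),
]

shared = sets_sharing_a_pair(catalog)
print(shared)
```

If both FOAF sets are legitimate, a uniqueness requirement on the
lexicon-ontology pair would force a catalog to drop one of them.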

-----

In the definition of lexicalizationModel, the disjunction is spelled OR,
whereas in other cases it is spelled in lowercase.

-----

The definition of lime:references does not mention the fact that in a
lexical linkset an ontology reference can be associated with a lexical
concept.

-----

Concerning Example 2:
- we should add the language "ja" to the lexicalizationSet resource
- we may say that the ontology is an instance of voaf:Vocabulary, which is
a subclass of void:Dataset to represent vocabularies (both RDFS Schemas and
OWL Ontologies)
- I would extend the introduction to the example. This is my attempt:

<cite>
In the following example, we describe a lexicalization set expressing how
elements of an ontology can be verbalized in Japanese by means of entries
from a supplied lexicon. The metadata clearly tells which ontology and
lexicon are involved in the lexicalization set, as well as the relevant
natural language. The knowledge of these facts about the lexicalization set
allows us to assess the usefulness of a lexicalization set for a given task
as well as to discover relevant lexicalization sets, when we are constrained
by the choice of an ontology, lexicon or natural language.

We model the ontology as an instance of the class voaf:Vocabulary, which is a
kind of void:Dataset representing vocabularies (both RDFS Schemas and OWL
Ontologies). We benefit from the more specific distinctions made by VOAF,
by breaking down the total number of entities in the ontology (held by the
property void:entities) into separate counts for the classes and properties
(held by voaf:classNumber and voaf:propertyNumber, respectively).

Similarly, we use terms from the Lime vocabulary to represent statistics
about the linguistic content of the lexicon and the lexicalization set.
Overall, the ontology defines 80 entities and the lexicon 100 lexical
entries; however, only 20 entities from the target ontology have been
associated with a total of 50 lexical entries.
</cite>

-----

In the definition of avgNumOfLexicalizations, the word "define" occurs
where it should be "defines".

-----

I would move example 3 to the end of the section, and I would modify it as
follows:
- reuse the same data as in example 2, and make this clear in the
introduction to the example
- then, use the properties lexicalizations, avgNumOfLexicalizations and
percentage to "analyze" the scenario depicted in example 2. For instance,
it then becomes possible to state explicitly that only 25% of the reference
ontology has been lexicalized.

We can make the example more interesting by playing with polysemy, so that
the ratios are not "obvious".
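
As a rough illustration of how polysemy makes the ratios less obvious, here
is a small Python sketch. The toy pairs, the ex: names and the function are
hypothetical, and the two ratio definitions in the comments are my
assumptions about how percentage and avgNumOfLexicalizations are meant to be
computed; they should be checked against the definitions in the spec.

```python
def lexicalization_stats(pairs, total_entities):
    """Compute counts and ratios for a lexicalization set.

    pairs: iterable of (ontology_entity, lexical_entry) lexicalizations.
    total_entities: number of entities defined by the reference ontology.
    """
    pairs = set(pairs)  # ignore accidental duplicates
    references = {entity for entity, _ in pairs}
    entries = {entry for _, entry in pairs}
    return {
        "references": len(references),
        "lexicalEntries": len(entries),
        "lexicalizations": len(pairs),
        # Assumed definition: share of ontology entities with >= 1 lexicalization.
        "percentage": len(references) / total_entities,
        # Assumed definition: lexicalizations averaged over ALL ontology entities.
        "avgNumOfLexicalizations": len(pairs) / total_entities,
    }


# Toy data (not from the spec): "person" lexicalizes two entities (polysemy).
toy_pairs = [
    ("ex:Person", "person"),
    ("ex:Person", "human"),
    ("ex:Agent", "agent"),
    ("ex:Agent", "person"),
]
stats = lexicalization_stats(toy_pairs, total_entities=4)
print(stats)

# Figures quoted for example 2: 20 of 80 entities are lexicalized.
example2_percentage = 20 / 80
print(example2_percentage)
```

In the toy data there are three lexical entries but four lexicalizations,
because the entry "person" is shared by two entities; the number of
lexicalizations therefore cannot be read off from the number of entries.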

-----

In the definition of LexicalLinkset, the class Dataset needs the prefix void.

-----

I would propose the following example for lime:ConceptualizationSet:

:WnConceptualizationSet a lime:ConceptualizationSet ;
  lime:conceptualDataset :WnConceptSet ;
  lime:lexiconDataset :WnLexicon ;
  lime:lexicalEntries 155287 ;
  lime:concepts 117659 ;
  lime:conceptualizations 206941 ;
  lime:avgPolisemy 1.33
  .

For the statistics, I referred to this page:
https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html
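
The value given for lime:avgPolisemy can be checked with a few lines of
Python. This is only a sanity check, assuming that the average is meant as
the number of conceptualizations divided by the number of lexical entries;
the variable names are mine.

```python
# Figures from the proposed lime:ConceptualizationSet example.
lexical_entries = 155287
concepts = 117659
conceptualizations = 206941

# Assumed definition: conceptualizations (word-sense pairs) per lexical entry.
avg_polysemy = conceptualizations / lexical_entries
print(round(avg_polysemy, 2))
```

Under that assumption the result rounds to the 1.33 used in the example.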

We should discuss whether and how:

   - to represent monosemous words
   - to break down the statistics by part-of-speech tag

Regards

Manuel

2015-07-07 15:02 GMT+02:00 Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>:

> Dear all,
>
>  I went through the lime module today, streamlining the definitions etc.
> to make them more conformant to the rest of the modules. I also updated the
> ontology. I will go through all sections asking for comments on Friday.
>
> Please send me any comments you deem important by Friday.
>
> I still need to work through the examples both in the wiki and the git
> repo. It seems to me that we need a few additional examples in this section.
>
> Kind regards,
>
> Philipp.
>
> --
> Prof. Dr. Philipp Cimiano
> AG Semantic Computing
> Exzellenzcluster für Cognitive Interaction Technology (CITEC)
> Universität Bielefeld
>
> Tel: +49 521 106 12249
> Fax: +49 521 106 6560
> Mail: cimiano@cit-ec.uni-bielefeld.de
>
> Office CITEC-2.307
> Universitätsstr. 21-25
> 33615 Bielefeld, NRW
> Germany
>
>
>


-- 
Manuel Fiorelli
Received on Tuesday, 7 July 2015 13:56:21 UTC