Re: lime module from Manuel Fiorelli on 2015-07-13 (public-ontolex@w3.org from July 2015)

From: Manuel Fiorelli <manuel.fiorelli@gmail.com>
Date: Mon, 13 Jul 2015 18:22:40 +0200
To: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Cc: "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <CAGDmdGg9vUOdD3vkQ8gCch820Uus5UjK9bdp9Ok76-Bdsb1j5w@mail.gmail.com>
Dear Philipp, All

Following our discussion on the LIME module during the last telco, here are
some updates on the specification:

https://www.w3.org/community/ontolex/wiki/index.php?title=Final_Model_Specification&diff=2289&oldid=2250

The spec has been modified to address some of the issues I have raised in
previous emails (see details below within the quoted text).

The diagram on Draw.io has been modified, considering the current state of
the Lime metadata vocabulary. Further modifications may be required once
you have decided what to do with the properties to be renamed or split.

Some examples were added to the end of the metadata module, but we will
revise them in the next few days. We modified some definitions, but others
have not been modified because they might be split or renamed.
Specifically, here are some definitions (or axioms) to be modified:

*lime:lexicalEntries*

- The domain of this property should be Lexicon or LexicalizationSet or
Conceptualization and the definition should be changed accordingly, unless
we want to split this property into two or more properties.


*lime:referenceDataset*

- the definition should be reviewed


*lime:lexicalizationModel*

- the domain should not include ontolex:Lexicon (this could be a leftover
that remained after the introduction of lime:linguisticModel)


*lime:references*

- Not sure if this will be split or renamed


*lime:percentages*

- in the definition, we should also mention lexical linksets


*lime:partition*

- the definition of partition is wrong, as it only refers to lexicalization
sets


*lime:resourceType*

- as before, it only mentions lexicalization sets



*lime:concepts*

- the introduction to the definition of lime:concepts first mentions its use
in a concept set, although we are in the section about lexicalLinkset


*lime:avgNumOfLinks*

- the definition is wrong. This property should give the average number of
links per ontology entity


2015-07-10 15:27 GMT+02:00 Manuel Fiorelli <manuel.fiorelli@gmail.com>:

>
> *Section "lexicon metadata"*
>
> Just before the definition box of *linguistic model*:
>
> "We may also specify the linguistic (annotation model) used in a lexicon
> with the linguistic model property"
>
> I think that the word "model" should go outside the parenthesis.
> Additionally, I would make it clearer that we are talking about things such
> as part of speech, number, gender, and so on, maybe also by pointing to the
> section of the specification where we wrote that explicitly.
>

DONE


>
> *Section "Lexicalization Set"*
>
> "In RDF, a lexicalization is expressed via the property rdfs:label."
>
> It should be "In RDFS" (note I added an S).
>

DONE


>
> *Section "Partitions"*
>
> "many cases, we want to provide descriptive metadata about a subset of a
> lexicallization"
>
> it should be "of a lexicalization set"
>

Still TODO. Actually, the paragraph and the definition of the property
should also be extended to cover both lexical linksets and
lexicalization sets.


>
> *Section "Publication Strategies"*
>
>
> "For example, this allows lexicalizing lexical concepts from an existing
> wordnet in a different natural language than the one for which the resource
> was initially conceived"
>
> I am unsure that it is appropriate to use the word "lexicalizing" in
> association with lexical concepts, because we insisted that the nature of a
> "conceptualization set" is different from that of a "lexicalization set"
>
>
DONE during the telco.

>
> 2015-07-07 15:55 GMT+02:00 Manuel Fiorelli <manuel.fiorelli@gmail.com>:
>
>> Dear Philipp, All
>>
>> here are my preliminary comments. Most of them are minor typos, while
>> others may seed further discussion.
>>
>> -----
>>
>> In the introduction to example 1, the spec says:
>>
>> "As an example we may describe a simple lexicon using this property as
>> well as properties from Dublin Core and VoID: "
>>
>> The example then also contains the actual lexical entries that constitute
>> the lexicon. This is good as far as the self-explanatory nature of the
>> example is concerned. However, we should make clear that in general the
>> metadata only deals with the description of the lexicon as a whole, while
>> the representation of its actual content is in the scope of other modules.
>> This is particularly relevant to "lexicon catalogs", which may only be
>> interested in indexing lexicons without the need to also host the actual
>> content.
>>
>>
WON'T FIX. We decided that the example is fine, and it may be the case that
further examples (concerned only with metadata) should be added later in
the spec.


> -----
>>
>> In the definition of LexicalizationSet, the classes Lexicon and Dataset need,
>> respectively, the prefix ontolex and void.
>>
>> -----
>>
>
DONE.


>
>> I am not sure about this statement:
>>
>> "The lexicalization set object should be unique for a given
>> lexicon-ontology pair"
>>
>> Indeed, the statement above implies that there cannot be two different
>> lexicalization sets for FOAF using the WordNet RDF lexicon. I think that
>> this conclusion is false, so the previous statement should be retracted.
>>
>>
TODO. I think that we agreed on removing that sentence, but I leave the
honor to the editors.


> -----
>>
>> In the definition of lexicalizationModel, the disjunction is spelled OR,
>> whereas in other cases it is spelled in lowercase.
>>
>> -----
>>
>
TODO. Actually, the misspelling has been corrected, but I think that we
should remove ontolex:Lexicon entirely, because that property only applies
to lexicalization sets. Concerning lexicons, we should use the related
property lime:linguisticModel.


>
>> The definition of lime:references does not mention the fact that in a
>> lexical linkset an ontology reference can be associated with a lexical
>> concept.
>>
>>
TODO. Actually, you mentioned the possibility that properties such as
references and concepts could be split.

> -----
>>
>> Concerning Example2:
>> - we should add the language "ja" to the lexicalizationSet resource
>> - we may say that the ontology is an instance of voaf:Vocabulary, which
>> is a subclass of void:Dataset to represent vocabularies (both RDFS
>> Schemas and OWL Ontologies)
>>
>
DONE: the addition of the language to the lexicalization set, as well as the
addition of the lexicalization model.


> - I would extend the introduction to the example. This is my attempt:
>>
>> <cite>
>> In the following example, we describe a lexicalization set expressing how
>> elements of an ontology can be verbalized in Japanese by means of entries
>> from a supplied lexicon. The metadata clearly tells which ontology and
>> lexicon are involved in the lexicalization set, as well as the relevant
>> natural language. The knowledge of these facts about the lexicalization set
>> allows us to assess the usefulness of a lexicalization set for a given task
>> as well as to discover relevant lexicalization sets, when we are constrained
>> by the choice of an ontology, lexicon or natural language.
>>
>> We model the ontology as an instance of the class voaf:Vocabulary that
>> is a kind of void:Dataset representing vocabularies (both RDFS Schemas
>> and OWL Ontologies). We benefit from the more specific distinctions made by
>> VOAF, by breaking down the total number of entities in the ontology (held
>> by the property void:entities) into separate counts for the classes and
>> properties (held by voaf:classNumber and voaf:propertyNumber,
>> respectively).
>>
>> Similarly, we use terms from the Lime vocabulary to represent statistics
>> about the linguistic content of the lexicon and the lexicalization set.
>> Overall, the ontology defines 80 entities and the lexicon 100 lexical
>> entries; however, only 20 entities from the target ontology have been
>> associated with a total of 50 lexical entries.
>> </cite>
>>
>>
TODO. I prefer that this addition is applied by the editors.


> -----
>>
>> In the definition of avgNumOfLexicalizations, the word "define" occurs
>> where it should be "defines".
>>
>>
DONE


> -----
>>
>> I would move example 3 to the end of the section, and I would modify it
>> as follows:
>> - reuse the same data as in example 2, and make this clear in the
>> introduction to the example
>> - then, use the properties lexicalizations, avgNumOfLexicalizations and percentage
>> to "analyze" the scenario depicted in example 2. For instance, it is now
>> possible to tell explicitly that only 25% of the reference ontology has
>> been lexicalized.
>>
>> We can make the example more interesting by playing with polysemy so that
>> the ratios are not "obvious".
>>
>>
>
TODO
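
To make the intended ratios concrete, here is a rough sketch using the
figures from the proposed introduction to example 2 (80 ontology entities,
20 of them lexicalized). The count of 60 lexicalizations is a made-up value,
chosen only to show the effect of polysemy, and dividing by the total number
of ontology entities is an assumption about how avgNumOfLexicalizations
should be computed, not something the spec text quoted here confirms.

```python
# Figures taken from the proposed introduction to example 2.
ontology_entities = 80      # entities defined by the ontology
lexicalized_entities = 20   # entities with at least one lexical entry

# Hypothetical figure, NOT from the spec: number of (entity, entry) pairs.
# With polysemous entries it can exceed the 50 distinct entries involved.
lexicalizations = 60

# Share of the reference ontology that has been lexicalized.
percentage = lexicalized_entities / ontology_entities

# Assumed formula: average lexicalizations over ALL ontology entities.
avg_num_of_lexicalizations = lexicalizations / ontology_entities

print(f"percentage = {percentage:.2f}")
print(f"avgNumOfLexicalizations = {avg_num_of_lexicalizations:.2f}")
```

With these inputs the percentage is the 25% mentioned above, while the
average depends entirely on the assumed lexicalization count.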


> -----
>>
>> In the definition of LexicalLinkset, the class dataset needs the prefix
>> void.
>>
>>
DONE


> -----
>>
>> I would propose the following example for lime:ConceptualizationSet
>>
>> :WnConceptualizationSet a lime:ConceptualizationSet ;
>>   lime:conceptualDataset :WnConceptSet ;
>>   lime:lexiconDataset :WnLexicon ;
>>   lime:lexicalEntries 155287 ;
>>   lime:concepts 117659 ;
>>   lime:conceptualizations 206941 ;
>>   lime:avgPolisemy 1.33
>>   .
>>
>> For the statistics, I referred to this page:
>> https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html
>>
>>
DONE
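
As a quick sanity check on the figures in that example, the sketch below
recomputes the average from the two counts. It assumes that avgPolisemy is
meant as conceptualizations per lexical entry; the variable names are only
illustrative.

```python
# Counts from the proposed ConceptualizationSet example.
lexical_entries = 155287      # lime:lexicalEntries
conceptualizations = 206941   # lime:conceptualizations

# Assumed definition: average number of conceptualizations per lexical entry.
avg_polysemy = conceptualizations / lexical_entries

print(round(avg_polysemy, 2))  # prints 1.33
```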


> We should discuss whether and how:
>>
>>    - to represent monosemous words
>>    - to break down the statistics with respect to different
>>    part-of-speech tags
>>
>>
WON'T DO. Actually, we thought that this use case could be supported by
partitions of the conceptualization sets, but various technical
difficulties made us desist :-D


Regards,

Manuel Fiorelli
Received on Monday, 13 July 2015 16:23:11 UTC