Re: lime module

Dear Philipp,

here are additional comments on the Lime module.

*Section "lexicon metadata"*

Just before the definition box of *linguistic model*:

"We may also specify the linguistic (annotation model) used in a lexicon
with the linguistic model property"

I think that the word "model" should go outside the parentheses.
Additionally, I would make it clearer that we are talking about things such
as part of speech, number, gender, and so on, maybe also by pointing to the
section of the specification where we state that explicitly.

*Section "Lexicalization Set"*

"In RDF, a lexicalization is expressed via the property rdfs:label."

It should be "In RDFS" (note I added an S).

*Section "Partitions"*

"many cases, we want to provide descriptive metadata about a subset of a
lexicallization"

It should be "of a lexicalization set".

*Section "Publication Strategies"*


"For example, this allows lexicalizing lexical concepts from an existing
wordnet in a different natural language than the one for which the resource
was initially conceived"

I am unsure whether it is appropriate to use the word "lexicalizing" in
association with lexical concepts, because we insisted that the nature of a
"conceptualization set" is different from that of a "lexicalization set".

Best regards

Manuel

2015-07-07 15:55 GMT+02:00 Manuel Fiorelli <manuel.fiorelli@gmail.com>:

> Dear Philipp, All
>
> here are my preliminary comments. Most of them are minor typos, while
> others may seed further discussion.
>
> -----
>
> In the introduction to example 1, the spec says:
>
> "As an example we may describe a simple lexicon using this property as
> well as properties from Dublin Core and VoID: "
>
> The example then also contains the actual lexical entries that constitute
> the lexicon. This is good as far as the self-explanatory nature of the
> example is concerned. However, we should make clear that in general the metadata
> only deals with the description of the lexicon as a whole, while the
> representation of its actual content is in the scope of other modules. This
> is particularly relevant to "lexicon catalogs", which may only be
> interested in indexing lexicons without the need to also host the actual
> content.
>
> -----
>
> In the definition of LexicalizationSet, the classes Lexicon and Dataset need,
> respectively, the prefixes ontolex and void.
>
> -----
>
> I am not sure about this statement:
>
> "The lexicalization set object should be unique for a given
> lexicon-ontology pair"
>
> Indeed, the statement above implies that there cannot be two different
> lexicalization sets for FOAF using the WordNet RDF lexicon. I think that
> this conclusion is false, so the previous statement should be retracted.
>
> -----
>
> In the definition of lexicalizationModel, the disjunction is spelled OR,
> whereas in other cases it is spelled in lowercase.
>
> -----
>
> The definition of lime:references does not mention the fact that in a
> lexical linkset an ontology reference can be associated with a lexical
> concept.
>
> -----
>
> Concerning Example2:
> - we should add the language "ja" to the lexicalizationSet resource
> - we may say that the ontology is an instance of voaf:Vocabulary, which
> is a subclass of void:Dataset to represent vocabularies (both RDFS
> Schemas and OWL Ontologies)
> - I would extend the introduction to the example. This is my attempt:
>
> <cite>
> In the following example, we describe a lexicalization set expressing how
> elements of an ontology can be verbalized in Japanese by means of entries
> from a supplied lexicon. The metadata clearly states which ontology and
> lexicon are involved in the lexicalization set, as well as the relevant
> natural language. The knowledge of these facts about the lexicalization set
> allows us to assess the usefulness of a lexicalization set for a given task
> as well as to discover relevant lexicalization sets, when we are constrained
> by the choice of an ontology, lexicon or natural language.
>
> We model the ontology as an instance of the class voaf:Vocabulary that is
> a kind of void:Dataset representing vocabularies (both RDFS Schemas and
> OWL Ontologies). We benefit from the more specific distinctions made by
> VOAF, by breaking down the total number of entities in the ontology (held
> by the property void:entities) into separate counts for the classes and
> properties (held by voaf:classNumber and voaf:propertyNumber,
> respectively).
>
> Similarly, we use terms from the Lime vocabulary to represent statistics
> about the linguistic content of the lexicon and the lexicalization set.
> Overall, the ontology defines 80 entities and the lexicon 100 lexical
> entries; however, only 20 entities from the target ontology have been
> associated with a total of 50 lexical entries.
> </cite>
>
> -----
>
> In the definition of avgNumOfLexicalizations, the word "define" occurs
> where it should be "defines".
>
> -----
>
> I would postpone example 3 to the end of the section, and I would modify it
> as follows:
> - reuse the same data as in example 2, and make this clear in the
> introduction to the example
> - then, use the properties lexicalizations, avgNumOfLexicalizations and percentage
> to "analyze" the scenario depicted in example 2. For instance, it is now
> possible to tell explicitly that only 25% of the reference ontology has
> been lexicalized.
>
> We can make the example more interesting by playing with polysemy so that the
> ratios are not "obvious".
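
As a quick check of the 25% figure above, here is a minimal Python sketch. It only uses the counts given in the proposed introduction to example 2 (80 entities in the ontology, 20 of them lexicalized); the variable names are illustrative and are not terms from the Lime vocabulary.

```python
# Counts taken from the proposed introduction to example 2.
ontology_entities = 80      # total entities defined by the ontology
lexicalized_entities = 20   # entities associated with at least one lexical entry

# Fraction of the reference ontology that has been lexicalized.
coverage = lexicalized_entities / ontology_entities
print(f"{coverage:.0%}")  # prints 25%
```

With polysemy in the data, the other ratios would no longer follow from these two counts alone, which is the point of making the example less "obvious".
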
>
> -----
>
> In the definition of LexicalLinkset, the class dataset needs the prefix
> void.
>
> -----
>
> I would propose the following example for lime:ConceptualizationSet
>
> :WnConceptualizationSet a lime:ConceptualizationSet ;
>   lime:conceptualDataset :WnConceptSet ;
>   lime:lexiconDataset :WnLexicon ;
>   lime:lexicalEntries 155287 ;
>   lime:concepts 117659 ;
>   lime:conceptualizations 206941 ;
>   lime:avgPolisemy 1.33
>   .
>
> For the statistics, I referred to this page:
> https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html
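
The value 1.33 in the snippet above is consistent with dividing the number of conceptualizations by the number of lexical entries. That reading is an assumption and is not stated in the message; the Python sketch below merely checks the arithmetic with illustrative variable names.

```python
# Figures copied from the proposed lime:ConceptualizationSet example.
lexical_entries = 155287
conceptualizations = 206941

# Assumed definition: average number of conceptualizations per lexical entry.
avg_polysemy = conceptualizations / lexical_entries
print(round(avg_polysemy, 2))  # prints 1.33
```
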
>
> We should discuss whether and how:
>
>    - to represent monosemous words
>    - to break down the statistics with respect to different
>    part-of-speech tags
>
> Regards
>
> Manuel
>
> 2015-07-07 15:02 GMT+02:00 Philipp Cimiano <
> cimiano@cit-ec.uni-bielefeld.de>:
>
>> Dear all,
>>
>>  I went through the lime module today, streamlining the definitions etc.
>> to make them more conformant to the rest of the modules. I also updated the
>> ontology. I will go through all sections asking for comments on Friday.
>>
>> Please send me any comments you deem important by Friday.
>>
>> I still need to work through the examples both in the wiki and the git
>> repo. It seems to me that we need a few additional examples in this section.
>>
>> Kind regards,
>>
>> Philipp.
>>
>> --
>> Prof. Dr. Philipp Cimiano
>> AG Semantic Computing
>> Exzellenzcluster für Cognitive Interaction Technology (CITEC)
>> Universität Bielefeld
>>
>> Tel: +49 521 106 12249
>> Fax: +49 521 106 6560
>> Mail: cimiano@cit-ec.uni-bielefeld.de
>>
>> Office CITEC-2.307
>> Universitätsstr. 21-25
>> 33615 Bielefeld, NRW
>> Germany
>>
>>
>>
>
>
> --
> Manuel Fiorelli
>



-- 
Manuel Fiorelli

Received on Friday, 10 July 2015 13:27:39 UTC