Re: lexicalization count from Manuel Fiorelli on 2014-06-05 (public-ontolex@w3.org from June 2014)

From: Manuel Fiorelli <fiorelli@info.uniroma2.it>
Date: Thu, 5 Jun 2014 19:24:19 +0200
To: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Cc: Armando Stellato <stellato@info.uniroma2.it>, "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <CAGDmdGh3Act-yL15O-i_kBZYyxO5tWhB57gG5-GfUTJ6h81R4A@mail.gmail.com>
Oops... sorry again. I attached the OWL model.


2014-06-05 19:23 GMT+02:00 Manuel Fiorelli <fiorelli@info.uniroma2.it>:

> Dear list,
>
> sorry for double posting. However, I sent the original email from my gmail
> account, and the message could be delayed for two days, since it was the
> first time I used that account.
>
>
> 2014-06-05 19:11 GMT+02:00 Manuel Fiorelli <manuel.fiorelli@gmail.com>:
>
> Dear Philipp,
>>
>> I attached to this email an initial OWL model representing the LIME
>> metadata vocabulary.
>>
>> Let me summarize the model.
>>
>> The central entity is now lime:Lexicalization, which:
>>
>>    - provides lexicalizations for an RDF dataset (i.e., a collection of
>>    linguistic attachments);
>>    - in one natural language;
>>    - using one or more linguistic models (as long as they are used to
>>    express the SAME information, up to the expressive power of each model);
>>    - possibly referencing a given OntoLex Lexicon.
>>
>> In the proposed model, the properties from a lexicalization to the target
>> dataset and the lexicon are functional.
>>
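The constraints summarized above can be sketched in a few lines of Python. This is only an illustration and not the attached OWL model: the class name follows the email, but the field names (`target_dataset`, `lexicon`, `language`, `linguistic_models`), the helper `lexicalizations_for`, and the example URIs are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lexicalization:
    """Illustrative stand-in for lime:Lexicalization (field names are assumed)."""

    target_dataset: str            # functional: a single target dataset
    language: str                  # one natural language per lexicalization
    linguistic_models: List[str]   # one or more models carrying the same content
    lexicon: Optional[str] = None  # functional, optional OntoLex Lexicon

    def __post_init__(self) -> None:
        # The email asks for "one or more" linguistic models.
        if not self.linguistic_models:
            raise ValueError("at least one linguistic model is required")


def lexicalizations_for(dataset: str, known: List[Lexicalization]) -> List[Lexicalization]:
    """Inverse direction: from a dataset to the lexicalizations known for it."""
    return [lex for lex in known if lex.target_dataset == dataset]


# Hypothetical data: two lexicalizations for one ontology, one for another.
EN = Lexicalization("http://example.org/onto", "en", ["SKOS-XL"])
IT = Lexicalization("http://example.org/onto", "it", ["RDFS", "SKOS"],
                    lexicon="http://example.org/lexicon-it")
OTHER = Lexicalization("http://example.org/other", "en", ["RDFS"])
```

Storing the dataset and the lexicon as single values, rather than lists, is how the sketch mirrors the functional properties; the lookup function plays the role of the property that connects a void:Dataset to its known lexicalizations.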
>> Since in some usage scenarios we start with an ontology and we want to
>> discover lexicalizations for it, we also provide a property that connects
>> any void:Dataset to known lexicalizations.
>>
>> Each lexicalization refers to various ResourceCoverage(s), which provide
>> statistics for different types of resources found in the target dataset.
>>
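As a rough sketch of what a per-type ResourceCoverage could hold, the function below groups attachments by the type of the resource they are attached to. The three statistics it reports, the function name `resource_coverage`, and the input shapes are assumptions for illustration; the email explicitly leaves the set of statistics open.

```python
from collections import defaultdict


def resource_coverage(resource_types, attachments):
    """Compute example statistics for each resource type.

    resource_types: dict mapping a resource URI to its type (e.g. "owl:Class")
    attachments: iterable of (resource URI, lexical entry) pairs
    """
    totals = defaultdict(int)        # type -> resources of that type
    lexicalized = defaultdict(set)   # type -> resources with >= 1 attachment
    counts = defaultdict(int)        # type -> number of attachments

    for rtype in resource_types.values():
        totals[rtype] += 1

    for resource, _entry in attachments:
        rtype = resource_types.get(resource)
        if rtype is None:
            continue  # attachment to a resource outside the target dataset
        lexicalized[rtype].add(resource)
        counts[rtype] += 1

    return {
        rtype: {
            "resources": totals[rtype],
            "lexicalized": len(lexicalized[rtype]),
            "lexicalizations": counts[rtype],
        }
        for rtype in totals
    }
```

Each entry of the returned dictionary corresponds to one "cut" of coverage, for instance one for classes and one for properties.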
>> Currently, we have not committed to a specific set of statistics,
>> therefore I just introduced the ones already mentioned by Philipp. With
>> respect to his proposal, I slightly changed the domain of some of the
>> properties. For instance, I do not believe that references (the count of
>> distinct references) should be applicable to a Lexicon.
>> As already said by Armando, we have a distinct class
>> (lime:LexicalLinkset) for expressing the association between a dataset and
>> a conceptualized linguistic resource (e.g., WordNet). In fact, this
>> association is close to a mapping relation, thus we decided to introduce a
>> distinct class LexicalLinkset that extends the standard class void:Dataset.
>> However, we do believe that preserving the distinction may be useful.
>>
>> Although we have removed our categorization of linguistic resources, I
>> reintroduced the class ConceptualizedLinguisticResource, which should be
>> used in conjunction with LexicalLinkset.
>>
>> In the proposed model, I recreated some classes, such as Lexicon, that
>> already exist in the core OntoLex model. We should decide whether they
>> are the same class or not.
>>
>> Another interesting point of discussion is our choice of providing two
>> properties:
>>
>>    - lang, which indicates the natural language a given lexicalization
>>    refers to
>>    - language, which is a shortcut to allow a dataset saying: I know
>>    there is a lexicalization for me in this natural language.
>>
>> We should discuss whether these two properties are both required and, if
>> so, which of them should be unified with ontolex:language.
>>
>>
>> 2014-06-04 19:37 GMT+02:00 Armando Stellato <stellato@info.uniroma2.it>:
>>
>> Dear Philipp,
>>>
>>>
>>>
>>> sorry for not catching up earlier with this email. Just came back from
>>> LREC and departed immediately for another conf in Taiwan which I’m still
>>> attending. Writing “nightwise”…
>>>
>>>
>>>
>>> Ok, so, as a first thing, I had a long call with Manuel just now. He
>>> will send in the next days something that you can publish on GIT.
>>> Obviously, everything is under discussion, but it is just a starting point
>>> to have it open and accessible on GIT.
>>>
>>>
>>>
>>> I anticipate here some replies to your email:
>>>
>>>
>>>
>>> // counting properties (datatype properties, with domain
>>> (ontolex:Lexicon OR ontolex:Lexicalization OR void:Dataset OR
>>> lime:LanguageCoverage))
>>>
>>> lime:numberOfLexicalEntries
>>> lime:numberOfSenses
>>> lime:numberOfLexicalizations (denote-triples)
>>> lime:numberOfReferences -> the number of distinct references used
>>>
>>> We then need to discuss whether we should also include ratios etc.
>>>
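To make the four counts concrete, here is a minimal sketch that derives them from a list of triples. It assumes the lexicon uses the properties `ontolex:sense`, `ontolex:reference` and `ontolex:denotes`, written here as plain prefixed strings; the function name `count_statistics` and the result keys are likewise chosen only for illustration.

```python
def count_statistics(triples):
    """Derive the four counts from (subject, predicate, object) triples."""
    entries, senses, references = set(), set(), set()
    denotes_triples = 0

    # Deduplicate first, so that a repeated triple is counted once.
    for subject, predicate, obj in set(triples):
        if predicate == "ontolex:sense":        # entry -> sense
            entries.add(subject)
            senses.add(obj)
        elif predicate == "ontolex:reference":  # sense -> ontology entity
            senses.add(subject)
            references.add(obj)
        elif predicate == "ontolex:denotes":    # entry -> ontology entity
            entries.add(subject)
            references.add(obj)
            denotes_triples += 1

    return {
        "lexicalEntries": len(entries),
        "senses": len(senses),
        "lexicalizations": denotes_triples,  # number of denotes triples
        "references": len(references),       # distinct references only
    }
```

Note the asymmetry between the last two values: two entries denoting the same entity contribute two lexicalizations but only one reference.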
>>>
>>>
>>> As said before, we would prefer to use simple names (in the spirit of
>>> analogous properties in VoID), such as lexicalEntries, senses,
>>> lexicalizations, references. Small note: not so sure whether to keep the
>>> ambiguity between “Lexicalization” (as a dataset of lexicalizations) and
>>> “lexicalization” as an attachment. It then creates ambiguities like the
>>> property “lexicalizations” (as number of attachments) and “lexicalization”
>>> as pointer to a Lexicalization.
>>>
>>> But, for the moment, let’s stick with them.
>>>
>>>
>>>
>>>
>>>
>>> Then:
>>>
>>> lime:language (unified with ontolex:language, extended here to domain
>>> lime:LanguageCoverage)
>>>
>>>
>>>
>>> Not sure I got the above exactly. Btw, we will present two different
>>> props, and then check what can be unified.
>>>
>>>
>>>
>>> lime:linguisticModel: describing by which model/vocabulary information
>>> about lexicalization is attached; the domain is void:Dataset and the range
>>> is the URI of the vocabulary; lime:linguisticModel is thus a subproperty of
>>> void:vocabulary
>>>
>>>
>>>
>>> Fine. One note here: we saw now that in the PDF about LIME we sent
>>> before, there is one thing that we resolved in one chapter, and left
>>> obsolete in one other.
>>>
>>> Wrt our LIME paper, there is no more languageCoverage (and thus no need
>>> to change it to lexicalCoverage either, as we wrongly left stated at the
>>> end of page 2), as it has been replaced by Lexicalization. Inside a
>>> Lexicalization, we may specify different ResourceCoverage, that is, various
>>> “cuts” of coverage for different ontology types (e.g. the coverage for
>>> classes, or for properties, or for skos:Concepts).
>>>
>>> This also simplifies the terminology (though, as said before,
>>> Lexicalization clashes with the name of its own contained attachments).
>>>
>>>
>>>
>>> One more point: we left open the problem of addressing links to
>>> LexicalConcepts of conceptualized lexical resources (e.g. WordNet).
>>>
>>> We just resolved it in a decently elegant way. A lexicalization only
>>> deals with attachments between OWL/SKOS dataset/vocabulary (the “onto”
>>> part) and senses or lexical entries of a lexicon.
>>>
>>> Attachments to LexicalConcepts (the ones we called
>>> lexicalResourceCoverage in our paper) will be dealt with in a different way
>>> (as it is only implicitly a lexicalization), though reusing existing stuff
>>> from void.
>>>
>>> We would coin the class: LexicalLinkSet as a subclass of void:Linkset,
>>> and it would be used to express the links above.
>>>
>>>
>>>
>>> Note that several linguisticModels can co-exist in principle in a
>>> dataset...
>>>
>>>
>>>
>>> Sure. More precisely, an “onto” Dataset may specify more (known)
>>> Lexicalizations. Each lexicalization refers to only one language. One
>>> (onto) dataset may have more than one lexicalization per language
>>> (obviously); this may be due to different models being available, or simply
>>> to different lexicons being available and linked to the same (onto) dataset.
>>>
>>> We were thinking (for more compactness) of allowing the specification
>>> of more linguistic models for the same lexicalization, whenever *exactly*
>>> the same lexical content is available (in the same lexicalization). For
>>> instance, if SKOSXL and materialized SKOS labels and RDFS labels are
>>> available inside the same physical dataset representing a lexicalization,
>>> then it is possible to specify them as alternative models inside the same
>>> Lexicalization instance.
>>>
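The "exactly the same lexical content" condition can be illustrated with a small check. The sketch assumes the labels have already been extracted into (model, resource, text) tuples, which hides how each model actually stores them (for example, SKOS-XL labels are separate resources). Strict set equality is also a simplification, since the models differ in expressive power. The function names are hypothetical.

```python
from collections import defaultdict


def content_by_model(labels):
    """Group (model, resource, text) tuples into one set of pairs per model."""
    content = defaultdict(set)
    for model, resource, text in labels:
        content[model].add((resource, text))
    return dict(content)


def can_share_lexicalization(labels):
    """True when every model carries exactly the same (resource, text) pairs."""
    contents = list(content_by_model(labels).values())
    return all(c == contents[0] for c in contents[1:])


# Hypothetical labels: the same content materialized in three models.
SAME = [
    ("SKOS-XL", "ex:Dog", "dog"),
    ("SKOS", "ex:Dog", "dog"),
    ("RDFS", "ex:Dog", "dog"),
]
# Here the RDFS labels lack one label that the SKOS labels have.
DIFFERENT = [
    ("SKOS", "ex:Dog", "dog"),
    ("SKOS", "ex:Dog", "hound"),
    ("RDFS", "ex:Dog", "dog"),
]
```

In the first case the three models could be listed as alternatives inside one Lexicalization instance; in the second, the differing content would call for separate lexicalizations.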
>>>
>>>
>>>
>>> lime:type: providing a type for the resource in question, e.g. bilingual
>>> lexicon, lexicon, ..., domain is void:Dataset and range is not specified
>>>
>>>
>>>
>>> eheh, ok, you know our point of view, so better we leave you and John
>>> discussing what is on-topic or off-topic inside OntoLex, then only in the
>>> first case, we can give our contribution ;)
>>>
>>>
>>>
>>> lime:languageCoverage with domain void:Dataset and range
>>> lime:LanguageCoverage.
>>>
>>>
>>>
>>> lime:LanguageCoverage has a language, a linguistic model and all the
>>> counting properties above are defined for it.
>>>
>>>
>>>
>>> ok, replaced by Lexicalization, see above (and also all pages of PDF,
>>> except page 2).
>>>
>>>
>>>
>>> Think that’s all. Manuel will follow with a specification via email, so
>>> that you can put it on GIT.
>>>
>>>
>>>
>>> Sorry, I will be unable to participate on (still in conference).
>>>
>>>
>>>
>>> Cheers,
>>>
>>>
>>>
>>> Armando (and Manuel from call ;) )
>>>
>>>
>>
>>
>> --
>> Manuel Fiorelli
>>
>
>
>
> --
> Manuel Fiorelli
> PhD student in Computer and Automation Engineering
> Dept. of Civil Engineering and Computer Science
> University of Rome "Tor Vergata"
> Via del Politecnico 1
> 00133 Roma, Italy
>
> tel: +39-06-7259-7334
> skype: fiorelli.m
>



-- 
Manuel Fiorelli
PhD student in Computer and Automation Engineering
Dept. of Civil Engineering and Computer Science
University of Rome "Tor Vergata"
Via del Politecnico 1
00133 Roma, Italy

tel: +39-06-7259-7334
skype: fiorelli.m
Attachments

application/rdf+xml attachment: lime.owl
Received on Thursday, 5 June 2014 17:24:47 UTC