Re: lexicalization count from Manuel Fiorelli on 2014-06-05 (public-ontolex@w3.org from June 2014)

From: Manuel Fiorelli <fiorelli@info.uniroma2.it>
Date: Thu, 5 Jun 2014 21:41:21 +0200
To: Armando Stellato <stellato@info.uniroma2.it>
Cc: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>, "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <CAGDmdGgOb4ViDcQ4Q+Ovf892mJDeCYQ2X1P-7s2EybQaYsC1-Q@mail.gmail.com>
Hello Armando,

the attached OWL model should be correct with respect to the question you
raised.


2014-06-05 20:39 GMT+02:00 Armando Stellato <stellato@info.uniroma2.it>:

> Just a small errata corrige:
>
> LexicalizedLinkSet extends void:LinkSet (though yes, it then in turn
> extends void:Dataset as well).
>
> Thanks a lot Manuel,
>
> Armando
>
>
>
>
>
> *From:* Manuel Fiorelli [mailto:manuel.fiorelli@gmail.com]
> *Sent:* Thursday, June 5, 2014 7:12 PM
> *To:* Philipp Cimiano
> *Cc:* Armando Stellato; public-ontolex@w3.org
> *Subject:* Re: lexicalization count
>
>
>
> Dear Philipp,
>
>
> I attached to this email an initial OWL model representing the LIME
> metadata vocabulary.
>
> Let me summarize the model.
>
> The central entity is now the lime:Lexicalization which:
>
>    - provides lexicalizations for an RDF dataset (i.e., a collection of
>    linguistic attachments);
>    - in one natural language;
>    - using one or more linguistic models (as long as they are used to
>    express the SAME information, up to the expressive power of each model);
>    - possibly referencing a given OntoLex Lexicon.
>
> In the proposed model, the properties from a lexicalization to the target
> dataset and the lexicon are functional.
>
> Since in some usage scenarios we start with an ontology and we want to
> discover lexicalizations for it, we also provide a property that connects
> any void:Dataset to known lexicalizations.
>
> Each lexicalization refers to various ResourceCoverage(s), which provide
> statistics for different types of resources found in the target dataset.
>
> Currently, we have not committed to a specific set of statistics,
> therefore I just introduced the ones already mentioned by Philipp. With
> respect to his proposal, I slightly changed the domain of some of the
> properties. For instance, I do not believe that references (the count of
> distinct references) should be applicable to a Lexicon.
>
> As already said by Armando, we have a distinct class (lime:LexicalLinkset)
> for expressing the association between a dataset and a conceptualized
> linguistic resource (e.g., WordNet). In fact, this association is close to
> a mapping relation, thus we decided to introduce a distinct class
> LexicalLinkset that extends the standard class void:Dataset. However, we do
> believe that preserving the distinction may be useful.
>
> Although we have removed our categorization of linguistic resources, I
> reintroduced the class ConceptualizedLinguisticResource, which should be
> used in conjunction with LexicalLinkset.
>
>
>
> In the proposed model, I recreated some classes, such as Lexicon, that
> already exist in the core OntoLex model. We should decide whether they
> are the same class or not.
>
> Another interesting point of discussion is our choice of providing two
> properties:
>
>    - lang, which indicates the natural language a given lexicalization
>    refers to
>    - language, which is a shortcut to allow a dataset saying: I know
>    there is a lexicalization for me in this natural language.
>
> We should discuss whether these two properties are required and, if so,
> which of them should be unified with ontolex:language.
>
>
>
> 2014-06-04 19:37 GMT+02:00 Armando Stellato <stellato@info.uniroma2.it>:
>
> Dear Philipp,
>
>
>
> sorry for not catching up earlier with this email. Just came back from
> LREC and departed immediately for another conf in Taiwan which I’m still
> attending. Writing “nightwise”…
>
>
>
> Ok, so, as a first thing, I had a long call with Manuel just now. He will
> send in the next few days something that you can publish on GIT. Obviously,
> everything is under discussion, but it is just a starting point to have it
> open and accessible on GIT.
>
>
>
> I anticipate here some replies to your email:
>
>
>
> // counting properties (datatype properties, with domain (ontolex:Lexicon
> OR ontolex:Lexicalization OR void:Dataset OR lime:LanguageCoverage)
>
> lime:numberOfLexicalEntries
> lime:numberOfSenses
> lime:numberOfLexicalizations (denote-triples)
> lime:numberOfReferences -> the number of distinct references used
>
> We then need to discuss whether we should also include ratios etc.
>
>
>
> As said before, we would prefer to use simple names (in the spirit of
> analogous properties on void), such as lexicalEntries, senses,
> lexicalizations, references. Small note: not so sure whether to keep the
> ambiguity between “Lexicalization” (as a dataset of lexicalizations) and
> “lexicalization” as an attachment. It then creates ambiguities like the
> property “lexicalizations” (as number of attachments) and “lexicalization”
> as pointer to a Lexicalization.
>
> But, for the moment, let’s stick with them.
>
>
>
>
>
> Then:
>
> lime:language (unified with ontolex:language, extended here to domain
> lime:LanguageCoverage)
>
>
>
> Not sure I fully got the above. Btw, we will present two different
> props, and then check what can be unified.
>
>
>
> lime:linguisticModel: describing by which model/vocabulary information
> about lexicalization is attached; the domain is void:Dataset and the range
> is the URI of the vocabulary; lime:linguisticModel is thus a subproperty of
> void:vocabulary
>
>
>
> Fine. One note here: we saw now that in the PDF about LIME we sent before,
> there is one thing that we resolved in one chapter, and left obsolete in
> one other.
>
> Wrt our LIME paper, there is no more languageCoverage (and thus no need
> either to change it to lexicalCoverage, as we wrongly left stated at the
> end of page 2) as it has been replaced by Lexicalization. Inside a
> Lexicalization, we may specify different ResourceCoverage(s), that is,
> various “cuts” of coverage for different ontology types (e.g. the coverage
> for classes, or for properties, or for skos:Concepts).
>
> This also simplifies the terminology (though, as said before,
> Lexicalization clashes with the name of its own contained attachments).
>
>
>
> One more point: we left open the problem of addressing links to
> LexicalConcepts of conceptualized lexical resources (e.g. WordNet).
>
> We just resolved it in a decently elegant way. A lexicalization only deals
> with attachments between OWL/SKOS dataset/vocabulary (the “onto” part) and
> senses or lexical entries of a lexicon.
>
> Attachments to LexicalConcepts (the ones we called lexicalResourceCoverage
> in our paper) will be dealt with in a different way (as it is only
> implicitly a lexicalization), though reusing existing stuff from void.
>
> We would coin the class: LexicalLinkSet as a subclass of void:LinkSet, and
> it would be used to express the links above.
>
>
>
> Note that several linguisticModels can co-exist in principle in a
> dataset...
>
>
>
> Sure. More precisely, an “onto” Dataset may specify several (known)
> Lexicalizations. Each lexicalization refers to only one language. One
> (onto) dataset may have more than one lexicalization per language
> (obviously); this may be due to different models being available, or simply
> to different lexicons being available and linked to the same (onto) dataset.
>
> We were thinking (for more compactness) of allowing the specification of
> several linguisticModels for the same lexicalization, whenever *exactly* the
> same lexical content is available (in the same lexicalization). For
> instance, if SKOSXL and materialized SKOS labels and RDFS labels are
> available inside the same physical dataset representing a lexicalization,
> then it is possible to specify them as alternative models inside the same
> Lexicalization instance.
>
>
>
>
> lime:type: providing a type for the resource in question, e.g. bilingual
> lexicon, lexicon, ..., domain is void:Dataset and range is not specified
>
>
>
> eheh, ok, you know our point of view, so better we leave you and John
> discussing what is on-topic or off-topic inside OntoLex; then, only in the
> first case, we can give our contribution ;)
>
>
>
> lime:languageCoverage with domain void:Dataset and range
> lime:LanguageCoverage.
>
>
>
> lime:LanguageCoverage has a language, a linguistic Model and all the
> counting properties above are defined for it.
>
>
>
> ok, replaced by Lexicalization, see above (and also all pages of PDF,
> except page 2).
>
>
>
> Think that’s all. Manuel will follow with a specification via email, so
> that you can put it on GIT.
>
>
>
> Sorry, I will be unable to participate on (still in conference).
>
>
>
> Cheers,
>
>
>
> Armando (and Manuel from call ;) )
>
>
>
>
> --
> Manuel Fiorelli
>
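
For illustration only, the structure summarized in the quoted message (one
natural language per lexicalization, one or more linguistic models,
functional links to the target dataset and to an optional lexicon, and
several ResourceCoverage entries) can be sketched in Python roughly as
follows. This is not the attached OWL model: the class, field, and function
names are hypothetical and chosen only to mirror the prose, the statistics
are limited to the counts mentioned in the thread, and the functional
properties are approximated by single-valued fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResourceCoverage:
    """Statistics for one type of resource in the target dataset (hypothetical shape)."""
    resource_type: str      # e.g. "owl:Class" or "skos:Concept"
    references: int         # count of distinct references of this type
    lexicalizations: int    # count of attachments for this type


@dataclass
class Lexicalization:
    """A collection of linguistic attachments for one dataset in one language."""
    target_dataset: str             # single value: the link is functional
    lang: str                       # exactly one natural language
    linguistic_models: List[str]    # one or more models carrying the same content
    lexicon: Optional[str] = None   # single optional value: also functional
    coverages: List[ResourceCoverage] = field(default_factory=list)

    def __post_init__(self) -> None:
        # "one or more linguistic models": reject an empty list
        if not self.linguistic_models:
            raise ValueError("a lexicalization needs at least one linguistic model")


def lexicalizations_for(dataset: str, known: List[Lexicalization]) -> List[Lexicalization]:
    """Shortcut from a dataset to its known lexicalizations (the inverse direction)."""
    return [lex for lex in known if lex.target_dataset == dataset]


def languages_for(dataset: str, known: List[Lexicalization]) -> List[str]:
    """Languages in which the dataset has at least one known lexicalization."""
    return sorted({lex.lang for lex in lexicalizations_for(dataset, known)})
```

A dataset with two English lexicalizations (for instance, one per lexicon)
and one Italian lexicalization would yield three entries from
lexicalizations_for and the two language tags from languages_for.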



-- 
Manuel Fiorelli
PhD student in Computer and Automation Engineering
Dept. of Civil Engineering and Computer Science
University of Rome "Tor Vergata"
Via del Politecnico 1
00133 Roma, Italy

tel: +39-06-7259-7334
skype: fiorelli.m
Received on Thursday, 5 June 2014 19:41:50 UTC