RE: lexicalization count from Armando Stellato on 2014-06-05 (public-ontolex@w3.org from June 2014)

From: Armando Stellato <stellato@info.uniroma2.it>
Date: Thu, 5 Jun 2014 20:39:45 +0200
To: "'Manuel Fiorelli'" <manuel.fiorelli@gmail.com>, "'Philipp Cimiano'" <cimiano@cit-ec.uni-bielefeld.de>
Cc: <public-ontolex@w3.org>
Message-ID: <01ce01cf80ed$86502ec0$92f08c40$@info.uniroma2.it>
Just a small errata corrige:

LexicalizedLinkSet extends void:LinkSet (though yes, that in turn extends void:Dataset as well).

Thanks a lot Manuel,

Armando

 

 

From: Manuel Fiorelli [mailto:manuel.fiorelli@gmail.com] 
Sent: Thursday, June 5, 2014 7:12 PM
To: Philipp Cimiano
Cc: Armando Stellato; public-ontolex@w3.org
Subject: Re: lexicalization count

 

Dear Philipp,

I attached to this email an initial OWL model representing the LIME metadata vocabulary.

Let me summarize the model.

The central entity is now the lime:Lexicalization which:

* provides lexicalizations for an RDF dataset (i.e., a collection of linguistic attachments);
* in one natural language;
* using one or more linguistic models (as long as they are used to express the SAME information, up to the expressive power of each model);
* possibly referencing a given OntoLex Lexicon.

In the proposed model, the properties from a lexicalization to the target dataset and the lexicon are functional.

Since in some usage scenarios we start with an ontology and we want to discover lexicalizations for it, we also provide a property that connects any void:Dataset to known lexicalizations.
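
As a rough illustration of the structure summarized above, here is a minimal Python sketch. It is not the attached OWL model: the class name, field names, and the helper known_lexicalizations are hypothetical stand-ins. It assumes that "functional" means each lexicalization points to exactly one target dataset and at most one lexicon, and it builds the reverse index from a dataset to its known lexicalizations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lexicalization:
    """Hypothetical stand-in for lime:Lexicalization (field names are assumptions)."""
    uri: str
    dataset: str                   # functional: exactly one target dataset
    lang: str                      # exactly one natural language
    linguistic_models: tuple       # one or more linguistic model URIs
    lexicon: Optional[str] = None  # functional and optional: at most one lexicon

    def __post_init__(self):
        if not self.linguistic_models:
            raise ValueError("a lexicalization needs at least one linguistic model")


def known_lexicalizations(lexicalizations):
    """Index lexicalization URIs by target dataset (dataset -> known lexicalizations)."""
    index = defaultdict(list)
    for lex in lexicalizations:
        index[lex.dataset].append(lex.uri)
    return dict(index)


examples = [
    Lexicalization("ex:lex-en", "ex:onto", "en", ("ex:skosxl",), "ex:lexicon-en"),
    Lexicalization("ex:lex-it", "ex:onto", "it", ("ex:rdfs",)),
]
index = known_lexicalizations(examples)
```

Because dataset and lexicon are single-valued fields, the functional constraint holds by construction; in an OWL model it would instead be declared on the properties themselves.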

Each lexicalization refers to various ResourceCoverage(s), which provide statistics for different types of resources found in the target dataset.

Currently, we have not committed to a specific set of statistics, therefore I just introduced the ones already mentioned by Philipp. With respect to his proposal, I slightly changed the domain of some of the properties. For instance, I do not believe that references (the count of distinct references) should be applicable to a Lexicon.
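
To make the per-type statistics more concrete, the following Python sketch computes a few counts for each resource type. It only assumes what is said in this thread: "lexicalizations" counts attachments and "references" counts distinct references. The input layout, the function name resource_coverages, and the example URIs are hypothetical.

```python
from collections import defaultdict


def resource_coverages(attachments):
    """Compute per-type counts from (reference, resource_type, lexical_entry) tuples."""
    by_type = defaultdict(set)
    for reference, resource_type, entry in attachments:
        by_type[resource_type].add((reference, entry))

    stats = {}
    for resource_type, pairs in by_type.items():
        stats[resource_type] = {
            # number of distinct attachments between a reference and a lexical entry
            "lexicalizations": len(pairs),
            # number of distinct references that received at least one attachment
            "references": len({ref for ref, _ in pairs}),
            # number of distinct lexical entries used
            "lexicalEntries": len({entry for _, entry in pairs}),
        }
    return stats


sample = [
    ("ex:Cat", "owl:Class", "lex:cat"),
    ("ex:Cat", "owl:Class", "lex:feline"),
    ("ex:Dog", "owl:Class", "lex:dog"),
    ("ex:hasOwner", "owl:ObjectProperty", "lex:owner"),
]
coverage = resource_coverages(sample)
```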

As already said by Armando, we have a distinct class (lime:LexicalLinkset) for expressing the association between a dataset and a conceptualized linguistic resource (e.g., WordNet). In fact, this association is close to a mapping relation, thus we decided to introduce a distinct class LexicalLinkset that extends the standard class void:Dataset. However, we do believe that preserving the distinction may be useful.

Although we have removed our categorization of linguistic resources, I reintroduced the class ConceptualizedLinguisticResource, which should be used in conjunction with LexicalLinkset.

 

In the proposed model, I recreated some classes, such as Lexicon, that already exist in the core OntoLex model. We should decide whether they are the same class or not.

Another interesting point of discussion is our choice of providing two properties:

* lang, which indicates the natural language a given lexicalization refers to
* language, which is a shortcut that lets a dataset say: I know there is a lexicalization for me in this natural language.

We should discuss whether both properties are required and, if so, which of them should be unified with ontolex:language.

 

2014-06-04 19:37 GMT+02:00 Armando Stellato <stellato@info.uniroma2.it>:

Dear Philipp,

 

sorry for not catching up earlier with this email. Just came back from LREC and departed immediately for another conf in Taiwan which I’m still attending. Writing “nightwise”…

 

Ok, so, as a first thing, I had a long call with Manuel just now. In the next few days he will send something that you can publish on GIT. Obviously, everything is under discussion, but it is just a starting point to have it open and accessible on GIT.

 

I anticipate here some replies to your email:

 

// counting properties (datatype properties, with domain (ontolex:Lexicon OR ontolex:Lexicalization OR void:Dataset OR lime:LanguageCoverage))

lime:numberOfLexicalEntries
lime:numberOfSenses
lime:numberOfLexicalizations (denote-triples)
lime:numberOfReferences -> the number of distinct references used

We then need to discuss whether we should also include ratios etc.

 

As said before, we would prefer to use simple names (in the spirit of the analogous properties in void), such as lexicalEntries, senses, lexicalizations, references. Small note: I am not so sure whether to keep the ambiguity between “Lexicalization” (as a dataset of lexicalizations) and “lexicalization” as an attachment. It then creates ambiguities such as the property “lexicalizations” (as the number of attachments) versus “lexicalization” as a pointer to a Lexicalization.

But, for the moment, let’s stick with them.

 

 

Then:

lime:language (unified with ontolex:language, extended here to domain lime:LanguageCoverage)

 

Not sure I got the above exactly. Btw, we will present two different props, and then check what can be unified.

 

lime:linguisticModel: describing by which model/vocabulary information about lexicalization is attached; the domain is void:Dataset and the range is the URI of the vocabulary; lime:linguisticModel is thus a subproperty of void:vocabulary

 

Fine. One note here: we have now seen that in the PDF about LIME we sent before, there is one thing that we resolved in one chapter but left outdated in another.

Wrt our LIME paper, there is no more languageCoverage (and thus no need to change it to lexicalCoverage, as we wrongly left stated at the end of page 2), as it has been replaced by Lexicalization. Inside a Lexicalization, we may specify different ResourceCoverage(s), that is, various “cuts” of coverage for different ontology types (e.g. the coverage for classes, or for properties, or for skos:Concepts).

This also simplifies the terminology (though, as said before, Lexicalization clashes with the name of its own contained attachments).

 

One more point: we left open the problem of addressing links to LexicalConcepts of conceptualized lexical resources (e.g. WordNet).

We just resolved it in a decently elegant way. A lexicalization only deals with attachments between an OWL/SKOS dataset/vocabulary (the “onto” part) and senses or lexical entries of a lexicon.

Attachments to LexicalConcepts (the ones we called lexicalResourceCoverage in our paper) will be dealt with in a different way (as it is only implicitly a lexicalization), though reusing existing stuff from void.

We would coin the class: LexicalLinkSet as a subclass of void:LinkSet, and it would be used to express the links above.
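
The subclass chain can be sketched in a few lines of Python. This is only an illustration with a hypothetical helper name (superclasses), not part of LIME. The class names follow the spelling used in this thread; note that the VoID vocabulary itself spells the class void:Linkset, and there it is a subclass of void:Dataset.

```python
# Direct subclass links, using the spellings from this thread.
SUBCLASS_OF = {
    "lime:LexicalLinkSet": "void:LinkSet",
    "void:LinkSet": "void:Dataset",
}


def superclasses(cls):
    """Return all superclasses of cls, nearest first, by following the direct links."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


chain = superclasses("lime:LexicalLinkSet")
```

So anything typed as a LexicalLinkSet can still be handled by tools that only know about void:Dataset.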

 

Note that several linguisticModels can co-exist in principle in a dataset...

 

Sure. More precisely, an “onto” Dataset may specify several (known) Lexicalizations. Each lexicalization refers to only one language. One (onto) dataset may have more than one lexicalization per language (obviously); this may be due to different models being available, or simply to different lexicons being available and linked to the same (onto) dataset.

We were thinking (for more compactness) of allowing several linguisticModels to be specified for the same lexicalization, whenever exactly the same lexical content is available (in the same lexicalization). For instance, if SKOS-XL, materialized SKOS labels and RDFS labels are all available inside the same physical dataset representing a lexicalization, then it is possible to specify them as alternative models inside the same Lexicalization instance.
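
A small Python sketch of this compaction idea follows. It assumes that "exactly the same lexical content" can be tested by comparing sets of (reference, label) pairs, which is a simplification made for illustration; the function name merge_equivalent, the dictionary keys, and the sample data are all hypothetical.

```python
from collections import defaultdict


def merge_equivalent(descriptions):
    """Merge per-model descriptions that share dataset, language and lexical content.

    descriptions: iterable of (dataset, lang, model, attachments), where
    attachments is a collection of (reference, label) pairs.
    """
    merged = defaultdict(set)
    for dataset, lang, model, attachments in descriptions:
        merged[(dataset, lang, frozenset(attachments))].add(model)

    return [
        {
            "dataset": dataset,
            "lang": lang,
            "linguisticModels": sorted(models),
            "lexicalizations": len(content),
        }
        for (dataset, lang, content), models in merged.items()
    ]


english = [("ex:Cat", "cat"), ("ex:Dog", "dog")]
italian = [("ex:Cat", "gatto")]
result = merge_equivalent([
    ("ex:onto", "en", "SKOS-XL", english),
    ("ex:onto", "en", "SKOS", english),
    ("ex:onto", "en", "RDFS", english),
    ("ex:onto", "it", "RDFS", italian),
])
```

Here the three English descriptions collapse into one lexicalization listing three alternative models, while the Italian one stays separate because a lexicalization refers to only one language.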

 


lime:type: providing a type for the resource in question, e.g. bilingual lexicon, lexicon, ..., domain is void:Dataset and range is not specified

 

eheh, ok, you know our point of view, so it is better that we leave you and John to discuss what is on topic or off topic for OntoLex; then, only in the first case, we can give our contribution ;)

 

lime:languageCoverage with domain void:Dataset and range lime:LanguageCoverage.

 

lime:LanguageCoverage has a language, a linguistic model and all the counting properties above are defined for it.

 

ok, replaced by Lexicalization, see above (and also all pages of PDF, except page 2).

 

Think that’s all. Manuel will follow with a specification via email, so that you can put it on GIT.

 

Sorry, I will be unable to participate on (still in conference).

 

Cheers,

 

Armando (and Manuel from call ;) )

 




-- 
Manuel Fiorelli
Received on Thursday, 5 June 2014 18:40:22 UTC