RE: lexicalization count

Dear Philipp,

 

Sorry for not catching up on this email earlier. I just came back from LREC
and left immediately for another conference in Taiwan, which I'm still
attending. Writing "nightwise".

 

Ok, so, first of all: I just had a long call with Manuel. In the next few
days he will send something that you can publish on Git. Obviously,
everything is still under discussion; it is just a starting point, so that
we have it open and accessible on Git.

 

Here are some early replies to your email:

 

// counting properties (datatype properties, with domain (ontolex:Lexicon OR
ontolex:Lexicalization OR void:Dataset OR lime:LanguageCoverage))

lime:numberOfLexicalEntries
lime:numberOfSenses
lime:numberOfLexicalizations (denote-triples)
lime:numberOfReferences -> the number of distinct references used

We then need to discuss whether we should also include ratios etc.

 

As said before, we would prefer to use simple names (in the spirit of the
analogous properties in VoID), such as lexicalEntries, senses,
lexicalizations, references. Small note: we are not so sure whether to keep
the ambiguity between "Lexicalization" (as a dataset of lexicalizations) and
"lexicalization" as an attachment. It then creates further ambiguities, like
the property "lexicalizations" (as the number of attachments) versus
"lexicalization" as a pointer to a Lexicalization.

But, for the moment, let's stick with them.
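
Just to make the four counts concrete, here is a minimal sketch in Python over a toy in-memory list of triples. Everything in it is an assumption for illustration: the ex: resources are invented, the helper count_stats is hypothetical, and the predicates used (rdf:type, ontolex:sense, ontolex:denotes) are only one plausible reading of the model under discussion.

```python
# Toy triples; every ex: URI is invented for illustration only.
TRIPLES = [
    ("ex:cat_n", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:feline_n", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:dog_n", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:cat_n", "ontolex:sense", "ex:cat_sense1"),
    ("ex:feline_n", "ontolex:sense", "ex:feline_sense1"),
    ("ex:dog_n", "ontolex:sense", "ex:dog_sense1"),
    ("ex:cat_n", "ontolex:denotes", "ex:Cat"),
    ("ex:feline_n", "ontolex:denotes", "ex:Cat"),
    ("ex:dog_n", "ontolex:denotes", "ex:Dog"),
]


def count_stats(triples):
    """Compute the four counts (hypothetical helper, simple names as keys)."""
    entries = {s for s, p, o in triples
               if p == "rdf:type" and o == "ontolex:LexicalEntry"}
    senses = {o for s, p, o in triples if p == "ontolex:sense"}
    # One lexicalization (attachment) per distinct denote-triple.
    denotes = {(s, o) for s, p, o in triples if p == "ontolex:denotes"}
    # References are counted once, however many entries denote them.
    references = {o for _, o in denotes}
    return {
        "lexicalEntries": len(entries),
        "senses": len(senses),
        "lexicalizations": len(denotes),
        "references": len(references),
    }


stats = count_stats(TRIPLES)
print(stats)
```

In this toy data "cat" and "feline" both denote ex:Cat, so there are three lexicalizations but only two distinct references, which is exactly the difference between the last two counts.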

 

 

Then:

lime:language (unified with ontolex:language, extended here to domain
lime:LanguageCoverage)

 

Not sure I understood the above exactly. In any case, we will present two
different properties, and then check what can be unified.

 

lime:linguisticModel: describing by which model/vocabulary information about
lexicalization is attached; the domain is void:Dataset and the range is the
URI of the vocabulary; lime:linguisticModel is thus a subproperty of
void:vocabulary

 

Fine. One note here: we have just noticed that, in the PDF about LIME we
sent before, there is one thing that we resolved in one chapter but left in
its obsolete form in another.

With respect to our LIME paper, there is no languageCoverage any more (and
thus no need to change it to lexicalCoverage, as we wrongly stated at the
end of page 2), since it has been replaced by Lexicalization. Inside a
Lexicalization, we may specify different ResourceCoverage instances, that
is, various "cuts" of coverage for different ontology types (e.g. the
coverage for classes, or for properties, or for skos:Concepts).

This also simplifies the terminology (though, as said before, Lexicalization
clashes with the name of its own contained attachments).
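
A minimal sketch of the per-type "cuts" of coverage, in Python over toy data. All names here are assumptions: the ex: resources are invented, resource_coverage is a hypothetical helper, and the keys "total", "covered" and "ratio" are placeholders rather than proposed property names.

```python
from collections import defaultdict

# Type of each resource in the "onto" dataset (invented example data).
RESOURCE_TYPES = {
    "ex:Cat": "owl:Class",
    "ex:Dog": "owl:Class",
    "ex:Bird": "owl:Class",
    "ex:Fish": "owl:Class",
    "ex:hasOwner": "rdf:Property",
    "ex:eats": "rdf:Property",
    "ex:pets": "skos:Concept",
}

# Attachments as (lexical entry, reference) pairs.
ATTACHMENTS = [
    ("ex:cat_n", "ex:Cat"),
    ("ex:feline_n", "ex:Cat"),
    ("ex:dog_n", "ex:Dog"),
    ("ex:owner_n", "ex:hasOwner"),
    ("ex:pet_n", "ex:pets"),
]


def resource_coverage(resource_types, attachments):
    """Return one coverage record per resource type (hypothetical helper)."""
    covered = {ref for _, ref in attachments}
    totals = defaultdict(int)
    hits = defaultdict(int)
    for resource, rtype in resource_types.items():
        totals[rtype] += 1
        if resource in covered:
            hits[rtype] += 1
    return {
        rtype: {
            "total": totals[rtype],
            "covered": hits[rtype],
            "ratio": hits[rtype] / totals[rtype],
        }
        for rtype in totals
    }


coverage = resource_coverage(RESOURCE_TYPES, ATTACHMENTS)
for rtype, record in coverage.items():
    print(rtype, record)
```

Each record would correspond to one ResourceCoverage inside a single Lexicalization; a resource counts as covered if it has at least one attachment, regardless of how many.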

 

One more point: we left open the problem of addressing links to
LexicalConcepts of conceptualized lexical resources (e.g. WordNet).

We just resolved it in a decently elegant way. A lexicalization only deals
with attachments between an OWL/SKOS dataset/vocabulary (the "onto" part)
and senses or lexical entries of a lexicon.

Attachments to LexicalConcepts (the ones we called lexicalResourceCoverage
in our paper) will be dealt with in a different way (as they are only
implicitly a lexicalization), though reusing existing stuff from VoID.

We would coin the class LexicalLinkSet as a subclass of void:Linkset, and
it would be used to express the links above.
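
The split described above can be sketched as follows, in Python over toy data. This is only an illustration under assumptions: the ex: and wn: identifiers are invented, partition_links is a hypothetical helper, and the kind labels are shorthand for whatever classes the model ends up using.

```python
# Kind of each link target (invented example data).
TARGET_KINDS = {
    "ex:cat_n": "LexicalEntry",
    "ex:cat_sense1": "LexicalSense",
    "wn:synset-cat-n": "LexicalConcept",
}

# Links as (onto resource, target) pairs.
LINKS = [
    ("ex:Cat", "ex:cat_n"),
    ("ex:Cat", "ex:cat_sense1"),
    ("ex:Cat", "wn:synset-cat-n"),
]


def partition_links(links, target_kinds):
    """Split links into Lexicalization attachments and LexicalLinkSet links."""
    lexicalization, lexical_linkset = [], []
    for resource, target in links:
        if target_kinds[target] == "LexicalConcept":
            # Links to lexical concepts go to the separate link set.
            lexical_linkset.append((resource, target))
        else:
            # Links to senses or lexical entries are proper attachments.
            lexicalization.append((resource, target))
    return lexicalization, lexical_linkset


attachments, concept_links = partition_links(LINKS, TARGET_KINDS)
print(attachments)
print(concept_links)
```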

 

Note that several linguisticModels can co-exist in principle in a dataset...



 

Sure. More precisely, an "onto" Dataset may specify several (known)
Lexicalizations. Each Lexicalization refers to only one language. One
(onto) dataset may have more than one lexicalization per language
(obviously); this may be due to different models being available, or simply
to different lexicons being available and linked to the same (onto) dataset.

We were thinking (for more compactness) of allowing the specification of
several linguisticModels for the same lexicalization, whenever exactly the
same lexical content is available (in the same lexicalization). For
instance, if SKOS-XL, materialized SKOS labels and RDFS labels are all
available inside the same physical dataset representing a lexicalization,
then it is possible to specify them as alternative models inside the same
Lexicalization instance.
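
As a rough sketch of this structure (Python dataclasses, not a specification): one language per Lexicalization, possibly several linguistic models per Lexicalization, and possibly several Lexicalizations per language on the same dataset. The class and field names, the ex: identifiers and the lexicon field are all assumptions made up for the example; only the vocabulary namespace URIs are real.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"


@dataclass
class Lexicalization:
    language: str                  # exactly one language
    linguistic_models: List[str]   # alternative models, same lexical content
    lexicon: Optional[str] = None  # hypothetical: the linked lexicon, if any


@dataclass
class OntoDataset:
    uri: str
    lexicalizations: List[Lexicalization] = field(default_factory=list)

    def for_language(self, language):
        """All lexicalizations of this dataset in the given language."""
        return [lx for lx in self.lexicalizations if lx.language == language]


dataset = OntoDataset(
    uri="ex:myOntology",
    lexicalizations=[
        # Same labels exposed through three models: one instance suffices.
        Lexicalization("en", [SKOSXL, SKOS, RDFS]),
        # A second English lexicalization, via a separate lexicon.
        Lexicalization("en", [ONTOLEX], lexicon="ex:someEnglishLexicon"),
        Lexicalization("it", [RDFS]),
    ],
)

english = dataset.for_language("en")
print(len(english), english[0].linguistic_models)
```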

 


lime:type: providing a type for the resource in question, e.g. bilingual
lexicon, lexicon, ..., domain is void:Dataset and range is not specified



 

Eheh, ok, you know our point of view, so better that we leave you and John
to discuss what is on-topic or off-topic for OntoLex; only in the first
case can we give our contribution ;)

 

lime:languageCoverage with domain void:Dataset and range
lime:LanguageCoverage.



 

lime:LanguageCoverage has a language, a linguisticModel and all the counting
properties above are defined for it.



 

Ok, replaced by Lexicalization, see above (and also all pages of the PDF,
except page 2).

 

I think that's all. Manuel will follow up with a specification via email, so
that you can put it on Git.

 

Sorry, I will be unable to participate on (still in conference).

 

Cheers,

 

Armando (and Manuel from call ;) )

 

Received on Wednesday, 4 June 2014 17:38:09 UTC