- From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
- Date: Fri, 27 Jun 2014 14:18:32 +0200
- To: Armando Stellato <stellato@info.uniroma2.it>, public-ontolex@w3.org
- Message-ID: <53AD6118.4030905@cit-ec.uni-bielefeld.de>
Hi Armando,
yes, we agree. What I meant with my class lime:LexiconSubset is
essentially what you mean when you say Lexicalization, I think. So we
need to agree on the name.
About being more fine-granular and representing
By the way, I do not think that we need an OWL formalization of
lime:LexiconSubset or lime:Lexicalization; we need to define what we
mean by it in its natural-language definition. It might be a lengthy
one, though ;-)
Let me slightly reformulate your example to show how what you describe
would be expressed with the vocabulary that I have in mind:
myItLex:mySubset
    a lime:LexiconSubset ;
    lime:lang "it" ;                 # functional, min 1
    lime:lexicalizedObject :dat ;    # functional, min 1
    lime:lexicalModel <http://> ;    # functional, min 1
    lime:lexicon :myLexicon1 ;       # multi-valued
    lime:lexicon :myLexicon2 ;       # multi-valued
    lime:entries 11 ;
    lime:lexicalizations 15 ;
    lime:senses 18 ;
    lime:references 20 .
So the above says that the subset of the dataset for the Italian
language, which refers to :dat as the object of lexicalization and uses
the lexical model <>, contains 11 entries, 15 lexicalizations, 18 senses
and 20 references.
Included in this subdataset (mySubset) would be, implicitly, all
triples in ext(mySubset) = { (lex, sense, ref) : LexicalEntry(lex) &
LexicalSense(sense) & sense(lex, sense) & reference(sense, ref) &
lang(lex) = it & definedInNameSpace(ref, :dat) & (entry(myLexicon1, lex)
OR ... OR entry(myLexiconN, lex)) }.
This is not representable in OWL, as we cannot say that a triple
(lex, sense, ref) is included in a dataset.
The properties entries / lexicalizations / senses / references are then
projections of the set ext(mySubset).
The extension of mySubset is, however, not captured in OWL.
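To make the projections concrete, here is a minimal sketch of a fragment
of such an extension, assuming the ontolex core properties ontolex:sense
and ontolex:reference; the prefixes, resource names and namespace URIs
are purely illustrative:

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix :        <http://example.org/dat#> .         # namespace of :dat (assumed)
@prefix lex:     <http://example.org/itLexicon#> .   # hypothetical lexicon namespace

# one Italian entry with two senses pointing into :dat
lex:cane a ontolex:LexicalEntry ;
    ontolex:sense lex:cane_sense1 , lex:cane_sense2 .

lex:cane_sense1 a ontolex:LexicalSense ;
    ontolex:reference :Dog .

lex:cane_sense2 a ontolex:LexicalSense ;
    ontolex:reference :DomesticAnimal .

This fragment would contribute the tuples (cane, sense1, Dog) and
(cane, sense2, DomesticAnimal) to ext(mySubset), i.e. 1 entry, 2 senses,
2 lexicalizations and 2 references.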
I think we sort of agree; we just need to agree on the terminology then ;-)
Philipp.
On 27.06.14 12:36, Armando Stellato wrote:
>
> Dear Philipp,
>
> all very clear, and thanks for the formal description. Replies to the
> two main points here below:
>
> I agree on the basic premise of multiple senses referring to one
> concept and that this should count as one "lexicalization".
>
> [Armando Stellato]
>
> Perfect then, I was just in doubt about the use of senses mentioned in
> the previous email.
>
> [CUT]
>
> …
>
> These partitions would not exist explicitly, but only implicitly; they
> could, however, be referred to explicitly by, say, considering an
> instance of "lime:LexiconSubset" that represents one of these
> equivalence classes (i.e. the equivalence class corresponding to one
> ontology, one language and one linguistic model). For these equivalence
> classes, and thus for a subset of the Lexicon, we could indicate the
> values of the statistical properties mentioned above. For some
> equivalence class c we could then state the following:
>
> entries:         #{ lex : (lex, sense, ref) \in c }
> senses:          #{ sense : (lex, sense, ref) \in c }
> lexicalizations: #{ (lex, ref) : (lex, sense, ref) \in c }
> references:      #{ ref : (lex, sense, ref) \in c }
>
> If we do not specify one of the three dimensions for such a slice, it
> would correspond to the union of all equivalence classes for all
> possible values of the unspecified dimensions.
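> As a minimal sketch (using the lime:LexiconSubset vocabulary, with all
> names and counts purely illustrative), a slice that leaves the language
> dimension unspecified would simply omit lime:lang and aggregate over
> all language-specific slices:
>
> myItLex:allLanguagesSubset
>     a lime:LexiconSubset ;
>     lime:lexicalizedObject :dat ;    # ontology dimension fixed
>     lime:lexicalModel <http://> ;    # model dimension fixed
>     lime:entries 54 ;                # aggregated over every language slice
>     lime:lexicalizations 71 ;
>     lime:senses 80 ;
>     lime:references 63 .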
>
> I hope I am more or less clear: I am saying that we need a logical
> mechanism to implicitly partition a dataset into sub-datasets
> according to the three dimensions mentioned above, and some mechanism
> to explicitly refer to these sub-datasets in order to add metadata.
>
> This would make the classes Lexicalization, LanguageCoverage, etc.
> obsolete, as we could express all statistics by attaching the four
> basic properties to different slices.
>
> Does this make sense? If not, I will have to come up with concrete
> examples, I fear ;-)
>
> [Armando Stellato]
>
> Absolutely, it makes sense (that is, it is sound); however, there are
> two factors here:
>
> 1) Does this cover exactly what we need/want?
>
> 2) How do we represent it? Maybe we more or less already agree on the
> formal description above, but then it is only a matter of how to
> represent it in OWL.
>
> Regarding point 1, I list the following:
>
> a) There is another aspect: I would like to address the possibility of
> adding finer-grained partitions along a further dimension, such as
> “coverage of the skos:Concepts only”. This could raise the fear of a
> combinatorial explosion of possibilities, though I don’t see the
> typical case being one single monolithic dataset containing both the
> ontology and its ontolex lexicalizations for several languages. As you
> yourself have said on past occasions, ontolex is, after all, not going
> to replace more trivial lexicalizations in the “standard” bundling of
> ontologies/datasets, but will actually be the language for developing
> rich resources that provide support for a multitude of NLP tasks. So,
> if someone is providing the Spanish Ontolex Lexicalization (not
> relevant here whether it includes the Lexicon or not) for FOAF, he
> *may* want to provide these finer-grained partitions.
>
> b) Manuel and I split out three entities (Ontology, Lexicon,
> Lexicalization) to avoid redundancy and to specifically identify the
> three core logical entities of OntoLex. Now, the Lexicon *should in
> any case* report its entry and sense counts, and the ontology already
> reports its various counts in its VoID description. I suppose your
> partitions do indeed need to be reified somehow and be given a class to
> serve as the domain of those properties. So, what you are addressing
> seems to me to be exactly what we called the “Lexicalization” class,
> nothing more, nothing less… see point 2 below.
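> As a rough sketch of the kind of counts the ontology dataset :dat would
> already expose through standard VoID properties (the figures are purely
> illustrative):
>
> @prefix void: <http://rdfs.org/ns/void#> .
>
> :dat a void:Dataset ;
>     void:triples    120000 ;    # total number of triples
>     void:entities   20 ;        # number of described entities
>     void:classes    5 ;
>     void:properties 12 .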
>
> Regarding point 2, your theoretical representation needs an OWL
> formulation. Obviously I’m not stuck on ours, but I don’t see anything
> in the formulation above that calls for something different. I start
> from where I left off: your partitions in the end need a class and,
> whatever you call it, it seems to be the Lexicalization class you
> suggest getting rid of. Then it only comes down to three narrow topics:
> whether to use counts or percentages (which we have temporarily set
> aside), what name to give to such a class of objects (Lexicalization
> would be fine; there is just the inconvenience of the ambiguity between
> “lexicalizations” as objects and the class of those objects having the
> same name), and how to structure it in OWL (point 2 below). In this
> sense, modulo the count/percentage question and the finer-grained
> resource-type cuts, your theoretical formulation is already covered by
> the OWL formulation we did.
>
> Let’s take a look at the last representation (version 2 of the Lime
> Proposal PDF).
>
> # inside the VoID file of the Lexicalization
>
> myItLex:myItalianLexicalizationOfDat
>     a lime:Lexicalization ;
>     lime:lang "it" ;    # important to be here: this is the focus of
>                         # search by agents, not the lexicon!
>     lime:lexicalizedDataset :dat ;
>     lime:lexicalModel ontolex: ;
>     lime:lexicon :italianWordnet ;
>     lime:resourceCoverage [    # see discussion later, in section 5
>         lime:class owl:Class ;
>         lime:percentage 0.75 ;
>         lime:avgNumOfEntries 3.5
>     ] .
>
> In there, you have the language, the ontology (lime:lexicalizedDataset),
> the lime:lexicalModel and a pointer to a Lexicon. The only added thing
> is the resourceCoverage, which gives the possibility to create an
> additional cut on specific resources. The basic option would be
> rdfs:Resource, to cover everything.
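> As a rough sketch of the finer cut mentioned in point a) (the figures,
> and the assumption that several coverage blocks may coexist, are
> illustrative only), one could add a second resourceCoverage restricted
> to skos:Concept:
>
> myItLex:myItalianLexicalizationOfDat
>     lime:resourceCoverage [
>         lime:class skos:Concept ;    # cover only the SKOS concepts of :dat
>         lime:percentage 0.9 ;
>         lime:avgNumOfEntries 2.0
>     ] .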
>
> More on the call (I’ll be there); as usual, though, this mail can
> serve as a basis for discussion.
>
> Cheers,
>
>
> Armando
>
--
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld
Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de
Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany
Received on Friday, 27 June 2014 12:19:01 UTC