Re: Re: Comments on lime.owl

Dear Philipp,


All very clear, and thanks for the formal description. Replies to the two main points below:


I agree on the basic premise of multiple senses referring to one concept and that this should count as one "lexicalization". 


[Armando Stellato] 

Perfect then; I was just in doubt about the use of senses mentioned in the previous email. 


[CUT]

…

These partitions would not exist explicitly, but only implicitly; they could, however, be referred to explicitly by, say, considering an instance of "lime:LexiconSubset" that represents one of these equivalence classes (i.e. the equivalence class corresponding to one ontology, one language and one linguistic model). For these equivalence classes, and thus for a subset of the Lexicon, we could indicate the values of the statistical properties mentioned above. For some equivalence class c we could then state the following:

entries:         #{ lex : (lex, sense, ref) \in c }
senses:          #{ sense : (lex, sense, ref) \in c }
lexicalizations: #{ (lex, ref) : (lex, sense, ref) \in c }
references:      #{ ref : (lex, sense, ref) \in c }

If we do not specify one of the three dimensions for such a slice, it corresponds to the union of all equivalence classes over all possible values of the unspecified dimension. 

I hope I am more or less clear: I am saying that we need a logical mechanism to implicitly partition a dataset into sub-datasets according to the three dimensions mentioned above, and some mechanism to explicitly refer to these sub-datasets in order to add metadata.

This would make the classes Lexicalization, LanguageCoverage, etc. obsolete, as we could express all the statistics by attaching the four basic properties to different slices. 
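To make the slice idea concrete, such an equivalence class might be described along these lines. This is only a sketch: lime:LexiconSubset is the class name floated above, while lime:entries, lime:senses, lime:lexicalizations and lime:references are hypothetical property names for the four statistics just defined; the counts are illustrative placeholders and prefix declarations are omitted, as in the example further down.

```turtle
# Hypothetical sketch of one slice (equivalence class) of the dataset,
# fixed to the Italian language, the :dat ontology and the ontolex model.
# lime:LexiconSubset and the four count properties are illustrative
# names, not agreed vocabulary; the figures are placeholders.
:itOntolexSliceOfDat
  a lime:LexiconSubset ;
  lime:lang "it" ;
  lime:lexicalizedDataset :dat ;
  lime:lexicalModel ontolex: ;
  lime:entries 4128 ;          # #{ lex : (lex, sense, ref) \in c }
  lime:senses 5940 ;           # #{ sense : (lex, sense, ref) \in c }
  lime:lexicalizations 5210 ;  # #{ (lex, ref) : (lex, sense, ref) \in c }
  lime:references 3577 .       # #{ ref : (lex, sense, ref) \in c }
```

A slice that left, say, the language unspecified would simply omit lime:lang, and its counts would then range over the union of the per-language classes.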

Does this make sense? If not, I fear I will have to come up with concrete examples ;-)




[Armando Stellato] 

Absolutely, it makes sense (that is, it is sound); however, there are two factors here:

1)      Does this cover exactly what we need/want?

2)      How do we represent it? Maybe we more or less already agree on the formal description above; then it is only a matter of how to represent it in OWL.

Regarding point 1, I list the following: 

a)      there was another aspect: I would like to address the possibility of adding finer-grained partitions along a further dimension, such as “coverage of the skos:Concepts only”. This could raise the fear of a combinatorial explosion of possibilities; however, I don’t see a typical case where there is one single monolithic dataset containing both the ontology and its ontolex lexicalizations for several languages. As you yourself have said on past occasions, ontolex is after all not going to replace more trivial lexicalizations in the “standard” bundling of ontologies/datasets; it would actually be the language for developing great resources providing support in a multitude of NLP tasks. So, if someone is providing the Spanish Ontolex Lexicalization for FOAF (not relevant here whether it includes the Lexicon or not), he *may* want to provide these finer-grained partitions.



b)      Manuel and I split three entities (Ontology, Lexicon, Lexicalization) to avoid redundancy and to specifically identify the three core logical entities of OntoLex. Now, the Lexicon *should in any case* report its entry and sense counts, and the ontology already reports its various counts in its VoID description. Indeed, I suppose your partitions need to be reified somehow and given a class to be used as the domain for those properties. So, what you are addressing seems to me to be what we called the “Lexicalization” class, nothing more, nothing less…see point 2 here below.
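The finer-grained cut mentioned in (a) could reuse the resourceCoverage pattern from version 2 of the proposal, simply instantiating lime:class with skos:Concept. A sketch only: the resource name and all figures are made up for illustration, and prefix declarations are omitted as in the example further down.

```turtle
# Sketch: coverage figures restricted to skos:Concept instances only,
# reusing lime:resourceCoverage from version 2 of the proposal.
# The resource name and all figures are illustrative.
:mySpanishLexicalizationOfThes
  a lime:Lexicalization ;
  lime:lang "es" ;
  lime:resourceCoverage [
    lime:class skos:Concept ;   # the further dimension: concepts only
    lime:percentage 0.60 ;
    lime:avgNumOfEntries 2.1
  ] .
```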

Regarding point 2, your theoretical representation needs an OWL formulation. Obviously I’m not stuck on ours, but I didn’t see, in the formulation above, the point of something different. I start from where I left off: your partitions in the end need a class and, whatever you call it, it seems to be the very Lexicalization class you suggest getting rid of. Then it only amounts to three narrow topics: using counts or percentages (which we temporarily set aside), giving a name to such a class of objects (Lexicalization would be fine; there is just the inconvenience of the ambiguity between “lexicalizations” as objects and the set of objects having the same name), and how to structure it in OWL (point 2 below). In this sense, modulo the count/percentage issue and the finer-grained cuts on resource types, your theoretical formulation is already taken into account by the OWL formulation we did.


Let’s take a look at the latest representation (version 2 of the Lime Proposal PDF). 


# inside the VoID file of the Lexicalization

myItLex:myItalianLexicalizationOfDat
  a lime:Lexicalization ;
  lime:lang "it" ;                 # important to be here: this is the focus of search by agents!!! Not the lexicon!
  lime:lexicalizedDataset :dat ;
  lime:lexicalModel ontolex: ;
  lime:lexicon :italianWordnet ;
  lime:resourceCoverage [          # see discussion later in section 5
    lime:class owl:Class ;
    lime:percentage 0.75 ;
    lime:avgNumOfEntries 3.5
  ] .

In there, you have the language, the ontology (lime:lexicalizedDataset), the lime:lexicalModel and a pointer to a Lexicon. The only addition is resourceCoverage, which gives the possibility to create an additional cut on specific resources. The basic option would be rdfs:Resource, to cover everything.


More on the call (I’ll be there); as usual, however, this mail can serve as a basis for discussion.


Cheers,


Armando

Received on Friday, 27 June 2014 10:37:14 UTC