R: Comments on lime.owl from Armando Stellato on 2014-06-13 (public-ontolex@w3.org from June 2014)

From: Armando Stellato <stellato@info.uniroma2.it>
Date: Sat, 14 Jun 2014 02:03:57 +0800
To: "'Philipp Cimiano'" <cimiano@cit-ec.uni-bielefeld.de>, <public-ontolex@w3.org>, <public-ontolex@w3.org>
Message-ID: <SNT407-EAS183CC65E1A8FC995B2DA9E4A0150@phx.gbl>
Dear Philipp,

 

Replies below.

 

Now I find the question of "what triples belong where" difficult if not impossible to answer. As John says a lexicon consists of many different layers of entries, senses, references, forms. So how people will package is quite arbitrary. 



[Armando Stellato] 

Well, partially this is a non-issue. We don’t need to know the number of triples. In theory VoID allows one to report the number of triples of a dataset, but one can fill in this info or not. If I have a dataset which is both a lexicon and a lexicalization for a given ontology (which is not part of this dataset) then I may report the number of triples for the whole dataset, and omit this info for the two subsets (Lexicon and Lexicalization).
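
As a minimal sketch of that reporting choice (Python, plain tuples rather than an RDF library): void:triples and void:subset are the actual VoID properties, while the example.org identifiers, the describe helper and the triple count are made up for illustration.

```python
# Hypothetical identifiers and numbers; only void:triples and void:subset
# are real VoID terms.
VOID = "http://rdfs.org/ns/void#"
EX = "http://example.org/"


def describe(dataset, triples=None, subsets=()):
    """Return metadata statements for a dataset; the triple count is optional."""
    statements = []
    if triples is not None:
        statements.append((dataset, VOID + "triples", triples))
    for subset in subsets:
        statements.append((dataset, VOID + "subset", subset))
    return statements


# The whole dataset reports its size and lists its two subsets.
whole = describe(
    EX + "dataset",
    triples=120000,  # made-up figure
    subsets=[EX + "lexicon", EX + "lexicalization"],
)

# The two subsets simply omit the triple count.
lexicon = describe(EX + "lexicon")
lexicalization = describe(EX + "lexicalization")
```

Nothing in the metadata forces the subsets to carry their own count, which is why the packaging question does not have to be answered here.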

 

I thus find the question of what should go where difficult to answer, as there is no natural way to package the triples in a lexicon. 

 

[Armando Stellato] 

If I understood this part correctly, this is an issue with applying the model, and not with the metadata model. The metadata don’t deal with this. But I’m not sure I got its exact meaning. What do you mean by “package”?

 

 

The reason why we want to package things is to have metadata at the level of the complete resource (Lexicon as a collection of lex entries and Lexicalization as a collection of senses).



[Armando Stellato] 

This reopens an issue which is not totally clear to me (you were agreeing with my position before). I suggested not to use senses to count lexicalizations (even though in many cases they correspond). See the related chapter in the Lime PDF we sent weeks ago. But we had better check it together now: 

If a sense is always a reified binding between a reference and a lexical entry, there is no issue, and #senses=#lexicalizations. But…might the following case happen?

 

“I want to reuse WordNet, which has been modeled in Ontolex. Now I have the senses for “run”, and there are 3 very close senses which I want to collapse onto the same ontology reference”.

Please note that there is a large literature about rationales for doing that (e.g. Navigli, 200?). If I have a system processing text and using the WordNet semantic structure (at sense or synset level), and I want to link the results to my ontology, then I may want to use all senses which potentially fall inside my ontology entries, and if 3 subtly different senses are actually three blends of the same concept, then I want to bind all of them to avoid losing some results from my WordNet-bound/trained/whatever machine.

 

So, if the above may happen (is it allowed in Ontolex? Does it interfere with R3?), I will have 3 bindings between “run” and a certain (same) ontoreference. Then, for my purposes, this should count as one single lexicalization. 
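
To make the difference concrete, here is a minimal sketch in Python. The sense and reference identifiers are made up, and representing a binding as a (lex, sense, ref) tuple is only an assumption for illustration, not part of the Ontolex model.

```python
# Illustrative bindings only: three close senses of "run" all bound
# to the same (hypothetical) ontology reference.
bindings = [
    ("run", "run_sense_1", "ex:Running"),
    ("run", "run_sense_2", "ex:Running"),
    ("run", "run_sense_3", "ex:Running"),
]


def count_senses(bindings):
    """Count distinct senses."""
    return len({sense for _, sense, _ in bindings})


def count_lexicalizations(bindings):
    """Count distinct (lexical entry, reference) pairs, ignoring the sense."""
    return len({(lex, ref) for lex, _, ref in bindings})


print(count_senses(bindings), count_lexicalizations(bindings))
```

Counting senses gives 3 here, while counting (entry, reference) pairs gives 1, which is the number I would want to report as lexicalizations.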

So, better if we seal it now :) I’m not strongly saying Ontolex should allow for this multiple sense binding (though I’m suggesting it should, if we want to reuse existing resources and not only create lexicons ad hoc for each ontology), but I’m just asking you for confirmation of whether that is the case, and then we can discuss what to count for the metadata (senses or not).

 

In addition, we want to group parts of the lexicon that refer to a given dataset/ontology to say something like how many references for this given ontology are lexicalized in the lexicon; fair enough. Let's accept this is important (which I agree actually)

So I think what we are looking for is something like a class "SubsetOfLexiconRelevantReferringToAParticularOntology" as a subclass of void:Dataset and representing a slice of the overall dataset. For this class we would have a property lime:dataset or lime:ontology (I propose lime:vocabulary which is more neutral) that expresses the ontology in question and would be functional, otherwise it makes no sense. We could then attach the standard metadata properties capturing statistics to the Lexicon as a whole, to the whole dataset (comprising possibly many lexica) or to the subset of lex / sense / ref triples with a particular "ref". So SubsetOfLexiconRelevantReferringToAParticularOntology(o) as a function would refer as a Dataset to all the lex / sense / ref triples where ref is in o. Fair enough. We could have this class but never say explicitly which triples belong to it, but keep it as some implicit subset of the lexicon. This would allow us to make metadata statements about all the sense / ref triples with ref \in o. Let me refer to these tuples as (lex,sense,ref) for now.

The properties we want to have for the Dataset SubsetOfLexiconRelevantReferringToAParticularOntology(o) are:

entries: #{ lex : (lex,sense,ref) \in o }
senses: #{ sense : (lex,sense,ref) \in o }
lexicalizations: #{ (lex,ref) : (lex,sense,ref) \in o }
references: #{ ref : (lex,sense,ref) \in o }
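
The four counts above can be sketched in Python as follows. This is only an illustration under stated assumptions: the lexicon is flattened into (lex, sense, ref) tuples, the ontology o is given as a set of reference identifiers, and all names in the sample data are invented.

```python
def subset_stats(triples, ontology_refs):
    """Compute the four counts over the tuples whose ref belongs to the ontology."""
    subset = [(lex, sense, ref) for lex, sense, ref in triples
              if ref in ontology_refs]
    return {
        "entries": len({lex for lex, _, _ in subset}),
        "senses": len({sense for _, sense, _ in subset}),
        "lexicalizations": len({(lex, ref) for lex, _, ref in subset}),
        "references": len({ref for _, _, ref in subset}),
    }


# Made-up sample data: the last tuple points outside the ontology o.
triples = [
    ("run", "run_1", "o:Running"),
    ("run", "run_2", "o:Running"),
    ("jog", "jog_1", "o:Running"),
    ("cat", "cat_1", "o:Cat"),
    ("dog", "dog_1", "other:Dog"),
]
o = {"o:Running", "o:Cat"}

stats = subset_stats(triples, o)
print(stats)
```

With this data the subset has 3 entries, 4 senses, 3 lexicalizations and 2 references; the "dog" tuple is ignored because its reference is not in o.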

So a question to you all: 

Is "SubsetOfLexiconRelevantForAParticularOntology" the kind of thing we want to have to pick out that subset of the overall dataset that refers to a given ontology?

If yes, we can discuss the name further, but we need to agree on the concept.

 

[Armando Stellato] 

 

That’s where I leave it unanswered for now. Honestly, I didn’t think about it. To me, the lexicalization was enough since, being the glue between the ontology and the lexicon (which may exist per se), it informs about the coverage of the referenced ontology. There is no purpose in knowing how much of the original lexicon has been used (though this can be reported as part of the lexicalization as well). So, my first answer after just smelling the thing would be: “no, I think it overcomplicates things and gives very little in return”. But…let me digest it (or I’m all ears if you have further support for it). 

 

Btw, I would say no to “lime:vocabulary”. While ontology is an ambiguous word (model, model+data, OWL, sometimes SKOS too…), vocabulary more or less unambiguously denotes an OWL ontology (the model only, not the data).

 

The other issue is that we might have lexical resources that do not introduce any lexical entries but only link lexical entries in one resource to entities in some ontology, but do not contain any lexicon nor lexical entries themselves. 

Clearly, there might be hybrids in the general case, resources that both link only but also introduce some lexical entries. This might be the standard case. 

So not sure if we want to specifically tag resources that *only* link but do not introduce lexical entries nor lexica themselves. We can do it and call this type of dataset "LemonLinkSet", while we could call the other ones simply "LemonDataset". If people feel this is important we can certainly do it, but I do not yet see the added value that clearly.



[Armando Stellato] 

Mmmm…were you referring to the LexicalLinkSet I was suggesting? (consider this was just a possibility, not really “concrete” for now). To recap on this:

 

My “stable part” of the proposal is:

 

1)      No need to tag anything at the data level, metadata can tell what is a Lexicon and Lexicalization

2)      Lexicalization exists independently of the scenario. So, whether the data exists as a global onto+lexicalization+lexicon, or in any of the various possible splits (there should be four possibilities overall: 2+1, 1+2, 3, 1+1+1), the structure of the lexicalization in the metadata is the same. This looks pretty clean.

a.      Corollary: No need to tag pure lexicalizations in any way (even at metadata) as it is of no purpose for the metadata
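
The four possibilities in point 2 can be enumerated with a small Python sketch. It rests on an assumption about how to read the notation: the three components are taken in the order ontology, lexicalization, lexicon, and a split is a way of cutting that ordered list into contiguous packages. The names below are illustrative only.

```python
from itertools import product

# Assumed ordering of the three components.
COMPONENTS = ["ontology", "lexicalization", "lexicon"]


def packagings(components):
    """Yield every way to cut an ordered list into contiguous groups."""
    for cuts in product([False, True], repeat=len(components) - 1):
        groups, current = [], [components[0]]
        for item, cut in zip(components[1:], cuts):
            if cut:
                # Close the current package and start a new one.
                groups.append(current)
                current = [item]
            else:
                current.append(item)
        groups.append(current)
        yield groups


splits = list(packagings(COMPONENTS))
for split in splits:
    print("+".join(str(len(group)) for group in split), split)
```

Under that reading there are exactly four packagings (3, 2+1, 1+2, 1+1+1), and the lexicalization appears as one component in each of them, so its metadata description does not need to change.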

 

Note that the lexicalization structure in the metadata is good for representing Lemon lexicalizations, but also mere rdfs:label or skos or skosxl lexicalizations.

However, there is another aspect which may be relevant: suppose that I’m (implicitly) lexicalizing an ontology by writing links between LexicalConcepts of WordNet (synsets) and the resources of the ontology. This is done through links between semantic entities on both sides; we have the properties in Ontolex for that, so I assume this is relevant for our model, and then it would probably be important to state it somehow in the metadata. That’s where I suggested this LexicalLinkSet.

 

Ok, so far so good. For me there is a clear picture emerging, but we need to agree on it ;-)
The other things are minor details that will follow from our stance towards the issues I mentioned above I think.



[Armando Stellato]

Sure!

 

Cheers,

 

Armando
Received on Friday, 13 June 2014 18:05:01 UTC