Re: R: Comments on lime.owl

From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: Thu, 19 Jun 2014 22:43:47 +0200
Message-ID: <53A34B83.4090203@cit-ec.uni-bielefeld.de>
To: public-ontolex@w3.org
Hi Armando, all,

   I am not sure I understood all your comments, sorry. I propose we make a 
fresh start. I realize that some of my comments were a bit misleading.

In my understanding, we merely need some way to (conceptually) slice an 
ontolex dataset (be it a full lexicon or only a set of links) and to 
refer to this slice as a dataset to which we can attach the corresponding 
metadata.

I agree with the basic premise that multiple senses may refer to one concept 
and that this should count as one "lexicalization". What I am saying is 
that I do not think we need to make the concept of lexicalization 
explicit other than in the counts we provide, i.e.

entries: #{ lex : (lex,sense,ref) \in o}
senses: #{ sense : (lex,sense,ref) \in o}
lexicalizations: #{ (lex,ref) : (lex,sense,ref) \in o}
references: #{ ref : (lex,sense,ref) \in o}
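
As a minimal sketch of these four counts (in Python; the tuple layout (lex, sense, ref) follows the notation above, while the function name and all identifiers in the sample data are invented purely for illustration):

```python
# Hypothetical (lex, sense, ref) tuples; every identifier here is made up.
# Three senses of "run" are bound to the same ontology reference.
triples = {
    ("run", "run_sense1", "onto:Running"),
    ("run", "run_sense2", "onto:Running"),
    ("run", "run_sense3", "onto:Running"),
    ("jog", "jog_sense1", "onto:Running"),
    ("walk", "walk_sense1", "onto:Walking"),
}


def counts(tuples):
    """Return the four statistics for a set of (lex, sense, ref) tuples."""
    return {
        "entries": len({lex for lex, _, _ in tuples}),
        "senses": len({sense for _, sense, _ in tuples}),
        "lexicalizations": len({(lex, ref) for lex, _, ref in tuples}),
        "references": len({ref for _, _, ref in tuples}),
    }


stats = counts(triples)
print(stats)
```

In this toy data the three senses of "run" add three to the sense count but only one to the lexicalization count, since they share the same (lex, ref) pair.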

I am proposing that we introduce some mechanism to define the scope of 
the above counts with three relevant dimensions: language, 
ontology/dataset and linguistic model. I think that would solve most of 
our issues.

Let's say that we introduce a partition of the t=(lex,sense,ref) triples 
in the dataset (no matter for now where they are and whether they are 
spread over different resources / datasets) using the following 
equivalence relation:

t1 \equiv t2 iff ref(t1) =  ref(t2) && lang(lex(t1))=lang(lex(t2)) && 
lingmodel(t1) = lingmodel(t2)

This conceptually partitions the (lex,sense,ref) triples into equivalence 
classes according to the three dimensions.
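
A rough sketch of such a partition, in Python. It assumes that each tuple carries its three dimensions explicitly, and it reads "ref(t1) = ref(t2)" as "the references belong to the same ontology", which is how the surrounding text describes the dimension. The `Triple` type, its field names and the sample data are hypothetical:

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    lex: str
    sense: str
    ref: str
    ontology: str   # assumed: the ontology/dataset the reference belongs to
    lang: str       # assumed: language of the lexical entry
    lingmodel: str  # assumed: linguistic model in use, e.g. "ontolex"


def partition(triples):
    """Group triples into classes keyed by (ontology, lang, lingmodel)."""
    classes = defaultdict(set)
    for t in triples:
        classes[(t.ontology, t.lang, t.lingmodel)].add(t)
    return dict(classes)


# Invented sample data.
data = [
    Triple("run", "run_s1", "ex:Running", "ex", "en", "ontolex"),
    Triple("run", "run_s2", "ex:Running", "ex", "en", "ontolex"),
    Triple("laufen", "laufen_s1", "ex:Running", "ex", "de", "ontolex"),
    Triple("walk", "walk_s1", "other:Walking", "other", "en", "ontolex"),
]
classes = partition(data)
for key, members in sorted(classes.items()):
    print(key, len(members))
```

Every triple lands in exactly one class, so the classes are disjoint and together cover the whole dataset.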

These equivalence classes would exist only implicitly, but could be 
referred to explicitly, say by introducing an instance of 
"lime:LexiconSubset" that represents one of them 
(i.e. the equivalence class corresponding to one ontology, one language 
and one linguistic model). For each such equivalence class, and thus for a 
subset of the Lexicon, we could indicate the values of the statistical 
properties mentioned above. For some equivalence class c we could then 
state the following:


entries: #{ lex : (lex,sense,ref) \in c}
senses: #{ sense : (lex,sense,ref) \in c}
lexicalizations: #{ (lex,ref) : (lex,sense,ref) \in c}
references: #{ ref : (lex,sense,ref) \in c}


If we do not specify one of the three dimensions for such a slice, it 
would correspond to the union of all equivalence classes for all 
possible values of the unspecified dimensions.
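
To illustrate the "unspecified dimension" case, here is a small Python sketch. The function names, the six-field tuple layout (lex, sense, ref, ontology, lang, lingmodel) and the data are all assumptions made for this example. Leaving a dimension as `None` selects the union of all classes along that dimension:

```python
# Invented sample data: (lex, sense, ref, ontology, lang, lingmodel).
data = {
    ("run", "run_s1", "ex:Running", "ex", "en", "ontolex"),
    ("run", "run_s2", "ex:Running", "ex", "en", "ontolex"),
    ("laufen", "laufen_s1", "ex:Running", "ex", "de", "ontolex"),
    ("walk", "walk_s1", "other:Walking", "other", "en", "ontolex"),
}


def select_slice(triples, ontology=None, lang=None, lingmodel=None):
    """Return the tuples matching the given dimensions; None = unspecified."""
    return {
        t for t in triples
        if (ontology is None or t[3] == ontology)
        and (lang is None or t[4] == lang)
        and (lingmodel is None or t[5] == lingmodel)
    }


def lexicalizations(triples):
    """Count distinct (lex, ref) pairs in a slice."""
    return len({(t[0], t[2]) for t in triples})


english_ex = select_slice(data, ontology="ex", lang="en")  # one class
all_ex = select_slice(data, ontology="ex")                 # union over languages
everything = select_slice(data)                            # whole dataset
print(len(english_ex), len(all_ex), len(everything))
print(lexicalizations(all_ex))
```

The same counting functions can then be applied to any slice, whether it is a single equivalence class or a union of several.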

I hope this is more or less clear. I am saying that we need a logical 
mechanism to implicitly partition a dataset into sub-datasets according 
to the three dimensions mentioned above, and some mechanism to explicitly 
refer to these sub-datasets in order to add metadata.

This would make the classes Lexicalization, LanguageCoverage etc. 
obsolete, as we could express all statistics by attaching the four basic 
properties to different slices.

Does this make sense? If not, I fear I will have to come up with concrete 
examples ;-)

Best regards,

Philipp.







On 13.06.14 20:03, Armando Stellato wrote:
>
> Dear Philipp,
>
> Replies below.
>
> Now I find the question of "what triples belong where" difficult if 
> not impossible to answer. As John says a lexicon consists of many 
> different layers of entries, senses, references, forms. So how people 
> will package is quite arbitrary.
>
> [Armando Stellato]
>
> Well, partially this is a non-issue. We don’t need to know the 
> number of triples. In theory VoID allows one to report the number of 
> triples of a dataset, but one can fill in this info or not. If I have a 
> dataset which is both a lexicon and a lexicalization for a given 
> ontology (which is not part of this dataset) then I may report the 
> number of triples for the whole dataset, and omit this info for the 
> two subsets (Lexicon and Lexicalization).
>
> I thus find it difficult to answer the question of what should go where, 
> as there is no natural way to package the triples in a lexicon.
>
> [Armando Stellato]
>
> If I understood this part correctly, this is an issue with applying 
> the model, and not with the metadata model. The metadata don’t deal with 
> this. But I’m not sure I got its exact meaning. What do you mean 
> by “package”?
>
> The reason why we want to package things is to have metadata at the 
> level of the complete resource (Lexicon as a collection of lex entries 
> and Lexicalization as a collection of senses).
>
> [Armando Stellato]
>
> This again opens an issue which is not totally clear to me (you 
> agreed with my position before). I suggested not using senses to 
> count lexicalizations (even though in many cases they correspond). See 
> the related chapter in the Lime PDF we sent weeks ago. But we had better 
> check it together now:
>
> If a sense is always a reified binding between a reference and a 
> lexical entry, there is no issue, and #senses=#lexicalizations. 
> But…might the following case happen?
>
> “I want to reuse WordNet, which has been modeled in Ontolex, now I’ve 
> the senses for “run”, and there are 3 very close senses which I want 
> to collapse over the same ontology reference”.
>
> Please note that there is a large literature about rationales for doing 
> that (e.g. Navigli, 200?). If I have a system processing text and 
> using the WordNet semantic structure (at sense or synset level), and I 
> want to link its results to my ontology, then I may want to use all 
> senses which potentially fall inside my ontology entries, and if 3 
> subtly different senses are actually three shades of the same concept, 
> then I want to bind all of them to avoid losing some results from my 
> wordnet-bound/trained/whatever machine.
>
> So, if the above may happen (is it allowed in Ontolex? Does it interfere 
> with R3?), I will have 3 bindings between “run” and a certain (same) 
> ontology reference. Then, for my purposes, this should count as one single 
> lexicalization.
>
> So, better if we settle it now :) I’m not strongly saying Ontolex should 
> allow for this multiple sense binding (though I’m suggesting it 
> should, if we want to reuse existing resources and not only create 
> lexicons ad hoc for each ontology) but I’m just asking you to 
> confirm whether that is the case, and then we can discuss what 
> to count for the metadata (senses or not).
>
> In addition, we want to group parts of the lexicon that refer to a 
> given dataset/ontology to say something like how many references for 
> this given ontology are lexicalized in the lexicon; fair enough. Let's 
> accept this is important (which I agree actually)
>
> So I think what we are looking for is something like a class 
> "SubsetOfLexiconRelevantReferringToAParticularOntology" as subclass of 
> void:Dataset and representing a slice of the overall dataset. For this 
> class we would have a property lime:dataset or lime:ontology (I 
> propose lime:vocabulary which is more neutral) that expresses the 
> ontology in question and would be functional, otherwise it makes no 
> sense. We could then attach the standard metadata properties capturing 
> statistics to the Lexicon as a whole, to the whole dataset (comprising 
> possibly many lexica) or to the subset of lex / sense / ref triples with 
> a particular "ref". So 
> SubsetOfLexiconRelevantReferringToAParticularOntology(o) as a function 
> would refer as a Dataset to all the lex / sense / ref triples where ref 
> is in o. Fair enough. We could have this class but never say 
> explicitly which triples belong to it, but keep it as some implicit 
> subset of the lexicon. This would allow us to make metadata statements 
> about all the sense / ref triples with ref \in o. Let me refer to 
> these tuples as (lex,sense,ref) for now.
>
> The properties we want to have for the Dataset 
> SubsetOfLexiconRelevantReferringToAParticularOntology(o) are:
>
> entries: #{ lex : (lex,sense,ref) \in o}
> senses: #{ sense : (lex,sense,ref) \in o}
> lexicalizations: #{ (lex,ref) : (lex,sense,ref) \in o}
> references: #{ ref : (lex,sense,ref) \in o}
>
> So a question to you all:
>
> Is "SubsetOfLexiconRelevantForAParticularOntology" the kind of thing 
> we want to have to pick out that subset of the overall dataset that 
> refers to a given ontology?
>
> If yes, we can discuss the name further, but we need to agree on the 
> concept.
>
> [Armando Stellato]
>
> This is the one I will leave unanswered for now. Honestly, I hadn’t thought 
> about it. To me, the lexicalization was enough since, being the glue 
> between the ontology and the lexicon (which may exist per se), 
> it informs about the coverage of the referenced ontology. There is no 
> need to know how much of the original lexicon has been used (though 
> this can be reported as part of the lexicalization as well). So, my 
> answer at first glance would be: “no, I think it 
> overcomplicates things and gives very little in return”. But…let me digest it 
> (or I’m all ears if you have further support for it).
>
> Btw, I would say no to “lime:vocabulary”. While ontology is an ambiguous 
> word (model, model+data, OWL, sometimes SKOS too…), vocabulary 
> more or less unambiguously denotes an OWL ontology (the model only, not 
> the data).
>
> The other issue is that we might have lexical resources that do not 
> introduce any lexical entries but only link lexical entries in one 
> resource to entities in some ontology, but do not contain any lexicon 
> nor lexical entries themselves.
>
> Clearly, there might be hybrids in the general case, resources that 
> both link only but also introduce some lexical entries. This might be 
> the standard case.
>
> So I am not sure if we want to specifically tag resources that *only* link 
> but do not introduce lexical entries or lexica themselves. We can do 
> it and call this type of dataset LemonLinkSet, while we could call 
> the other ones simply "LemonDataset". If people feel this is important 
> we can certainly do it, but I do not yet see the added value that clearly.
>
> [Armando Stellato]
>
> Mmmm…were you referring to the LexicalLinkSet I was suggesting? 
> (consider this was just a possibility, not really “concrete” for now). 
> To recap on this:
>
> My “stable part” of the proposal is:
>
> 1) No need to tag anything at the data level; the metadata can tell what is 
> a Lexicon and what is a Lexicalization
>
> 2) Lexicalization exists independently of the scenario. So, whether the 
> data exists as a global onto+lexicalization+lexicon, or in any of the 
> various possible splits (there should be four possibilities overall: 2+1, 
> 1+2, 3, 1+1+1), the structure of the lexicalization in the metadata is 
> the same. This looks pretty clean.
>
> a. Corollary: No need to tag pure lexicalizations in any way (even in the 
> metadata) as it serves no purpose for the metadata
>
> Note that the lexicalization structure in the metadata is good for 
> representing Lemon lexicalizations, but also mere rdfs:label or skos 
> or skosxl lexicalizations.
>
> However, there is another aspect which may be relevant: suppose that 
> I’m (implicitly) lexicalizing an ontology by writing links between 
> LexicalConcepts of WordNet (synsets) and the resources of the 
> ontology. This is made through links between semantic entities on both 
> sides; we have the properties in ontolex for that and I assume thus 
> this is relevant for our model, and then probably it would be 
> important to tell it somehow in the metadata. That’s where I suggested 
> this LexicalLinkSet.
>
> Ok, so far so good. For me there is a clear picture emerging, but we 
> need to agree on it ;-)
> The other things are minor details that will follow from our stance 
> towards the issues I mentioned above I think.
>
> [Armando Stellato]
>
> Sure!
>
> Cheers,
>
> Armando
>


-- 

Prof. Dr. Philipp Cimiano

Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano@cit-ec.uni-bielefeld.de

Forschungsbau Intelligente Systeme (FBIIS)
Raum 2.307
Universität Bielefeld
Inspiration 1
33619 Bielefeld
Received on Thursday, 19 June 2014 20:44:18 UTC