- From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
- Date: Fri, 27 Jun 2014 14:18:32 +0200
- To: Armando Stellato <stellato@info.uniroma2.it>, public-ontolex@w3.org
- Message-ID: <53AD6118.4030905@cit-ec.uni-bielefeld.de>
Hi Armando,

yes, we agree. What I meant with my class lime:LexiconSubset is essentially what you mean when you say Lexicalization, I think. So we need to agree on the name. As for being more fine-granular and how to represent it: I do not think, by the way, that we need an OWL formalization of lime:LexiconSubset or lime:Lexicalization; we need to define what we mean by it in its natural-language definition. That might be a lengthy one, though ;-)

Let me slightly reformulate your example to show how what you describe would be expressed with the vocabulary I have in mind:

    myItLex:mySubset a lime:LexiconSubset ;
        lime:lang "it" ;                 # functional, min 1
        lime:lexicalizedObject :dat ;    # functional, min 1
        lime:lexicalModel <http://> ;    # functional, min 1
        lime:lexicon :myLexicon1 ;       # multi-valued
        lime:lexicon :myLexicon2 ;       # multi-valued
        lime:entries 11 ;
        lime:lexicalizations 15 ;
        lime:senses 18 ;
        lime:references 20 .

So the above says that the subset of the dataset for the Italian language, which refers to :dat as the object of lexicalization and uses the lexical model <>, contains 11 entries, 15 lexicalizations, 18 senses and 20 references.

Implicitly included in this sub-dataset (mySubset) would be all triples

    ext(mySubset) = { (lex, sense, ref) :
        LexicalEntry(lex) & LexicalSense(sense) &
        sense(lex, sense) & reference(sense, ref) &
        lang(lex) = it & definedInNameSpace(ref, :dat) &
        (entry(myLexicon1, lex) OR ... OR entry(myLexiconN, lex)) }

This is not representable in OWL, as we cannot say that a triple (lex, sense, ref) is included in a dataset. The properties entries / lexicalizations / senses / references are then projections of the set ext(mySubset). The extension of mySubset is, however, not captured in OWL.
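Purely as an illustration (not part of the proposal): one element of ext(mySubset) could look as follows in Turtle, assuming the working names lime:entry, ontolex:sense and ontolex:reference for the core properties, and placeholder prefixes and instance names:

    @prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
    @prefix lime:    <http://www.w3.org/ns/lemon/lime#> .
    @prefix :        <http://example.org/lexicon#> .   # placeholder namespace of the lexicon data
    @prefix dat:     <http://example.org/dat#> .       # placeholder namespace of the :dat dataset

    # One (lex, sense, ref) combination satisfying the conditions above:
    # the entry belongs to :myLexicon1, its language is Italian (here taken
    # from :myLexicon1), and its sense points to an entity defined in :dat.
    :myLexicon1    lime:entry        :cane_n .          # entry(myLexicon1, lex)
    :cane_n        a                 ontolex:LexicalEntry ;
                   ontolex:sense     :cane_n_sense1 .   # sense(lex, sense)
    :cane_n_sense1 a                 ontolex:LexicalSense ;
                   ontolex:reference dat:Dog .          # reference(sense, ref)

Such a combination contributes one entry (:cane_n), one sense (:cane_n_sense1), one lexicalization (the pair (:cane_n, dat:Dog)) and one reference (dat:Dog) to the four counts above.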
I think we sort of agree; we just need to agree on the terminology then ;-)

Philipp.

On 27.06.14 12:36, Armando Stellato wrote:
>
> Dear Philipp,
>
> all very clear, and thanks for the formal description. Replies to the two main points here below:
>
> I agree on the basic premise of multiple senses referring to one concept, and that this should count as one "lexicalization".
>
> [Armando Stellato]
>
> Perfect then, I was just in doubt about the use of senses mentioned in the previous email.
>
> [CUT]
>
> …
>
> These partitions would not exist explicitly, but only implicitly; they could, however, be referred to explicitly by, say, considering an instance of "lime:LexiconSubset" that represents one of these equivalence classes (i.e. the equivalence class corresponding to one ontology, one language and one linguistic model). For these equivalence classes, and thus for a subset of the Lexicon, we could indicate the values of the statistical properties mentioned above. For some equivalence class c we could then state the following:
>
>     entries:         #{ lex : (lex, sense, ref) \in c }
>     senses:          #{ sense : (lex, sense, ref) \in c }
>     lexicalizations: #{ (lex, ref) : (lex, sense, ref) \in c }
>     references:      #{ ref : (lex, sense, ref) \in c }
>
> If we do not specify one of the three dimensions for such a slice, it would correspond to the union of all equivalence classes for all possible values of the unspecified dimensions.
>
> I hope I am more or less clear: I am saying that we need a logical mechanism to implicitly partition a dataset into sub-datasets according to the three dimensions mentioned above, and some mechanism to explicitly refer to these sub-datasets in order to add metadata.
>
> This would make obsolete the classes Lexicalization, LanguageCoverage etc., as we could express all statistics by attaching the four basic properties to different slices.
>
> Does this make sense? If not, I fear I will have to come up with concrete examples ;-)
>
> [Armando Stellato]
>
> Absolutely, it makes sense (that is, it is sound); however, there are two factors here:
>
> 1) Is this covering exactly what we need/want?
>
> 2) How to represent that? Maybe we more or less already agree on the formal description above, but then it is only a matter of how to represent it in OWL.
>
> Regarding point 1, I list the following:
>
> a) There is another aspect: I would like to address the possibility of adding finer-grained partitions along a further dimension, such as "coverage of the skos:Concepts only". This could engender the fear of a combinatorial explosion of possibilities, though… I don't see much of a typical case where there is one single monolithic dataset containing both the ontology and its OntoLex lexicalizations for several languages. As you yourself have said on past occasions, OntoLex is after all not going to replace more trivial lexicalizations in the "standard" bundling of ontologies/datasets, but would actually be the language for developing rich resources that provide support in a multitude of NLP tasks. So, if someone is providing the Spanish OntoLex lexicalization of FOAF (whether it includes the Lexicon or not is not relevant here), he *may* want to provide these finer-grained partitions.
>
> b) Manuel and I split out three entities (Ontology, Lexicon, Lexicalization) to avoid redundancy and to specifically identify the three core logical entities of OntoLex. Now, the Lexicon *should in any case* report its entry and sense counts, and the ontology already reports its various counts in its VoID description. I do suppose your partitions need to be reified somehow and be given a class to be used as the domain of those properties. So what you are addressing seems to me to be what we called the "Lexicalization" class, nothing more, nothing less… see point 2 below.
>
> Regarding point 2, your theoretical representation needs an OWL formulation. Obviously I'm not stuck on ours, but I didn't see anything in the above formulation that calls for something different. I start from where I left off above: your partitions in the end need a class and, whatever you call it, it seems to be the Lexicalization class you suggest getting rid of. And then it only amounts to three narrow topics: using counts or percentages (which we temporarily set aside), giving a name to such a class of objects (Lexicalization would be fine; there is just the inconvenience that "lexicalizations" as objects and the set of those objects would share the same name), and how to structure it in OWL (point 2 below). In this sense, modulo the count/percentage question and the finer-grained cuts on resource type, your theoretical formulation is already taken into account by the OWL formulation we did.
>
> Let's take a look at the latest representation (version 2 of the LIME proposal PDF):
>
>     # inside the VoID file of the Lexicalization
>     myItLex:myItalianLexicalizationOfDat
>         a lime:Lexicalization ;
>         lime:lang "it" ;          # important to be here: this is the focus of search by agents!!! Not the lexicon!
>         lime:lexicalizedDataset :dat ;
>         lime:lexicalModel ontolex: ;
>         lime:lexicon :italianWordnet ;
>         lime:resourceCoverage [   # see discussion later in section 5
>             lime:class owl:Class ;
>             lime:percentage 0.75 ;
>             lime:avgNumOfEntries 3.5
>         ] .
>
> In there, you have the language, the ontology (lime:lexicalizedDataset), the lime:lexicalModel and a pointer to a Lexicon. The only added thing is the resourceCoverage, which gives the (possibility to create an) additional cut on specific resources. The basic option would be rdfs:Resource, to cover everything.
>
> More on the call (I'll be there); however, as usual, this mail can serve as a basis for discussion.
>
> Cheers,
>
> Armando
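Purely as an illustration of the finer-grained cut mentioned under point (a) above: reusing the lime:resourceCoverage pattern from Armando's example, with placeholder prefixes and made-up figures, a skos:Concept-only slice might be stated as:

    @prefix lime:    <http://www.w3.org/ns/lemon/lime#> .       # placeholder LIME namespace
    @prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
    @prefix myItLex: <http://example.org/myItLex#> .             # placeholder namespace

    myItLex:myItalianLexicalizationOfDat
        lime:resourceCoverage [
            lime:class           skos:Concept ;   # cover only the skos:Concepts of :dat
            lime:percentage      0.90 ;           # made-up figure
            lime:avgNumOfEntries 2.1              # made-up figure
        ] .

A coverage block with lime:class rdfs:Resource would then correspond to the basic option mentioned above, covering everything.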
--
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld

Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de

Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany