- From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
- Date: Fri, 27 Jun 2014 14:18:32 +0200
- To: Armando Stellato <stellato@info.uniroma2.it>, public-ontolex@w3.org
- Message-ID: <53AD6118.4030905@cit-ec.uni-bielefeld.de>
Hi Armando,
yes, we agree. What I meant with my class lime:LexiconSubset is
essentially what you mean when you say Lexicalization, I think. So we
need to agree on the name.
About being more fine-granular and representing
By the way, I do not think that we need an OWL formalization of
lime:LexiconSubset or lime:Lexicalization; we need to define what we
mean by it in its natural-language definition. It might be a lengthy
one, though ;-)
Let me slightly reformulate your example to show how what you describe
would be expressed with the vocabulary that I have in mind:
myItLex:mySubset
    a lime:LexiconSubset ;
    lime:lang "it" ;                 # functional, min 1
    lime:lexicalizedObject :dat ;    # functional, min 1
    lime:lexicalModel <http://> ;    # functional, min 1
    lime:lexicon :myLexicon1 ;       # multi-valued
    lime:lexicon :myLexicon2 ;       # multi-valued
    lime:entries 11 ;
    lime:lexicalizations 15 ;
    lime:senses 18 ;
    lime:references 20 .
So the above says that the subset of the dataset for the Italian
language, which refers to :dat as the object of lexicalization and uses
the lexical model <>, contains 11 entries, 15 lexicalizations, 18 senses
and 20 references.
Included in this subdataset (mySubset) would be, implicitly, all
triples in ext(mySubset) = { (lex, sense, ref) : LexicalEntry(lex) &
LexicalSense(sense) & sense(lex, sense) & reference(sense, ref) &
lang(lex) = it & definedInNameSpace(ref, :dat) & (entry(myLexicon1, lex)
OR ... OR entry(myLexiconN, lex)) }.
This is not representable in OWL, as we cannot say that a triple
(lex, sense, ref) is included in a dataset.
The properties entries / lexicalizations / senses / references are then
projections of the set ext(mySubset).
The extension of mySubset is, however, not captured in OWL.
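To make the projections concrete, here is a minimal sketch of a fragment
of such an extension, assuming the ontolex core properties ontolex:sense
and ontolex:reference; the prefixes, resource names and namespace URIs
are purely illustrative:

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix :        <http://example.org/dat#> .         # namespace of :dat (assumed)
@prefix lex:     <http://example.org/itLexicon#> .   # hypothetical lexicon namespace

# one Italian entry with two senses pointing into :dat
lex:cane a ontolex:LexicalEntry ;
    ontolex:sense lex:cane_sense1 , lex:cane_sense2 .

lex:cane_sense1 a ontolex:LexicalSense ;
    ontolex:reference :Dog .

lex:cane_sense2 a ontolex:LexicalSense ;
    ontolex:reference :DomesticAnimal .

This fragment would contribute the tuples (cane, sense1, Dog) and
(cane, sense2, DomesticAnimal) to ext(mySubset), i.e. 1 entry, 2 senses,
2 lexicalizations and 2 references.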
I think we sort of agree; we just need to agree on the terminology then ;-)
Philipp.
On 27.06.14 12:36, Armando Stellato wrote:
>
> Dear Philipp,
>
> all very clear, and thanks for the formal description. Replies to the
> two main points here below:
>
> I agree on the basic premise of multiple senses referring to one
> concept and that this should count as one "lexicalization".
>
> [Armando Stellato]
>
> Perfect then, I was just in doubt about the use of senses mentioned in
> the previous email.
>
> [CUT]
>
> …
>
> These partitions would not exist explicitly, but only implicitly; they
> could, however, be referred to explicitly by, say, considering an
> instance of "lime:LexiconSubset" that represents one of these
> equivalence classes (i.e. the equivalence class corresponding to one
> ontology, one language and one linguistic model). For these equivalence
> classes, and thus for a subset of the Lexicon, we could indicate the
> values of the statistical properties mentioned above. For some
> equivalence class c we could then state the following:
>
> entries:         #{ lex : (lex, sense, ref) \in c }
> senses:          #{ sense : (lex, sense, ref) \in c }
> lexicalizations: #{ (lex, ref) : (lex, sense, ref) \in c }
> references:      #{ ref : (lex, sense, ref) \in c }
>
> If we do not specify one of the three dimensions for such a slice, it
> would correspond to the union of all equivalence classes for all
> possible values of the unspecified dimensions.
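> As a minimal sketch (using the lime:LexiconSubset vocabulary, with all
> names and counts purely illustrative), a slice that leaves the language
> dimension unspecified would simply omit lime:lang and aggregate over
> all language-specific slices:
>
> myItLex:allLanguagesSubset
>     a lime:LexiconSubset ;
>     lime:lexicalizedObject :dat ;    # ontology dimension fixed
>     lime:lexicalModel <http://> ;    # model dimension fixed
>     lime:entries 54 ;                # aggregated over every language slice
>     lime:lexicalizations 71 ;
>     lime:senses 80 ;
>     lime:references 63 .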
>
> I hope I am more or less clear: I am saying that we need a logical
> mechanism to implicitly partition a dataset into sub-datasets
> according to the three dimensions mentioned above, and some mechanism
> to explicitly refer to these sub-datasets in order to add metadata.
>
> This would make the classes Lexicalization, LanguageCoverage, etc.
> obsolete, as we could express all statistics by attaching the four
> basic properties to different slices.
>
> Does this make sense? If not, I will have to come up with concrete
> examples, I fear ;-)
>
> [Armando Stellato]
>
> Absolutely, it makes sense (that is, it is sound); however, there are
> two factors here:
>
> 1) Does this cover exactly what we need/want?
>
> 2) How do we represent it? Maybe we more or less already agree on the
> formal description above, but then it is only a matter of how to
> represent it in OWL.
>
> Regarding point 1, I list the following:
>
> a) There is another aspect: I would like to address the possibility of
> adding finer-grained partitions along a further dimension, such as
> “coverage of the skos:Concepts only”. This could raise the fear of a
> combinatorial explosion of possibilities, though I don’t see the
> typical case being one single monolithic dataset containing both the
> ontology and its ontolex lexicalizations for several languages. As you
> yourself have said on past occasions, ontolex is, after all, not going
> to replace more trivial lexicalizations in the “standard” bundling of
> ontologies/datasets, but will actually be the language for developing
> rich resources that provide support for a multitude of NLP tasks. So,
> if someone is providing the Spanish Ontolex Lexicalization (not
> relevant here whether it includes the Lexicon or not) for FOAF, he
> *may* want to provide these finer-grained partitions.
>
> b) Manuel and I split out three entities (Ontology, Lexicon,
> Lexicalization) to avoid redundancy and to specifically identify the
> three core logical entities of OntoLex. Now, the Lexicon *should in
> any case* report its entry and sense counts, and the ontology already
> reports its various counts in its VoID description. I suppose your
> partitions do indeed need to be reified somehow and be given a class to
> serve as the domain of those properties. So, what you are addressing
> seems to me to be exactly what we called the “Lexicalization” class,
> nothing more, nothing less… see point 2 below.
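> As a rough sketch of the kind of counts the ontology dataset :dat would
> already expose through standard VoID properties (the figures are purely
> illustrative):
>
> @prefix void: <http://rdfs.org/ns/void#> .
>
> :dat a void:Dataset ;
>     void:triples    120000 ;    # total number of triples
>     void:entities   20 ;        # number of described entities
>     void:classes    5 ;
>     void:properties 12 .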
>
> Regarding point 2, your theoretical representation needs an OWL
> formulation. Obviously I’m not stuck on ours, but I don’t see anything
> in the formulation above that calls for something different. I start
> from where I left off: your partitions in the end need a class and,
> whatever you call it, it seems to be the Lexicalization class you
> suggest getting rid of. Then it only comes down to three narrow topics:
> whether to use counts or percentages (which we have temporarily set
> aside), what name to give to such a class of objects (Lexicalization
> would be fine; there is just the inconvenience of the ambiguity between
> “lexicalizations” as objects and the class of those objects having the
> same name), and how to structure it in OWL (point 2 below). In this
> sense, modulo the count/percentage question and the finer-grained
> resource-type cuts, your theoretical formulation is already covered by
> the OWL formulation we did.
>
> Let’s take a look at the last representation (version 2 of the Lime
> Proposal PDF).
>
> # inside the VoID file of the Lexicalization
>
> myItLex:myItalianLexicalizationOfDat
>     a lime:Lexicalization ;
>     lime:lang "it" ;    # important to be here: this is the focus of
>                         # search by agents, not the lexicon!
>     lime:lexicalizedDataset :dat ;
>     lime:lexicalModel ontolex: ;
>     lime:lexicon :italianWordnet ;
>     lime:resourceCoverage [    # see discussion later, in section 5
>         lime:class owl:Class ;
>         lime:percentage 0.75 ;
>         lime:avgNumOfEntries 3.5
>     ] .
>
> In there, you have the language, the ontology (lime:lexicalizedDataset),
> the lime:lexicalModel and a pointer to a Lexicon. The only added thing
> is the resourceCoverage, which gives the possibility to create an
> additional cut on specific resources. The basic option would be
> rdfs:Resource, to cover everything.
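> As a rough sketch of the finer cut mentioned in point a) (the figures,
> and the assumption that several coverage blocks may coexist, are
> illustrative only), one could add a second resourceCoverage restricted
> to skos:Concept:
>
> myItLex:myItalianLexicalizationOfDat
>     lime:resourceCoverage [
>         lime:class skos:Concept ;    # cover only the SKOS concepts of :dat
>         lime:percentage 0.9 ;
>         lime:avgNumOfEntries 2.0
>     ] .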
>
> More on the call (I’ll be there); as usual, though, this mail can
> serve as a basis for discussion.
>
> Cheers,
>
>
> Armando
>
--
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld
Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de
Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany
Received on Friday, 27 June 2014 12:19:01 UTC