Re: LIME Final Model

On Mon, Jan 26, 2015 at 8:04 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:

> Dear John,
>
>
>
> good thing, we more or less agree with you :-)
>
:)

>
>
> Sorry in advance for the long email, but we will address a few points: why
> it was initially agreed (by all of us) not to model it like that, why it
> could be, and which possibilities we propose.
>
>
>
> Just as an historical note about it not being a subset. This emerged in a
> quite old phone call (it’s not in the minutes as we only report agreed
> decisions and usually not rejections). Actually in that call we all
> speculated about this possibility, and later on all of us agreed on
> rejecting it as we preferred to have a different nature for this coverage.
> The reason is mainly that by having a clear representation for the dataset,
> and just an appendix entity for statistical information about the coverage
> (such as it was at that time), there was no ambiguity about where certain
> information had to be asserted.
>
> Let’s make a short example over a LexicalizationSet:
>
>
>
> :EnglishLexicalizationSet
>
>   rdf:type lime:LexicalizationSet ;
>
>   ontolex:language "en" ;
>
>   lime:referenceDataset <http://www.cimiano.de/ontologies/foaf-meta#VocabularyFOAF> ;
>
>   lime:lexicalizationModel <http://www.w3.org/ns/lemon/ontolex> ;
>
>   lime:lexiconDataset :FOAFEnglishLexicon ;
>
>
>
>   lime:coverage [
>
>       lime:resourceType owl:Thing ;
>
>       lime:percentage 0.171 ;
>
>       lime:avgNumOfLexicalizations 0.197 ;
>
>   ] ;
>
>
>
>
>
> Clearly, all the information such as referenceDataset, lexicalizationModel
> and lexiconDataset is valid for the lexicalization as a whole. The
> coverage was limited to hold those simple statistics we were talking about.
>
> If we consider the coverages to be subsets (whichever property points to
> them) of the LexicalizationSet, then one would expect to find the same info
> (referenceDataset, lexicalizationModel etc…) on these subsets. However,
> property values are not “passed” from datasets to their subsets, as
> they all represent different objects and need to be described as well.
>
>
>
> Now, let’s come to today: wrt our original proposal, there has been much
> debate about the possibility of putting many other properties (not only
> averages and percentages, but also counts) even in the coverages, which
> eventually ended in these much richer coverage representations
> which…yes…are, at this point, very similar to the LexicalizationSet itself.
>
> In short, modeling the coverages (for LexicalLinkSets,
> LexicalizationSets…and maybe Conceptualizations) as objects of the same
> nature as their containers (actually subsets of them) is, at this point,
> surely better.
>
> However, there is still the same issue we addressed before: the
> non-inheritability of property values by the subsets.
>
Yes, it would be possible to add axioms like:

subset⁻ ∘ sparqlEndpoint ⊑ sparqlEndpoint

As far as I can see this only affects lexicalizationModel... the other
properties (references, lexicalizations, avgNumOfLexicalizations,
percentage, lexicalEntries(?), concepts, links, avgNumOfLinks) are clearly
not inheritable.
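
For concreteness, here is a minimal sketch of how such a chain could be
written as a valid OWL 2 property chain axiom in Turtle (just a sketch, not
a worked-out proposal; the chain has to sit on the sub-property side, and
the inverse step is needed because void:subset points from a dataset to its
subset):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix void: <http://rdfs.org/ns/void#> .

# subset⁻ ∘ sparqlEndpoint ⊑ sparqlEndpoint:
# if D has subset S and D has SPARQL endpoint E, then S has endpoint E too.
void:sparqlEndpoint owl:propertyChainAxiom
    ( [ owl:inverseOf void:subset ] void:sparqlEndpoint ) .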

>
>
> However…in the end…this is really the same issue that exists in VoID (and
> it does not seem to be addressed much there). For instance, in a
> void:Dataset and its subsets, shouldn’t the SPARQL endpoint be the same? We
> looked at the list of properties there and at some examples. The semantics
> seem quite loose, relying more on “best practices of interpretation”. Just
> to give two different cases: you may find void:dataDump respecified in the
> subsets, describing the files containing their specific triples, while the
> SPARQL endpoint is generally assumed to be that of the containing dataset.
> But nothing in the model clarifies this.
>
> If we are happy with keeping the same loose semantics (and considering the
> larger amount of shared information between coverages and their containers
> wrt the original proposal), then why not? We can go for a subset approach.
>
>
>
> So, if we go for the subset approach, we suggest a few modifications to
> your proposal, as we originally discussed in that call:
>
>
>
> 1)      Do not coin a dedicated class for the coverages. Just keep the
> container (LexicalizationSet, LexicalLinkSet…and again…maybe
> Conceptualization, but we’ll discuss this in a separate thread) and assume
> that they have the same properties (with all the semantically loose
> assumptions about the inheritance of prop-values)
>
I like this... using fewer URIs is always better

> 2)      Use a property to address the partition. In this case, why not
> simply reuse void:classPartition?
>
The classPartition refers to a subset of individuals that have a given
rdf:type. The coverage currently refers to all senses and entries with a
reference or denotes link to an entity of the given rdf:type.

These are two quite different things, right?
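
A made-up snippet, just to make the difference concrete (prefix declarations
omitted, resource names purely hypothetical): the typed entity lives in the
reference dataset, while the triples the coverage describes are the links
pointing at it, whose subjects are not themselves instances of that type:

# in the reference dataset
:Alice a foaf:Person .

# in the lexicalization set: these links are what a coverage for
# foaf:Person is about, yet neither the entry nor the sense is itself a
# foaf:Person, so a classPartition on foaf:Person would not capture them
:alice_entry a ontolex:LexicalEntry ;
    ontolex:sense :alice_sense .
:alice_sense ontolex:reference :Alice .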

>
>
> Concerning point 2, observe that VoID itself does not provide that many
> axioms, but if you would like to write them, we could define:
>
>
>
> LexicalizationSet ⊑ ∀ classPartition.(LexicalizationSet ⊓ =1 void:class)
>
Sure, I would do this as follows, but it is more-or-less the same:

LexicalizationSet ⊑ ∀ coverage.(LexicalizationSet ⊓ ResourceCoverage)
ResourceCoverage ≡ =1 resourceType
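
If it helps, a rough Turtle rendering of those two axioms could look like
this (just a sketch, usual prefixes omitted, and assuming lime:ResourceCoverage
and lime:resourceType are coined as above):

lime:LexicalizationSet rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty lime:coverage ;
    owl:allValuesFrom [
        a owl:Class ;
        owl:intersectionOf ( lime:LexicalizationSet lime:ResourceCoverage )
    ]
] .

lime:ResourceCoverage owl:equivalentClass [
    a owl:Restriction ;
    owl:onProperty lime:resourceType ;
    owl:cardinality "1"^^xsd:nonNegativeInteger
] .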

>
>
> Analogous axioms hold for LexicalLinkSet (and again maybe
> Conceptualization).
>
>
>
> Finally, property chains could be defined to make the subsets inherit the
> values of their supersets, though only for object properties…
>
Yeah I had the same idea ;)

>
>
> Cheers,
>
>
>
> Armando and Manuel
>
>
>
> P.S.: on the use of void:classPartition. The description in the VoID spec
> (http://www.w3.org/TR/void/#class-property-partitions) is not totally
> clear. For sure, property partitions indicate subsets containing triples
> exclusively featuring a given property as their predicate. However, for
> void:classPartition, the definition mentions “descriptions of instances
> of the given class”, and we do not know whether this should be interpreted
> as triples with those instances in the subject position, or more widely as
> everything related to them. The stricter interpretation has a problem:
> trivially, the direction of the predicates used in the LexicalizationSet
> (the reference dataset entities could be the objects, not the subjects, of
> ontolex:denotes triples).
>
Hmm... yeah it still stretches the definition of class partition too far
for me to use it in place of resource coverage... the problem is that the
LexicalSense object has its own class and so the link from the LexicalSense
to the LexicalEntry is clearly not part of the class partition.

Regards,
John

>
>
>
>
>
>
>
>
> *From:* johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] *On Behalf Of *John
> P. McCrae
> *Sent:* Friday, January 23, 2015 4:57 PM
> *To:* Manuel Fiorelli
> *Cc:* public-ontolex; Armando Stellato
> *Subject:* Re: LIME Final Model
>
>
>
> OK, one more thing that I think I have not made clear yet. The motivation
> for this is that it makes it easier to understand that all properties that
> can be stated about a Lexicalization can also be stated about a
> LexicalizationCoverage. If one is a subset of the other this is more
> obvious and uses one axiom to express what otherwise requires many axioms.
>
> For the language question, we agreed on dcterms:language:
> http://www.w3.org/2014/10/17-ontolex-minutes.html
>
> Regards,
> John
>
>
>
> On Fri, Jan 23, 2015 at 4:50 PM, Manuel Fiorelli <
> manuel.fiorelli@gmail.com> wrote:
>
> Dear John, All
>
> see my answers below.
>
>
>
> 2015-01-23 15:48 GMT+01:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de
> >:
>
>
>
>
>
> On Fri, Jan 23, 2015 at 3:17 PM, Manuel Fiorelli <
> manuel.fiorelli@gmail.com> wrote:
>
> Dear John, All
>
> see my answer below.
>
>
>
> 2015-01-23 14:59 GMT+01:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de
> >:
>
>
>
> On Fri, Jan 23, 2015 at 2:50 PM, Manuel Fiorelli <
> manuel.fiorelli@gmail.com> wrote:
> *7. Properties avgNumOfLexicalization, percentage, lexicalizations no
> longer on Lexicalization*
>
> This is something that (if I remember correctly) was still under
> discussion. However, in the attached document I was open to the possibility
> of including these properties in the LexicalizationSet.
>
> The change you propose would dramatically change the semantics of the
> model. Currently, a coverage is only a container of statistics. With your
> change in place, a coverage would be a dataset, which contains (I presume)
> the lexicalization triples.
>
> OK, I think the important thing is that properties such as lexicalizations
> can be added to the Lexicalization; it didn't look like that from the
> diagram.
>
> As for changing the semantics, I disagree. The lexicalization is not truly
> a 'dataset' in most cases, as it may instead be published as part of a
> lexicon (or even part of an ontology). Instead it is a dataset in the sense
> that it is some set of triples, in this case the triples linking an ontology
> to a lexicon; thus for me a resource coverage is also a dataset, that is,
> the set of triples linking a lexicon to a selection of the ontology's
> entities by type.
>
>
>
> In the model, we have the following axiom
>
> lime:LexicalizationSet rdfs:subClassOf void:Dataset
>
> therefore, each lexicalizationSet is a dataset, in the sense of being a
> set of triples, i.e. representing the association between ontology entities
> and lexical entries.
>
> As you argue, it may be a subset of another dataset. On this last point,
> maybe we were a bit ambiguous in previous telcos/emails. Suppose that I
> want to distribute an ontolex:Lexicon together with a
> lime:LexicalizationSet, what is the appropriate structure of the data?
>
> a)
>
> *The lexicon also contains the triples related to the lexicalizationSet*
>
> :myLexicon a ontolex:Lexicon .
> :myLexicon void:subset :myLexicalizationSet .
>
> :myLexicalizationSet a lime:LexicalizationSet.
>
>
>
> b)
>
>
>
> *The lexicon does not contain the triples related to the lexicalization;
> instead, both the lexicon and the lexicalizationSet are part of a larger
> dataset.*
>
>
> :myDataset a void:Dataset .
>
> :myDataset void:subset :myLexicon .
>
> :myDataset void:subset :myLexicalizationSet .
>
> :myLexicon a ontolex:Lexicon .
>
> :myLexicalizationSet a lime:LexicalizationSet.
>
>
>
>
>
> I thought that we agreed on solution b), in order to completely remove
> "semantic" information from the lexicon. What is your position?
>
> I think both solutions are in principle fine but would also prefer (b)...
> I'm not quite sure about the relevance here. By 'true dataset' I mean a
> collection of triples grouped together and made available as a single
> download; the semantics of VoID are much weaker, making parts of a single
> download a dataset as well (although the definition
> <http://vocab.deri.ie/void#Dataset> of void:Dataset seems to be that of a
> 'true dataset').
>
>
>
> I asked because you wrote "The lexicalization is not truly a 'dataset' in
> most cases as it is instead may be published as part of a lexicon", thus
> making me think you were assuming solution a)
>
> The following example from the spec clearly allows one to define a (sub)set
> only for the purpose of providing metadata:
>
> :DBpedia a void:Dataset;
>
>     void:classPartition [
>
>         void:class foaf:Person;
>
>         void:entities 312000;
>
>     ];
>
>     void:propertyPartition [
>
>         void:property foaf:name;
>
>         void:triples 312000;
>
>     ];
>
>     .
>
>
>
>
>
> For example VoID's classPartition property, which for me is closely
> related to lime:coverage, is a subproperty of void:subset, and hence any
> class partition is a void:Dataset. By the same principle I would say
> that the range of lime:coverage is also a void:Dataset as it is also a
> partition of the lexicalization. We could even go further and claim
> lime:coverage ⊑ void:subset!
>
> See:
> http://www.w3.org/TR/void/#class-property-partitions
> http://vocab.deri.ie/void#classPartition
>
>
>
>
>
> I see your point. You are suggesting that:
>
> *LexicalizationSet* is the dataset containing all the triples related to
> lexicalization
> then, by means of *coverage*, you introduce a subset that is only concerned
> with a specific resource type. The object could be something like
> *ResourceConstrainedLexicalizationSet*.
>
> I am sure that this option was already considered and collectively
> discarded during a telco. Unfortunately, I am not sure about the
> motivations.
>
> Since your proposal seems reasonable, Armando and I will discuss it on
> Monday, in order to accept or reject your proposal.
>
> In the meantime, I want to highlight another aspect of the model I am not
> sure about. Did we agree on the use of ontolex:languageURI or
> dcterms:language for languages expressed as resources?
>
> --
>
> Manuel Fiorelli
>
>
>

Received on Tuesday, 27 January 2015 10:48:45 UTC