Re: LIME Final Model

On Tue, Jan 27, 2015 at 5:25 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:

> Dear John, all,
>
>
>
> good then. Almost there for this change too. Just a summary here
> instead of commenting on the previous email:
>
>
>
> 1)      It’s ok for us too to not reuse void:classPartition as we share
> your concerns on the applicability to our use case. So we can coin a
> property (subprop of void:subset) which describes this partitioning.
>
> a.       Let’s call the above property <?coverage?> for now
>
> b.      Let’s call also <?typeProperty?> the prop we originally called
> lime:class, and then changed to lime:resourceType
>
> 2)      So now, some terminology:
>
> a.       Class for the “coverage”. It’s not clear to us what your final
> decision is on the use of a named class for it. You said you agreed on
> removing the named class to save URIs (“fewer URIs always better”) so we
> would say it’s ok to just use an anonymous class for the restriction, like
> we proposed:
> LexicalizationSet ⊑ ∀ <?coverage?>.( LexicalizationSet ⊓ =1
> <?typeProperty?>)
> but then you suggest (“I would do as follows”) again: LexicalizationSet ⊑
> ∀ <?coverage?>.(LexicalizationSet ⊓ ResourceCoverage) where
> ResourceCoverage represents our same restriction on the <?typeProperty?>.
> Our considerations here in favor of not giving it a name:
>
>    i.      VoID does the same for its partitions. It defines the various
> xxxPartition props but then simply points to a Dataset which is implied to
> be a subset of the above.
>
>    ii.      In any case, the name ResourceCoverage is, after the move to the
> subset approach that we agreed on, inappropriate. Indeed, it was a coverage
> before, because it described particular coverages of the dataset it was
> attached to. Now, the coverage statistics are directly attached to the
> LexicalizationSet (LexicalLinkSet etc.) so all LexicalizationSets are
> already expressing coverages and, as you yourself said, what we are adding
> now is only the possibility to partition these sets. So, should we call it
> Partition? I would go back to the considerations in point i. then.
>
Ah... I meant remove the specific subclasses LexicalizationCoverage and
LinkSetCoverage.... removing the parent class ResourceCoverage is also an
option. I am ambivalent about this, it removes a definition from the model,
which is good, but it also makes the usage a little less clear.
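
For reference, the anonymous-class variant could be written down roughly as
follows. This is only a sketch: lime:partition stands in for the still-unnamed
<?coverage?> property, lime:resourceType for <?typeProperty?>, and the lime
namespace URI is an assumption.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .   # assumed namespace

# Placeholder name for <?coverage?>, coined as a subproperty of void:subset.
lime:partition rdfs:subPropertyOf void:subset .

# LexicalizationSet ⊑ ∀ partition.(LexicalizationSet ⊓ =1 resourceType),
# with the restriction left anonymous (no ResourceCoverage class).
lime:LexicalizationSet rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty lime:partition ;
    owl:allValuesFrom [
        a owl:Class ;
        owl:intersectionOf (
            lime:LexicalizationSet
            [ a owl:Restriction ;
              owl:onProperty lime:resourceType ;
              owl:cardinality "1"^^xsd:nonNegativeInteger ]
        )
    ]
] .
```

Keeping ResourceCoverage would only mean giving the inner restriction a URI.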

> b.      Property: <?coverage?>. Same as for point ii. above. “coverage”
> is not appropriate as now it merely describes a partition. We agreed
> void:classPartition is not appropriate so, a few possibilities:
>
>    i.      lime:classPartition. I (Armando) personally like the idea of
> using the same (local) name with a related though slightly different
> semantics. The reason is: it is easy to remember, and you are most probably
> using it in place of the other when using lime. The fact that the full name
> is formally different (lime:classPartition is not formally related to
> void:classPartition) completes (in my view) the approach.
>
>    ii.      lime:partition. Ok with such a name, but we are losing here the
> possibility to do other kinds of partitioning. Would it be a serious loss?
> After all, we still have the void partitions for them, while for our needs
> this is probably the only kind of partitioning we need.
>
I would prefer partition, as it is shorter and we can still do other kinds of
partitioning (in contrast to your claim)... this is because we assume that the
partitioning is not by 'classes' in the ontological sense, but by the value
of the rdf:type triple, e.g., when we have resourceType=rdf:Property we
select all properties.

See lime/example5 in the final spec
<https://www.w3.org/community/ontolex/wiki/Final_Model_Specification#Lexicalization>
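
Roughly, such a partition might look like the sketch below. It assumes the
property ends up being called lime:partition; the lime namespace URI and the
example.org base are placeholders, and :EnglishLexicalizationSet is borrowed
from the example quoted further down, not taken from the spec.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .   # assumed namespace
@prefix :     <http://example.org/> .                # hypothetical base

:EnglishLexicalizationSet
  a lime:LexicalizationSet ;
  # The partition is itself a LexicalizationSet, restricted to the
  # lexicalizations of resources whose rdf:type is rdf:Property.
  lime:partition [
      a lime:LexicalizationSet ;
      lime:resourceType rdf:Property
      # per-partition statistics (percentage, counts, ...) would go here
  ] .
```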

> c.       <?typeProperty?>. we can live with lime:resourceType, however
> after this shift, again we might reconsider:
>
>                                                                i.      lime:class
> which better evokes the analogy with the (more general) partitioning of
> void (same consideration as for lime:classPartition).
>
Let's live with resourceType for the same reason as above.

Regards,
John

>
>
> Cheers,
>
>
>
> Armando and Manuel,
>
> *From:* johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] *On Behalf Of *John
> P. McCrae
> *Sent:* Tuesday, January 27, 2015 11:48 AM
> *To:* Armando Stellato
> *Cc:* public-ontolex
>
> *Subject:* Re: LIME Final Model
>
> On Mon, Jan 26, 2015 at 8:04 PM, Armando Stellato <
> stellato@info.uniroma2.it> wrote:
>
> Dear John,
>
>
>
> good thing, we more or less agree with you :-)
>
> :)
>
>
>
> Sorry in advance for the long email, but we will address a few points: why
> initially it was not agreed (by all of us) to be like that, why it could
> be, and which possibilities we propose.
>
>
>
> Just as an historical note about it not being a subset. This emerged in a
> quite old phone call (it’s not in the minutes as we only report agreed
> decisions and usually not rejections). Actually in that call we all
> speculated about this possibility, and later on all of us agreed on
> rejecting it as we preferred to have a different nature for this coverage.
> The reason is mainly that by having a clear representation for the dataset,
> and just an appendix entity for statistical information about the coverage
> (as it was at that time), there was no ambiguity about where certain
> information had to be asserted.
>
> Let’s make a short example over a LexicalizationSet:
>
>
>
> :EnglishLexicalizationSet
>   rdf:type lime:LexicalizationSet ;
>   ontolex:language "en" ;
>   lime:referenceDataset <http://www.cimiano.de/ontologies/foaf-meta#VocabularyFOAF> ;
>   lime:lexicalizationModel <http://www.w3.org/ns/lemon/ontolex> ;
>   lime:lexiconDataset :FOAFEnglishLexicon ;
>   lime:coverage [
>       lime:resourceType owl:Thing ;
>       lime:percentage 0.171 ;
>       lime:avgNumOfLexicalizations 0.197 ;
>   ] .
>
>
> Clearly, all the information such as referenceDataset, lexicalizationModel
> and lexiconDataset are valid for the lexicalization as a whole. The
> coverage was limited to hold those simple statistics we were talking about.
>
> If we consider the coverages to be subsets (whichever property points to
> them) of the LexicalizationSet, then one would expect to find the same info
> (referenceDataset, lexicalizationModel etc…) on these subsets. However,
> property values are not “passed” from datasets to their subsets, as
> they all represent different objects and need to be described as well.
>
>
>
> Now, let’s come to today: wrt our original proposal, there has been much
> debate about the possibility of putting many other properties (not only
> averages and percentages, but also counts) even in the coverages, which
> eventually ended in these much richer coverage representations
> which…yes…are at this point, very similar to the LexicalizationSet itself.
>
> In short, modeling the coverages (for LexicalLinkSets,
> LexicalizationSets…and maybe Conceptualizations) as objects of the same
> nature of their containers (actually subsets of them) is, at this point,
> surely better.
>
> However, there is still the same issue we addressed before: the
> non-inheritability of property values to the subsets.
>
> Yes, it would be possible to add axioms like:
>
> sparqlEndpoint ⊑ subset ∘ sparqlEndpoint
>
> As far as I can see this only affects lexicalizationModel... the other
> properties (references, lexicalizations, avgNumOfLexicalizations,
> percentage, lexicalEntries(?), concepts, links, avgNumOfLinks) are clearly
> not inheritable.
>
>
>
> However…in the end…this is really the same issue which exists in VoID (and it seems to be not addressed that much there). For instance, in a void:Dataset and its subsets, shouldn’t the SPARQL endpoint be the same? We observed the list of properties there, and took some examples. It seems there are quite loose semantics and more “best practices of interpretation”. Just to provide two different cases: you may find void:dataDump respecified in the datasets, describing the files containing the specific triples of the subsets, while the SPARQL endpoint is generally assumed to be the one of the containing dataset. But nothing in the model clarifies this.
>
> If we are happy with keeping the same loose semantics (and considering the larger amount of shared information between coverages and their containers wrt the original proposal), then why not? We can go for a subset approach.
>
>
>
> So, if we go for the subset approach, we suggest a few modifications to
> your proposal, as we originally discussed in that call:
>
>
>
> 1)      Do not coin a dedicated class for the coverages. Just keep the
> container (LexicalizationSet, LexicalLinkSet..and again…maybe
> Conceptualization, but we’ll discuss this in a separate thread) and assume
> that they have the same properties (with all the semantically loose
> assumptions about the inheritance of prop-values)
>
> I like this... using fewer URIs is always better
>
> 2)      Use a property to address the partition. In this case, why not
> simply reuse void:classPartition?
>
> The classPartition refers to a subset of individuals that have a given
> rdf:type. The coverage currently refers to all senses and entries with a
> reference or denotes link to an entity of the given rdf:type.
>
> These are two quite different things, right?
>
>
>
> Concerning point 2, observe that void itself is not providing that many
> axioms, but if you like to write them, we could define:
>
>
>
> LexicalizationSet ⊑ ∀ classPartition.( LexicalizationSet ⊓ =1 void:class)
>
> Sure, I would do this as follows, but it is more-or-less the same:
>
>
>
> LexicalizationSet ⊑ ∀ coverage.(LexicalizationSet ⊓ ResourceCoverage)
>
> ResourceCoverage ≡ =1 resourceType
>
>
>
> Analogous axioms hold for LexicalLinkSet ( and again maybe
> Conceptualization)
>
>
>
> Finally, property chains could be defined to make the subsets inherit the
> values of their supersets, though only for object properties…
>
> Yeah I had the same idea ;)
>
>
>
> Cheers,
>
>
>
> Armando and Manuel
>
>
>
> P.S: on the use of void:classPartition. The description in the void
> specs (http://www.w3.org/TR/void/#class-property-partitions) is not
> totally clear. For sure property partitions indicate subsets containing
> triples exclusively featuring a given property as their predicate. However,
> for void:classPartition, its definition mentions “descriptions of instances
> of the given class”, which we do not know if it is meant to be interpreted
> as triples with “those instances in the subject”, or a wider interpretation
> of everything related to them. The stricter interpretation has a problem:
> trivially, the direction of the predicates used in the LexicalizationSet
> (the reference dataset objects could be the objects of ontolex:denotes
> triples)
>
> Hmm... yeah it still stretches the definition of class partition too far
> for me to use it in place of resource coverage... the problem is that the
> LexicalSense object has its own class and so the link from the LexicalSense
> to the LexicalEntry is clearly not part of the class partition.
>
> Regards,
> John
>
> *From:* johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] *On Behalf Of *John
> P. McCrae
> *Sent:* Friday, January 23, 2015 4:57 PM
> *To:* Manuel Fiorelli
> *Cc:* public-ontolex; Armando Stellato
> *Subject:* Re: LIME Final Model
>
>
>
> OK, one more thing that I think I have not made clear yet. The motivation
> for this is that it makes it easier to understand that all properties that
> can be stated about a Lexicalization can also be stated about a
> LexicalizationCoverage. If one is a subset of the other this is more
> obvious and uses one axiom to express what otherwise requires many axioms.
>
> For the language question, we agreed on dcterms:language:
> http://www.w3.org/2014/10/17-ontolex-minutes.html
>
> Regards,
> John
>
>
>
> On Fri, Jan 23, 2015 at 4:50 PM, Manuel Fiorelli <
> manuel.fiorelli@gmail.com> wrote:
>
> Dear John, All
>
> see my answers below.
>
>
>
> 2015-01-23 15:48 GMT+01:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de
> >:
>
>
>
>
>
> On Fri, Jan 23, 2015 at 3:17 PM, Manuel Fiorelli <
> manuel.fiorelli@gmail.com> wrote:
>
> Dear John, All
>
> see my answer below.
>
>
>
> 2015-01-23 14:59 GMT+01:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de
> >:
>
>
>
> On Fri, Jan 23, 2015 at 2:50 PM, Manuel Fiorelli <
> manuel.fiorelli@gmail.com> wrote:
> *7. Properties avgNumOfLexicalization, percentage, lexicalizations no
> longer on Lexicalization*
>
> This is something that (if I remember correctly) was still under
> discussion. However, in the attached document I was open to the possibility
> to include these properties in the LexicalizationSet.
>
> The change you propose would dramatically change the semantics of the
> model. Currently, a coverage is only a container of statistics. With your
> change in place, a coverage would be a dataset, which contains (I presume)
> the lexicalization triples.
>
> OK, I think the important thing is that properties such as lexicalizations
> can be added to the Lexicalization, it didn't look like that from the
> diagram
>
> As for changing the semantics, I disagree. The lexicalization is not truly
> a 'dataset' in most cases as it may instead be published as part of a
> lexicon (or even part of an ontology). Instead it is a dataset in the sense
> that it is some set of triples, in this case the triples linking an ontology
> to a lexicon, thus for me a resource coverage is also a dataset, that is
> the set of triples linking a lexicon to a selection of the ontology's
> entities by type.
>
>
>
> In the model, we have the following axiom
>
> lime:LexicalizationSet rdfs:subClassOf void:Dataset
>
> therefore, each lexicalizationSet is a dataset, in the sense of being a
> set of triples, i.e. representing the association between ontology entities
> and lexical entries.
>
> As you argue, it may be a subset of another dataset. On this last point,
> maybe we were a bit ambiguous in previous telcos/emails. Suppose that I
> want to distribute an ontolex:Lexicon together with a
> lime:LexicalizationSet, what is the appropriate structure of the data?
>
> a)
>
> *The lexicon also contains the triples related to the lexicalizationSet*
>
> :myLexicon a ontolex:Lexicon .
> :myLexicon void:subset :myLexicalizationSet .
>
> :myLexicalizationSet a lime:LexicalizationSet.
>
>
>
> b)
>
>
>
> *The lexicon does not contain the triples related to the lexicalization;
> instead, both the lexicon and the lexicalizationSet are part of a larger
> dataset.*
>
>
> :myDataset a void:Dataset .
> :myDataset void:subset :myLexicon .
> :myDataset void:subset :myLexicalizationSet .
>
> :myLexicon a ontolex:Lexicon .
> :myLexicalizationSet a lime:LexicalizationSet .
>
>
>
>
>
> I thought that we agreed on the solution b), in order to completely remove
> "semantic" information from the lexicon. What is your position?
>
> I think both solutions are in principle fine but would also prefer (b)...
> I'm not quite sure about the relevance here. By 'true dataset' I mean a
> collection of triples grouped together and made available as a single
> download, the semantics of VoID are much weaker making parts of a single
> download a dataset as well (although the definition
> <http://vocab.deri.ie/void#Dataset> of void:Dataset seems to be a 'true
> dataset')
>
>
>
> I asked because you wrote "The lexicalization is not truly a 'dataset' in
> most cases as it may instead be published as part of a lexicon", thus
> making me think you were assuming solution a)
>
> The following example from the spec clearly allows defining a (sub)set
> only for the purpose of providing metadata:
>
> :DBpedia a void:Dataset;
>     void:classPartition [
>         void:class foaf:Person;
>         void:entities 312000;
>     ];
>     void:propertyPartition [
>         void:property foaf:name;
>         void:triples 312000;
>     ];
>     .
>
>
>
>
>
> For example VoID's classPartition property, which for me is closely
> related to lime:coverage, is a subproperty of void:subset, and hence any
> class partition is thus a void:Dataset. By the same principle I would say
> that the range of lime:coverage is also a void:Dataset as it is also a
> partition of the lexicalization. We could even go further and claim
> lime:coverage ⊑ void:subset!
>
> See:
> http://www.w3.org/TR/void/#class-property-partitions
> http://vocab.deri.ie/void#classPartition
>
>
>
>
>
> I see your point. You are suggesting that:
>
> *LexicalizationSet* is the dataset containing all the triples related to
> lexicalization
> then, by means of *coverage*, you introduce a subset that only concerns
> a specific resource type. The object could be something like
> *ResourceConstrainedLexicalizationSet*.
>
> I am sure that this option was already considered and collectively
> discarded during a telco. Unfortunately, I am not sure about the
> motivations.
>
> Since your proposal seems reasonable, Armando and I will discuss it
> on Monday, in order to accept or reject your proposal.
>
> In the meantime, I want to highlight another aspect of the model I am not
> sure about. Did we agree on the use of ontolex:languageURI or dcterms:language
> for languages expressed as resources?
>
> --
>
> Manuel Fiorelli
>
>
>
>
>

Received on Tuesday, 27 January 2015 16:41:35 UTC