RE: LIME Final Model from Armando Stellato on 2015-01-27 (public-ontolex@w3.org from January 2015)

From: Armando Stellato <stellato@info.uniroma2.it>
Date: Tue, 27 Jan 2015 17:25:38 +0100
To: "'John P. McCrae'" <jmccrae@cit-ec.uni-bielefeld.de>
CC: "'public-ontolex'" <public-ontolex@w3.org>
Message-ID: <DUB408-EAS287B3605FEEED45226B0136A0320@phx.gbl>
Dear John, all,

 

Good then. We are almost there for this change too. Here is a summary rather than inline comments on the previous email:

 

1)      It’s OK for us too not to reuse void:classPartition, as we share your concerns about its applicability to our use case. So we can coin a property (a subproperty of void:subset) that describes this partitioning.

a.       Let’s call the above property <?coverage?> for now

b.      Let’s also call <?typeProperty?> the property we originally called lime:class and then changed to lime:resourceType

2)      So now, some terminology: 

a.       Class for the “coverage”. It’s not clear to us what your final decision is on the use of a named class for it. You said you agreed on removing the named class to save URIs (“fewer URIs always better”), so we would say it’s OK to just use an anonymous class for the restriction, as we proposed:
LexicalizationSet ⊑ ∀ <?coverage?>.( LexicalizationSet ⊓ =1 <?typeProperty?>)
but then you suggest (“I would do as follows”) again: LexicalizationSet ⊑ ∀ <?coverage?>.(LexicalizationSet ⊓ ResourceCoverage) where ResourceCoverage represents our same restriction on the <?typeProperty?>. 
Our considerations here in favor of not giving it a name:

        i.      VoID does the same for its partitions. It defines the various xxxPartition properties but then simply points to a Dataset, which is implied to be a subset of the above.

        ii.     In any case, the name ResourceCoverage is inappropriate after the move to the subset approach that we agreed on. Indeed, it was a coverage before, because it described particular coverages of the dataset it was attached to. Now the coverage statistics are directly attached to the LexicalizationSet (LexicalLinkSet, etc.), so all LexicalizationSets already express coverages and, as you yourself said, what we are adding now is only the possibility to partition these sets. So, should we call it Partition? I would go back to the considerations in point i. then.

b.      Property: <?coverage?>. Same as for point ii. above: “coverage” is not appropriate, as it now merely describes a partition. We agreed that void:classPartition is not appropriate, so here are a few possibilities:

        i.      lime:classPartition. I (Armando) personally like the idea of using the same (local) name with a related though slightly different semantics. The reason is that it is easy to remember, and you would most probably be using it in place of the other when using lime. The fact that the full name is formally different (lime:classPartition is not formally related to void:classPartition) completes (in my view) the approach.

        ii.     lime:partition. OK, with such a name we lose the possibility of doing other kinds of partitioning. Would that be a serious loss? After all, we still have the VoID partitions for those, while this is probably the only kind of partitioning we need.

c.       <?typeProperty?>. We can live with lime:resourceType; however, after this shift, we might again reconsider:

        i.      lime:class, which better evokes the analogy with the (more general) partitioning of VoID (same consideration as for lime:classPartition).
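
To make the options above concrete, here is a minimal Turtle sketch of a LexicalizationSet with one partition under the subset approach, reusing the FOAF figures from the earlier example in this thread. The names lime:partition and lime:resourceType only stand in for <?coverage?> and <?typeProperty?>; no name has been decided, and both the lime namespace URI and the example base URI are assumptions made for illustration.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .   # assumed namespace
@prefix :     <http://example.org/lime-sketch#> .    # hypothetical base

:EnglishLexicalizationSet
  a lime:LexicalizationSet ;
  lime:referenceDataset <http://www.cimiano.de/ontologies/foaf-meta#VocabularyFOAF> ;
  lime:lexicalizationModel <http://www.w3.org/ns/lemon/ontolex> ;
  lime:lexiconDataset :FOAFEnglishLexicon ;
  # placeholder for <?coverage?>: the object is itself a LexicalizationSet
  lime:partition [
      a lime:LexicalizationSet ;
      # placeholder for <?typeProperty?>: exactly one value per partition
      lime:resourceType owl:Thing ;
      lime:percentage 0.171 ;
      lime:avgNumOfLexicalizations 0.197
  ] .
```

Note that the partition node does not repeat referenceDataset, lexicalizationModel or lexiconDataset; under the loose VoID-style semantics discussed in this thread, these would be assumed to carry over from the containing set.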

 

Cheers,

 

Armando and Manuel

From: johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] On Behalf Of John P. McCrae
Sent: Tuesday, January 27, 2015 11:48 AM
To: Armando Stellato
Cc: public-ontolex
Subject: Re: LIME Final Model

 

 

 

On Mon, Jan 26, 2015 at 8:04 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:

Dear John,

 

good thing, we more or less agree with you :-)

:) 

 

Sorry in advance for the long email, but we want to address a few points: why it was initially agreed (by all of us) not to model it like that, why it could be modeled like that now, and which options we propose.

 

Just as a historical note about it not being a subset: this emerged in a quite old phone call (it’s not in the minutes, as we only report agreed decisions and usually not rejections). Actually, in that call we all speculated about this possibility, and later on all of us agreed on rejecting it, as we preferred to have a different nature for this coverage. The reason is mainly that by having a clear representation for the dataset, and just an appendix entity for statistical information about the coverage (as it was at that time), there was no ambiguity about where certain information had to be asserted.

Let’s make a short example over a LexicalizationSet:

 

:EnglishLexicalizationSet
  rdf:type lime:LexicalizationSet ;
  ontolex:language "en" ;
  lime:referenceDataset <http://www.cimiano.de/ontologies/foaf-meta#VocabularyFOAF> ;
  lime:lexicalizationModel <http://www.w3.org/ns/lemon/ontolex> ;
  lime:lexiconDataset :FOAFEnglishLexicon ;
  lime:coverage [
      lime:resourceType owl:Thing ;
      lime:percentage 0.171 ;
      lime:avgNumOfLexicalizations 0.197
  ] .

 

 

Clearly, all the information such as referenceDataset, lexicalizationModel and lexiconDataset are valid for the lexicalization as a whole. The coverage was limited to hold those simple statistics we were talking about.

If we consider the coverages to be subsets (whichever property points to them) of the LexicalizationSet, then one would expect to find the same information (referenceDataset, lexicalizationModel, etc.) on these subsets. However, property values are not “passed” from datasets to their subsets, as they all represent different objects and need to be described as well.

 

Now, let’s come to today: wrt our original proposal, there has been much debate about the possibility of putting many other properties (not only averages and percentages, but also counts) even in the coverages, which eventually ended in these much richer coverage representations which…yes…are at this point, very similar to the LexicalizationSet itself.

In short, modeling the coverages (for LexicalLinkSets, LexicalizationSets…and maybe Conceptualizations) as objects of the same nature as their containers (actually subsets of them) is, at this point, surely better.

However, there is still the same issue we addressed before: property values are not inherited by the subsets.

Yes, it would be possible to add axioms like: 

sparqlEndpoint ⊑ subset ∘ sparqlEndpoint

As far as I can see this only affects lexicalizationModel... the other properties (references, lexicalizations, avgNumOfLexicalizations, percentage, lexicalEntries(?), concepts, links, avgNumOfLinks) are clearly not inheritable.
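
One possible reading of the axiom above, written as an OWL 2 property chain in Turtle, is sketched below. Since void:subset points from the containing dataset to its subset, the chain has to traverse it in the inverse direction to carry a value from the container down to the subset. Typing the two properties as owl:ObjectProperty is an assumption made only so that the chain axiom is well-formed OWL; this is an illustration, not an axiom that the group agreed on.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix void: <http://rdfs.org/ns/void#> .

# Assumption: both properties are treated as OWL object properties.
void:subset         a owl:ObjectProperty .
void:sparqlEndpoint a owl:ObjectProperty .

# If ?container void:subset ?part and ?container void:sparqlEndpoint ?e,
# then a reasoner may infer ?part void:sparqlEndpoint ?e.
void:sparqlEndpoint
  owl:propertyChainAxiom ( [ owl:inverseOf void:subset ] void:sparqlEndpoint ) .
```

The same pattern could be written for lime:lexicalizationModel, which is the only LIME property identified above as a candidate for inheritance.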

 

However…in the end…this is really the same issue which exists in VoID (and it seems to be not addressed that much there). For instance, in a void:Dataset and its subsets, shouldn’t the SPARQL endpoint be the same? We observed the list of properties there, and took some examples. It seems there are quite loose semantics and more “best practices of interpretation”. Just to provide two different cases: you may find void:dataDump respecified in the datasets, describing the files containing the specific triples of the subsets, while the SPARQL endpoint is generally assumed to be the one of the containing dataset. But nothing in the model clarifies this.
If we are happy with keeping the same loose semantics (and considering the larger amount of shared information between coverages and their containers wrt the original proposal), then why not? We can go for a subset approach.
 

So, if we go for the subset approach, we suggest a few modifications to your proposal, as we originally discussed in that call:

 

1)      Do not coin a dedicated class for the coverages. Just keep the container (LexicalizationSet, LexicalLinkSet..and again…maybe Conceptualization, but we’ll discuss this in a separate thread) and assume that they have the same properties (with all the semantically loose assumptions about the inheritance of prop-values)

I like this... using fewer URIs is always better

2)      Use a property to address the partition. In this case, why not simply reuse void:classPartition?

The classPartition refers to a subset of individuals that have a given rdf:type. The coverage currently refers to all senses and entries with a reference or denotes link to an entity of the given rdf:type. 

These are two quite different things, right?

 

Concerning point 2, observe that void itself is not providing that many axioms, but if you like to write them, we could define:

 

LexicalizationSet ⊑ ∀ classPartition.( LexicalizationSet ⊓ =1 void:class)

Sure, I would do this as follows, but it is more-or-less the same:

 

LexicalizationSet ⊑ ∀ coverage.(LexicalizationSet ⊓ ResourceCoverage)

ResourceCoverage ≡ =1 resourceType

 

Analogous axioms hold for LexicalLinkSet ( and again maybe Conceptualization)
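
For reference, the two formulations above could be serialized in OWL Turtle roughly as follows. This is only a sketch: the lime namespace URI is assumed, and lime:coverage, lime:resourceType and lime:ResourceCoverage are the working names used in this thread rather than final ones.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .   # assumed namespace

# Variant 1: anonymous class, no extra URI.
# LexicalizationSet ⊑ ∀ coverage.(LexicalizationSet ⊓ =1 resourceType)
lime:LexicalizationSet rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty lime:coverage ;
    owl:allValuesFrom [
        a owl:Class ;
        owl:intersectionOf (
            lime:LexicalizationSet
            [ a owl:Restriction ;
              owl:onProperty lime:resourceType ;
              owl:cardinality "1"^^xsd:nonNegativeInteger ]
        )
    ]
] .

# Variant 2: the same restriction given a name.
# ResourceCoverage ≡ =1 resourceType
lime:ResourceCoverage owl:equivalentClass [
    a owl:Restriction ;
    owl:onProperty lime:resourceType ;
    owl:cardinality "1"^^xsd:nonNegativeInteger
] .
```

With variant 2, the inner restriction of variant 1 would simply be replaced by lime:ResourceCoverage; the two are logically equivalent and differ only in whether an additional URI is coined.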

 

Finally, property chains could be defined to make the subsets inherit the values of their supersets, though only for object properties…

Yeah I had the same idea ;) 

 

Cheers,

 

Armando and Manuel

 

P.S.: on the use of void:classPartition. The description in the VoID specs (http://www.w3.org/TR/void/#class-property-partitions) is not totally clear. For sure, property partitions indicate subsets containing triples exclusively featuring a given property as their predicate. However, for void:classPartition, the definition mentions “descriptions of instances of the given class”, and we do not know whether it is meant to be interpreted as triples with those instances in the subject position, or more widely as everything related to them. The stricter interpretation has a problem: trivially, the direction of the predicates used in the LexicalizationSet (the reference dataset objects could be the objects of ontolex:denotes triples).

Hmm... yeah it still stretches the definition of class partition too far for me to use it in place of resource coverage... the problem is that the LexicalSense object has its own class and so the link from the LexicalSense to the LexicalEntry is clearly not part of the class partition.

Regards,
John

 

 

 

 

From: johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] On Behalf Of John P. McCrae
Sent: Friday, January 23, 2015 4:57 PM
To: Manuel Fiorelli
Cc: public-ontolex; Armando Stellato
Subject: Re: LIME Final Model

 

OK, one more thing that I think I have not made clear yet. The motivation for this is that it makes it easier to understand that all properties that can be stated about a Lexicalization can also be stated about a LexicalizationCoverage. If one is a subset of the other this is more obvious and uses one axiom to express what otherwise requires many axioms.

For the language question, we agreed on dcterms:language:
http://www.w3.org/2014/10/17-ontolex-minutes.html

Regards,
John

 

On Fri, Jan 23, 2015 at 4:50 PM, Manuel Fiorelli <manuel.fiorelli@gmail.com> wrote:

Dear John, All

see my answers below.

 

2015-01-23 15:48 GMT+01:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>:

 

 

On Fri, Jan 23, 2015 at 3:17 PM, Manuel Fiorelli <manuel.fiorelli@gmail.com> wrote:

Dear John, All

see my answer below.

 

2015-01-23 14:59 GMT+01:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>:

 

On Fri, Jan 23, 2015 at 2:50 PM, Manuel Fiorelli <manuel.fiorelli@gmail.com> wrote:
7. Properties avgNumOfLexicalization, percentage, lexicalizations no longer on Lexicalization

This is something that (if I remember correctly) was still under discussion. However, in the attached document I was open to the possibility of including these properties in the LexicalizationSet.

The change you propose would dramatically change the semantics of the model. Currently, a coverage is only a container of statistics. With your change in place, a coverage would be a dataset, which contains (I presume) the lexicalization triples.

OK, I think the important thing is that properties such as lexicalizations can be added to the Lexicalization; it didn't look like that from the diagram.

As for changing the semantics, I disagree. The lexicalization is not truly a 'dataset' in most cases, as it may instead be published as part of a lexicon (or even part of an ontology). Instead, it is a dataset in the sense that it is some set of triples, in this case the triples linking an ontology to a lexicon. Thus, for me, a resource coverage is also a dataset, namely the set of triples linking a lexicon to a selection of the ontology's entities by type.

 

In the model, we have the following axiom

lime:LexicalizationSet rdfs:subClassOf void:Dataset .

therefore, each lexicalizationSet is a dataset, in the sense of being a set of triples, i.e. representing the association between ontology entities and lexical entries.

As you argue, it may be a subset of another dataset. On this last point, maybe we were a bit ambiguous in previous telcos/emails. Suppose that I want to distribute an ontolex:Lexicon together with a lime:LexicalizationSet, what is the appropriate structure of the data?

a)

The lexicon also contains the triples related to the lexicalizationSet

:myLexicon a ontolex:Lexicon .
:myLexicon void:subset :myLexicalizationSet .

:myLexicalizationSet a lime:LexicalizationSet.

 

b)

 

The lexicon does not contain the triples related to the lexicalization; instead, both the lexicon and the lexicalizationSet are part of a larger dataset.


:myDataset a void:Dataset .
:myDataset void:subset :myLexicon .
:myDataset void:subset :myLexicalizationSet .

:myLexicon a ontolex:Lexicon .
:myLexicalizationSet a lime:LexicalizationSet .

 

 

I thought that we agreed on the solution b), in order to completely remove "semantic" information from the lexicon. What is your position?

I think both solutions are in principle fine but would also prefer (b)... I'm not quite sure about the relevance here. By 'true dataset' I mean a collection of triples grouped together and made available as a single download, the semantics of VoID are much weaker making parts of a single download a dataset as well (although the definition <http://vocab.deri.ie/void#Dataset>  of void:Dataset seems to be a 'true dataset')

 

I asked because you wrote "The lexicalization is not truly a 'dataset' in most cases as it may instead be published as part of a lexicon", which made me think you were assuming solution a).

The following example from the spec clearly allows defining a (sub)set only for the purpose of providing metadata:

:DBpedia a void:Dataset;
    void:classPartition [
        void:class foaf:Person;
        void:entities 312000;
    ];
    void:propertyPartition [ 
        void:property foaf:name;
        void:triples 312000;
    ];
    .

 

 

For example, VoID's classPartition property, which for me is closely related to lime:coverage, is a subproperty of void:subset, and hence any class partition is a void:Dataset. By the same principle, I would say that the range of lime:coverage is also a void:Dataset, as it is also a partition of the lexicalization. We could even go further and claim lime:coverage ⊑ void:subset!

See:
http://www.w3.org/TR/void/#class-property-partitions
http://vocab.deri.ie/void#classPartition

 

 

I see your point. You are suggesting that:

LexicalizationSet is the dataset containing all the triples related to lexicalization;
then, by means of coverage, you introduce a subset that only concerns a specific resource type. The object could be something like ResourceConstrainedLexicalizationSet.

I am sure that this option was already considered and collectively discarded during a telco. Unfortunately, I am not sure about the motivations.

Since your proposal seems reasonable, Armando and I will discuss it on Monday, in order to accept or reject your proposal.

In the meantime, I want to highlight another aspect of the model I am not sure about. Did we agree on the use of ontolex:languageURI or dcterms:language for languages expressed as resources?

-- 

Manuel Fiorelli
Received on Tuesday, 27 January 2015 16:26:19 UTC