Re: LIME Final Model

Dear John, All

see my answers below.

2015-01-23 15:48 GMT+01:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>:

>
>
> On Fri, Jan 23, 2015 at 3:17 PM, Manuel Fiorelli <
> manuel.fiorelli@gmail.com> wrote:
>
>> Dear John, All
>>
>> see my answer below.
>>
>> 2015-01-23 14:59 GMT+01:00 John P. McCrae <
>> jmccrae@cit-ec.uni-bielefeld.de>:
>>
>>>
>>> On Fri, Jan 23, 2015 at 2:50 PM, Manuel Fiorelli <
>>> manuel.fiorelli@gmail.com> wrote:
>>>
>>> *7. Properties avgNumOfLexicalization, percentage, lexicalizations no
>>> longer on Lexicalization*
>>>>
>>>> This is something that (if I remember correctly) was still under
>>>> discussion. However, in the attached document I was open to the possibility
>>>> of including these properties in the LexicalizationSet.
>>>>
>>>> The change you propose would dramatically change the semantics of the
>>>> model. Currently, a coverage is only a container of statistics. With your
>>>> change in place, a coverage would be a dataset, which contains (I presume)
>>>> the lexicalization triples.
>>>>
>>> OK, I think the important thing is that properties such as
>>> lexicalizations can be added to the Lexicalization, it didn't look like
>>> that from the diagram
>>>
>>> As for changing the semantics, I disagree. The lexicalization is not
>>> truly a 'dataset' in most cases as it is instead may be published as part
>>> of a lexicon (or even part of an ontology). Instead it is a dataset in the
>>> sense that it is some set of triples, in this case the triples linking an
>>> ontology to a lexicon, thus for me a resource coverage is also a dataset,
>>> that is the set of triples linking a lexicon to a selection of the
>>> ontology's entities by type.
>>>
>>
>> In the model, we have the following axiom
>>
>> lime:LexicalizationSet rdfs:subClassOf void:Dataset
>>
>> therefore, each lexicalizationSet is a dataset, in the sense of being a
>> set of triples, i.e. representing the association between ontology entities
>> and lexical entries.
>>
>> As you argue, it may be a subset of another dataset. On this last point,
>> maybe we were a bit ambiguous in previous telcos/emails. Suppose that I
>> want to distribute an ontolex:Lexicon together with a
>> lime:LexicalizationSet, what is the appropriate structure of the data?
>>
>> a)
>>
>>
>> *The lexicon also contains the triples related to the lexicalizationSet*
>> :myLexicon a ontolex:Lexicon .
>> :myLexicon void:subset :myLexicalizationSet .
>>
>> :myLexicalizationSet a lime:LexicalizationSet.
>>
>> b)
>>
>> *The lexicon does not contain the triples related to the lexicalization;
>> instead, both the lexicon and the lexicalizationSet are part of a larger
>> dataset.*
>>
>> :myDataset a void:Dataset .
>> :myDataset void:subset :myLexicon .
>> :myDataset void:subset :myLexicalizationSet .
>>
>> :myLexicon a ontolex:Lexicon .
>> :myLexicalizationSet a lime:LexicalizationSet.
>>
>>
>> I thought that we agreed on the solution b), in order to completely
>> remove "semantic" information from the lexicon. What is your position?
>>
> I think both solutions are in principle fine but would also prefer (b)...
> I'm not quite sure about the relevance here. By 'true dataset' I mean a
> collection of triples grouped together and made available as a single
> download, the semantics of VoID are much weaker making parts of a single
> download a dataset as well (although the definition
> <http://vocab.deri.ie/void#Dataset> of void:Dataset seems to be a 'true
> dataset')
>

I asked because you wrote "The lexicalization is not truly a 'dataset' in
most cases as it is instead may be published as part of a lexicon", thus
making me think you were assuming solution a).

The following example from the spec clearly allows defining a (sub)set
solely for the purpose of providing metadata:

:DBpedia a void:Dataset;
    void:classPartition [
        void:class foaf:Person;
        void:entities 312000;
    ];
    void:propertyPartition [
        void:property foaf:name;
        void:triples 312000;
    ];
    .



>
> For example VoID's classPartition property, which for me is closely
> related to lime:coverage, is a subproperty of void:subset, and hence any
> class partition is thus a void:Dataset. By the same principle I would say
> that the range of lime:coverage is also a void:Dataset as it is also a
> partition of the lexicalization. We could even go further and claim
> lime:coverage ⊑ void:subset!
>
> See:
> http://www.w3.org/TR/void/#class-property-partitions
> http://vocab.deri.ie/void#classPartition
>
>
I see your point. You are suggesting that:

*LexicalizationSet* is the dataset containing all the triples related to
lexicalization;
then, by means of *coverage*, you introduce a subset that concerns only
a specific resource type. The object could be something like
*ResourceConstrainedLexicalizationSet*.
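
To check that I read this correctly, here is a minimal sketch in Turtle.
It is only an illustration under assumptions: the resource names
(:myLexicalizationSet, :myClassCoverage), the property lime:resourceType,
the class lime:ResourceConstrainedLexicalizationSet and the namespace
bound to the lime: prefix are hypothetical and not part of the agreed
model.

@prefix :     <http://example.org/> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .   # assumed namespace
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# The stronger claim: every coverage is a subset of its lexicalization set
lime:coverage rdfs:subPropertyOf void:subset .

# The lexicalization set points to a coverage restricted to one resource type
:myLexicalizationSet a lime:LexicalizationSet ;
    lime:coverage :myClassCoverage .

# Hypothetical: the coverage is itself a dataset, holding only the
# lexicalization triples whose subject is an owl:Class
:myClassCoverage a lime:ResourceConstrainedLexicalizationSet , void:Dataset ;
    lime:resourceType owl:Class .

With the subproperty axiom in place, the triple
:myLexicalizationSet void:subset :myClassCoverage follows by RDFS
inference, in the same way as for void:classPartition.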

I am sure that this option was already considered and collectively
discarded during a telco. Unfortunately, I am not sure about the
motivations.

Since your proposal seems reasonable, Armando and I will discuss it on
Monday and decide whether to accept or reject your proposal.

In the meantime, I want to highlight another aspect of the model I am not
sure about. Did we agree on the use of ontolex:languageURI or dcterms:language
for languages expressed as resources?
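
For concreteness, the two alternatives would look roughly like this. This
is only a sketch: the subject :myLexicon and the Lexvo URI are example
values I picked, and the ontolex: namespace shown is an assumption.

@prefix :        <http://example.org/> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .   # assumed namespace
@prefix dcterms: <http://purl.org/dc/terms/> .

# Alternative 1: a dedicated property in the model
:myLexicon ontolex:languageURI <http://lexvo.org/id/iso639-3/eng> .

# Alternative 2: reuse of Dublin Core
:myLexicon dcterms:language <http://lexvo.org/id/iso639-3/eng> .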

-- 
Manuel Fiorelli

Received on Friday, 23 January 2015 15:50:53 UTC