Re: Teleconference on Friday from John P. McCrae on 2014-05-16 (public-ontolex@w3.org from May 2014)

From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Fri, 16 May 2014 13:36:54 +0200
To: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Cc: public-ontolex <public-ontolex@w3.org>
Message-ID: <CAC5njqqhiyg=CZ7fx7aRBKWaP3uRaXUb9TxbBrsZs05eM=rR9w@mail.gmail.com>
On Fri, May 16, 2014 at 12:57 PM, Philipp Cimiano <
cimiano@cit-ec.uni-bielefeld.de> wrote:

>  Dear all,
>
>  first of all thanks to Armando and Manuel for resending their slides and
> for the very clear exposition of these slides during the last telco in
> April. That was indeed very enlightening.
>
> Given the exposition, I myself am inclined to accept both the
> Lexicalization and the ratios as "first-class citizens".
>
> In any case, let me make a suggestion for what to decide today. We can
> look at the details during the telco of course, but let me try to structure
> the discussion a bit:
>
> 1) ontolex:Lexicon (recommend properties such as creator, version etc.
> from dc and dcat as recommended vocabulary to express general metadata), in
> addition to numerical properties such as: i) number of lexical entries, ii)
> number of senses, iii) number of distinct references, iv) number of
> references that have at least one sense (lexical entry), v) percentage of
> references that have at least one sense (one lexicalization so to speak),
> vi) average number of lexicalizations (senses) per reference
>
> One question: is this relative to the lexicon, or does it take into account
> all the data elements in the lexicalized dataset?
>
> 2) lime:Lexicon (lexicon as dataset), see 3 below
>
I think we should avoid using the same name for different things in
different modules. Either we say that this is the same as ontolex:Lexicon
or we need a new name.

>
> with main property lime:lexicalCoverage (Armando already hinted in his
> slides that we could rename LanguageCoverage to LexicalCoverage and
> correspondingly languageCoverage to lexicalCoverage I suppose?)
>
> a LexicalCoverage class would essentially state for each language and
> each type of lexicon ontology interface model (SKOS, lemon, RDF labels
> etc.) the number of conceptual resources covered by at least one lexical
> entry, the average number of lexical entries per conceptual resource etc.
>
> 3) Introduce lime:Lexicon and lime:Lexicalization as subclasses of
> void:Dataset in the lime module
>
> 4) I think the (sort of) agreement during our last telco was to have the
> ratios/percentages in addition to the absolute numbers as we agreed that
> the absolute numbers can not always be re-computed exactly from the ratios.
> We should reach consensus here.
>

> My opinion is that introducing a few ratio properties will simplify
> accessing this information by people who want to use the lexicon.
> Re-computing this information might be difficult sometimes: not everyone
> speaks SPARQL, endpoints are not always up, etc. Some ontologies do not
> have endpoints, so people would need to download the data, load it into
> some OWL API, and count the number of individuals, classes etc., which is
> quite tedious if you are just a user of SW technology ;-) So +1 from my
> side to include some ratios then.
>
> So including this information in the lexicon might indeed be a useful
> addition.
>
> However, I see some issues about *how* to count the number of conceptual
> resources, particularly in the case that there is more than one
> "lexicalized dataset" per lexicon. In this case we might want to provide
> the information per dataset or even per domain, which blows up the
> complexity again substantially.
>
> 5) One question is whether we *also* include in the model the information
> that allows the ratios to be recomputed. That would mean providing both:
> i) the number of conceptual resources in the lexicalized dataset(s) (which
> can be more than one), and ii) the number of conceptual resources covered
> by at least one lexicalization, in addition to the ratio.
>
> In this case the ratio would be redundant, so be it. In any case we could
> define these properties and monitor which ones are used ;-) We could
> recommend using both the integers and the ratios as good practice.
>
On ratios, I remain completely unconvinced. If we can define the ratios
formally (and we should), then we must give the formulas that calculate these
values from the absolute values. It therefore seems illogical not to give
the absolute values instead, as they have to be computed first anyway.
Allowing both absolute values and ratios is a solution, but it stinks of
"design by committee" and allows for the possibility (and in practice
likely very frequent occurrence) of inconsistency in the model.
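
To make the argument concrete, here is a minimal sketch in Python of what
"ratios are derived from the absolute values" could look like. The class and
field names (CoverageCounts, references, lexicalized_references, senses) are
hypothetical and are not part of any ontolex or lime proposal; the point is
only that a ratio computed on demand cannot disagree with the counts, whereas
a separately published ratio can.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageCounts:
    # Hypothetical field names, for illustration only.
    references: int              # conceptual resources in the lexicalized dataset
    lexicalized_references: int  # references with at least one sense
    senses: int                  # total number of senses (lexicalizations)

    def __post_init__(self) -> None:
        # Without a denominator, no ratio can be computed or interpreted.
        if self.references <= 0:
            raise ValueError("references must be a positive count")

    @property
    def percentage(self) -> float:
        """Share of references that have at least one sense."""
        return self.lexicalized_references / self.references

    @property
    def avg_senses_per_reference(self) -> float:
        """Average number of senses per reference."""
        return self.senses / self.references


def is_consistent(counts: CoverageCounts, published_percentage: float,
                  tol: float = 1e-9) -> bool:
    """Check a separately published ratio against the one the counts imply."""
    return abs(counts.percentage - published_percentage) <= tol
```

With counts of 40 references, 20 of them lexicalized, a published ratio of
0.5 passes this check and a published ratio of 0.6 does not. If only the
counts are published, the check is never needed.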

To address Armando's points:

1. If the denominator is not available, then the ratio cannot be calculated
or interpreted to begin with!
2. The lexicalizations can be provided as an absolute count as well.
Providing two absolute values for the reference satisfies *all* use cases
compactly; the ratio satisfies only *some* of them.
3. This is illogical: if the ratio is not obtainable from available counts,
how did you obtain it??
4. If the ontology varies, the ontology provider should update their
metadata; absolute counts are more robust here. For example: if I say my
lexicon has 20 senses in ontology O (of 40 entities), then the ratio is
50%, but if O adds 10 new entries then my ratio is now wrong (it should be
20/50 = 40%), while my absolute sense count remains correct.
5. To compute c(o1)/(d(o1)+d(o2)) from c(o1)/d(o1), you must multiply by
d(o1)/(d(o1)+d(o2)). This multiplier is easy to obtain from the absolute
counts, but do you really want to include it as a property in the model?
What would you call it: 'relative ratio of xxx entities in comparison to
this ontology without its imports'??
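
The arithmetic behind points 4 and 5 can be sketched as follows, using exact
fractions in Python. The numbers for point 4 are the ones given above; the
value d_o2 = 60 in point 5 is a made-up figure chosen only for illustration,
and the helper name coverage is likewise hypothetical.

```python
from fractions import Fraction


def coverage(covered: int, total: int) -> Fraction:
    """Exact ratio of covered entities to all entities."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return Fraction(covered, total)


# Point 4: the sense count stays valid when the ontology grows,
# but a stored ratio silently goes stale.
senses_in_o = 20
ratio_before = coverage(senses_in_o, 40)       # 1/2, i.e. 50%
ratio_after = coverage(senses_in_o, 40 + 10)   # 2/5, i.e. 40%

# Point 5: moving from c(o1)/d(o1) to c(o1)/(d(o1)+d(o2)) needs the
# multiplier d(o1)/(d(o1)+d(o2)), which itself requires absolute counts.
c_o1, d_o1 = 20, 40
d_o2 = 60  # hypothetical size of the second (e.g. imported) ontology
multiplier = Fraction(d_o1, d_o1 + d_o2)
combined = coverage(c_o1, d_o1) * multiplier

# The rescaled ratio equals the one computed directly from the counts.
assert combined == coverage(c_o1, d_o1 + d_o2)
```

In both cases the ratio on its own is not enough: the denominator has to be
known to update or rescale it.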

Regards,
John

>
> If we agree on the above points, I volunteer to create a small example
> with Armando on the wiki to aid the discussion.
>
> Talk to you later anyway!
>
> Philipp.
>
>
> On 15.05.14 18:17, Armando Stellato wrote:
>
>  Hi Philipp,
>
>
>
> Just a short recap from Manuel and me about the only part which to us
> seemed left pending: the ratio/percentage vs count. We do not report
> anything about the model as, to the best of our memory, there were no
> objections to the overall structure (which does not mean it is necessarily
> the final one, and it is still open for comments).
>
>
>
> We thus updated the previous document with some considerations (also taken
> from the last ontolex call we had) and reported them in section 5.
>
>  Please, feel free to add more on the “integer side”, so we already have
> a basis for discussion tomorrow.
>
>
>
> Cheers,
>
>
>
> Armando and Manuel
>
>
>
>
>
> > -----Original Message-----
> > From: Philipp Cimiano [mailto:cimiano@cit-ec.uni-bielefeld.de]
> > Sent: Wednesday, May 14, 2014 9:26 PM
> > To: public-ontolex@w3.org
> > Subject: Teleconference on Friday
> >
> > Dear all,
> >
> >    I would like to call for a telco on this Friday on our regular slot:
> > 15:00 (CET).
> >
> > The main goal is to discuss the metadata module and come to a conclusion.
> >
> > I will send some decision points out before the meeting on Friday.
> >
> > Access details can be found here as usual:
> > https://www.w3.org/community/ontolex/wiki/Teleconference,_2014.16.05,_15-16_pm_CET
> >
> > I look forward to talking to you on Friday.
> >
> > Best regards,
> >
> > Philipp.
> >
> > --
> >
> > Prof. Dr. Philipp Cimiano
> >
> > Phone: +49 521 106 12249
> > Fax: +49 521 106 12412
> > Mail: cimiano@cit-ec.uni-bielefeld.de
> >
> > Forschungsbau Intelligente Systeme (FBIIS)
> > Raum 2.307
> > Universität Bielefeld
> > Inspiration 1
> > 33619 Bielefeld
>
>
>
>
>
> --
>
> Prof. Dr. Philipp Cimiano
>
> Phone: +49 521 106 12249
> Fax: +49 521 106 12412
> Mail: cimiano@cit-ec.uni-bielefeld.de
>
> Forschungsbau Intelligente Systeme (FBIIS)
> Raum 2.307
> Universität Bielefeld
> Inspiration 1
> 33619 Bielefeld
>
>
Received on Friday, 16 May 2014 11:37:24 UTC