Re: One lexical entry with multiple POSes from Christian Chiarcos on 2023-07-04 (public-ontolex@w3.org from July 2023)

From: Christian Chiarcos <christian.chiarcos@gmail.com>
Date: Tue, 4 Jul 2023 13:25:53 +0200
To: Jorge Gracia del Río <jogracia@unizar.es>
Cc: Fahad Khan <fahad.khan@ilc.cnr.it>, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAC1YGdhaP2RJsWSUFBQ5srLXpOxRt=OdvPrcN-iNj-Wz0ggxCg@mail.gmail.com>

Dear Jorge,

On Tue, 4 July 2023 at 12:19, Jorge Gracia del Río <
jogracia@unizar.es> wrote:

> From my side, I fully support Ilan's view on this. Trying to adapt the
> model to the restrictions and needs of every single dictionary is not
> feasible.
>

Of course not. This example is a particularly nasty one, indeed, because
there is no structural unit (lexicog:Component) we could directly identify
with a sub-entry for a particular POS (instead, there are multiple such
structural units). But multiple or underspecified POSes are a frequently
recurring issue. In RDF semantics, underspecified POSes are actually not a
problem because of the open world assumption, and Lexicog can handle
multiple POSes. The nasty part here is that, when using Lexicog, we simply
cannot automatically create ontolex:LexicalEntry instances for the
POS-specific entries, because it is hard for a converter to tell which part
of the description applies to one, the other, or both.

> Of course I am in favour of adaptations to the model and to work on its
> evolution, but we need to be cautious and not to re-interpret the model to
> adapt it to any possible legacy dictionary (e.g., by moving POS from the
> Lexical Entry to the Form), due to the risk of hampering interoperability
> across a plethora of existing and future lemon-based lexical data.
>

Indeed. But note that there is nothing in OntoLex that ties
lexinfo:partOfSpeech exclusively to lexical entries.
https://www.w3.org/2016/05/ontolex/#linguistic-description merely states
for lexinfo:partOfSpeech and other subproperties of
lexinfo:morphosyntacticProperty that "By default, it should be assumed that
a property of a lexical entry also holds for all its forms." I take this to
mean that lexinfo:partOfSpeech is optional for forms, but not forbidden. In
LexInfo, neither morphosyntacticProperty nor its subproperty partOfSpeech
is given a domain.
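
The two placements can be sketched as follows. This is only an illustration in Python with triples as plain tuples, not a statement of how any converter does it. All "ex:" identifiers are hypothetical, and the fallback rule in pos_of_form is one reading of the "by default" sentence quoted above.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library is assumed.
# All "ex:" identifiers are hypothetical.
FORM_LINKS = {"ontolex:canonicalForm", "ontolex:otherForm", "ontolex:lexicalForm"}
POS = "lexinfo:partOfSpeech"

# Variant 1: POS stated once, on the lexical entry.
entry_level = {
    ("ex:innocent", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:innocent", POS, "lexinfo:adjective"),
    ("ex:innocent", "ontolex:canonicalForm", "ex:innocent_form"),
}

# Variant 2: POS stated on the forms, none on the entry.
form_level = {
    ("ex:innocent", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:innocent", "ontolex:lexicalForm", "ex:innocent_adj_form"),
    ("ex:innocent", "ontolex:lexicalForm", "ex:innocent_noun_form"),
    ("ex:innocent_adj_form", POS, "lexinfo:adjective"),
    ("ex:innocent_noun_form", POS, "lexinfo:noun"),
}


def pos_of_form(triples, form):
    """Return the POS values of a form: its own if stated, else its entries'."""
    own = {o for s, p, o in triples if s == form and p == POS}
    if own:
        return own
    # Assumed default: a property of the entry also holds for its forms.
    entries = {s for s, p, o in triples if p in FORM_LINKS and o == form}
    return {o for s, p, o in triples if s in entries and p == POS}


print(pos_of_form(entry_level, "ex:innocent_form"))      # {'lexinfo:adjective'}
print(pos_of_form(form_level, "ex:innocent_noun_form"))  # {'lexinfo:noun'}
```

Under this reading, a consumer that only looks for lexinfo:partOfSpeech on the entry finds nothing in the second variant, which is where the interoperability concern comes from.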


> In your particular case, I'd go for the lexicog solution with one lexical
> entry per POS, and duplicating lexical senses if needed (which actually
> won't be duplicated since they will be connecting different things). Or,
> curating the source data to avoid existing imprecisions.
>

I think manual curation is not an option, because we'd try to have a
faithful representation first, before reinterpreting it (which needs a fair
command of French). And that would be needed for basically every entry, so
we are talking about weeks to months of work (weeks in this case, it's small).
Without that manual curation, the lexicog solution with one lexical entry
per POS means having one lexicog:Entry without any lexical entry (which is
ok from a modelling perspective, just incomplete data, as incomplete as the
original data). Right now, that would be my preference, too.

In this particular case, the nominal sense has no definition at all. So "Qui
témoigne d’un manque d’intelligence, de jugement, de savoir-faire, qui est
niais, bête, idiot", as given with the adjective, is expected to apply
here as well. (I think "idiot" must be a noun, so that's actually a
nominal definition, isn't it?) So that part needs to be duplicated, at
least, and likewise the list of synonyms given in bold. From one link
between innocent-innocente and épais-épaisse, we thus generate 18 = 2 x 9
lexinfo:synonym links between the different senses, so we have some
polynomial growth here.
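
To make the count concrete, here is a small Python sketch of the cross product. The sense identifiers are hypothetical, and which headword carries 2 senses and which carries 9 is an assumption; only the 2 x 9 figure comes from the paragraph above.

```python
from itertools import product


def synonym_links(senses_a, senses_b):
    """Link every sense on one side to every sense on the other."""
    return [(a, "lexinfo:synonym", b) for a, b in product(senses_a, senses_b)]


# Hypothetical sense identifiers; the sizes 2 and 9 only mirror the "2 x 9" count.
innocent_senses = [f"ex:innocent_sense{i}" for i in range(1, 3)]
epais_senses = [f"ex:epais_sense{i}" for i in range(1, 10)]

links = synonym_links(innocent_senses, epais_senses)
print(len(links))  # 18
```

A single headword-level link thus turns into len(senses_a) * len(senses_b) sense-level links once the senses are split per POS.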

But I don't see a better alternative either.

Thank you,
Christian

Received on Tuesday, 4 July 2023 11:26:11 UTC