Re: One lexical entry with multiple POSes from Fahad Khan on 2023-07-04 (public-ontolex@w3.org from July 2023)

From: Fahad Khan <fahad.khan@ilc.cnr.it>
Date: Tue, 4 Jul 2023 16:32:09 +0200
To: Christian Chiarcos <christian.chiarcos@gmail.com>
Cc: Jorge Gracia del Río <jogracia@unizar.es>, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAK+N+9iXd6YFJnxX1SHYArPq9Ow5X24cNhhL3OpeGoqvkTLwHw@mail.gmail.com>
Dear all,
Personally, I think that, as long as we maintain a reasonable level of
backwards compatibility, creating new, updated versions of OntoLex should
become a core part of maintaining the future usability of the model (as is
the case with TEI). Changing the one-POS-per-entry constraint, for instance,
would not affect interoperability with existing OntoLex lexicons. It would
also avoid ad hoc solutions that force users to impose an interpretation on
their data, which isn't always possible and which makes OntoLex difficult
to use, as Christian's original mail demonstrates.

OntoLex was originally intended for a specific set of use cases but ended
up being appropriated mostly for other ones, so it is natural that parts of
the original model no longer make as much sense as they once did. Who is
still interested in using ontology lexicons to carry out WSD (word sense
disambiguation) in 2023, for instance? By contrast, interest in using
OntoLex for legacy dictionaries keeps growing.

If you want to propose a specific pattern or SHACL shape that enforces one
POS per lexical entry, without necessarily hardcoding it into the OntoLex
specification, then you should absolutely do so (and this kind of
information should be added to the metadata of individual resources as a
best practice).

Another issue is that we should try to keep OntoLex aligned with other
relevant standards in order to enable crosswalks. One of these standards is
chapter 9 of the TEI Guidelines, which is used much more widely than
OntoLex for encoding dictionaries and imposes no restriction comparable to
the one-POS-per-entry constraint.
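
As an illustration of the opt-in check suggested above, a SHACL property
shape with sh:targetClass ontolex:LexicalEntry, sh:path
lexinfo:partOfSpeech and sh:maxCount 1 would flag entries carrying more
than one POS. The following is a minimal Python sketch of the same check.
It makes some simplifying assumptions: it uses plain tuples instead of an
RDF library, it only looks at explicit rdf:type ontolex:LexicalEntry
triples, and the ex: IRIs and the function name are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
LEXICAL_ENTRY = "ontolex:LexicalEntry"
PART_OF_SPEECH = "lexinfo:partOfSpeech"


def entries_with_multiple_pos(triples):
    """Return {entry: sorted POS values} for lexical entries with >1 POS.

    Approximates what a SHACL shape targeting ontolex:LexicalEntry with
    sh:path lexinfo:partOfSpeech and sh:maxCount 1 would report
    (simplification: only explicit rdf:type triples are considered).
    """
    entries = {s for s, p, o in triples if p == RDF_TYPE and o == LEXICAL_ENTRY}
    pos_values = defaultdict(set)
    for s, p, o in triples:
        if p == PART_OF_SPEECH and s in entries:
            pos_values[s].add(o)
    return {e: sorted(v) for e, v in pos_values.items() if len(v) > 1}


# Hypothetical data: one entry with both an adjective and a noun POS.
triples = [
    ("ex:epais", RDF_TYPE, LEXICAL_ENTRY),
    ("ex:epais", PART_OF_SPEECH, "lexinfo:adjective"),
    ("ex:epais", PART_OF_SPEECH, "lexinfo:noun"),
    ("ex:innocent", RDF_TYPE, LEXICAL_ENTRY),
    ("ex:innocent", PART_OF_SPEECH, "lexinfo:adjective"),
]

print(entries_with_multiple_pos(triples))
# → {'ex:epais': ['lexinfo:adjective', 'lexinfo:noun']}
```

Because the check sits outside the core vocabulary, a resource can opt in
to it (and declare that in its metadata) without constraining lexicons
that legitimately need several POS values per entry.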
Cheers
Fahad

On Tue, 4 Jul 2023 at 13:25, Christian Chiarcos <
christian.chiarcos@gmail.com> wrote:

> Dear Jorge,
>
> On Tue, 4 Jul 2023 at 12:19, Jorge Gracia del Río <
> jogracia@unizar.es> wrote:
>
>> From my side, I fully support Ilan's view on this. Trying to adapt the
>> model to the restrictions and needs of every single dictionary is not
>> feasible.
>>
>
> Of course not. This example is a particularly nasty one, indeed, because
> there is no structural unit (lexicog:Component) we could directly identify
> with a sub-entry for a particular POS (instead, there are multiple such
> structural units). But multiple or underspecified POSes are a frequently
> recurring issue. In RDF semantics, underspecified POSes are actually not a
> problem because of the open world assumption, and Lexicog can handle
> multiple POSes. The nasty part here is that when using Lexicog, we simply
> cannot automatically create ontolex:LexicalEntries for the POS-specific
> entries because it is hard to tell (for a converter) which part of the
> description applies to one, the other or both.
>
>> Of course I am in favour of adapting the model and working on its
>> evolution, but we need to be cautious and not re-interpret the model to
>> fit every possible legacy dictionary (e.g., by moving POS from the
>> Lexical Entry to the Form), due to the risk of hampering interoperability
>> across the plethora of existing and future lemon-based lexical datasets.
>>
>
> Indeed. But note that nothing in OntoLex ties lexinfo:partOfSpeech
> exclusively to lexical entries. The specification at
> https://www.w3.org/2016/05/ontolex/#linguistic-description merely states,
> for lexinfo:partOfSpeech and other subproperties of
> lexinfo:morphosyntacticProperty, that "By default, it should be assumed that
> a property of a lexical entry also holds for all its forms." I take this to
> mean that lexinfo:partOfSpeech is optional for forms, but not forbidden. In
> LexInfo, neither morphosyntacticProperty nor its subproperty partOfSpeech
> is given a domain.
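
To make that reading concrete, here is a minimal Python sketch. It assumes
one possible interpretation of the "by default" clause: a form with no
lexinfo:partOfSpeech of its own inherits every POS value of its lexical
entry, while a value stated on the form itself is taken as given. The
ex: IRIs and the function effective_pos are hypothetical, and this is not
behaviour prescribed by the OntoLex specification.

```python
def effective_pos(form, form_pos, form_entry, entry_pos):
    """Return the set of POS values assumed to hold for a form.

    Assumed reading of the 'by default' clause: a POS stated on the form
    itself is used as given; otherwise the form inherits every POS value
    of its lexical entry.
    """
    own = form_pos.get(form, set())
    if own:
        return set(own)
    return set(entry_pos.get(form_entry.get(form), set()))


# Hypothetical data: one entry with two POS values and two forms,
# only one of which carries its own lexinfo:partOfSpeech.
entry_pos = {"ex:entry": {"lexinfo:adjective", "lexinfo:noun"}}
form_entry = {"ex:form_a": "ex:entry", "ex:form_b": "ex:entry"}
form_pos = {"ex:form_b": {"lexinfo:noun"}}

print(sorted(effective_pos("ex:form_a", form_pos, form_entry, entry_pos)))
# → ['lexinfo:adjective', 'lexinfo:noun']
print(sorted(effective_pos("ex:form_b", form_pos, form_entry, entry_pos)))
# → ['lexinfo:noun']
```

Under this reading, an entry-level POS stays the default, and a form-level
POS only adds information where the entry itself is ambiguous.
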
>
>
>> In your particular case, I'd go for the lexicog solution with one lexical
>> entry per POS, duplicating lexical senses if needed (which won't actually
>> be duplicates, since they will connect different things). Alternatively,
>> curate the source data to remove the existing imprecisions.
>>
>
> I think manual curation is not an option, because we want a faithful
> representation first, before reinterpreting it (which requires a fair
> command of French). And it would be needed for basically every entry, so
> we are talking about weeks to months of work (weeks in this case, since
> this resource is small). Without that manual curation, the lexicog
> solution with one lexical entry per POS means having one lexicog:Entry
> without any lexical entry (which is fine from a modelling perspective,
> just incomplete data, as incomplete as the original data). Right now, that
> would be my preference, too.
>
> In this particular case, the nominal sense has no definition at all. So
> "Qui témoigne d’un manque d’intelligence, de jugement, de savoir-faire, qui
> est niais, bête, idiot" (roughly, "showing a lack of intelligence,
> judgement or know-how; simple-minded, stupid, idiot"), as given with the
> adjective, is expected to apply here as well. (I think "idiot" must be a
> noun, so that is actually a nominal definition, isn't it?) So that part
> needs to be duplicated, at least, and likewise the list of synonyms given
> in bold: from one link between innocent-innocente and epais-epaisse, we
> generate 18 = 2 x 9 lexinfo:synonym links between the different senses,
> so we have some polynomial growth here.
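
The arithmetic can be made concrete with a small Python sketch. It assumes,
as one reading of the 2 x 9 figure, that a single synonym link has to be
spelled out between each of 2 POS-specific senses on one side and each of
9 senses on the other. The sense IRIs and the helper expand_synonym_link
are hypothetical.

```python
from itertools import product


def expand_synonym_link(senses_a, senses_b):
    """Expand one entry-level synonym link into sense-to-sense triples.

    Assumption: every sense on one side is linked to every sense on the
    other, so the result has len(senses_a) * len(senses_b) triples.
    """
    return [(a, "lexinfo:synonym", b) for a, b in product(senses_a, senses_b)]


# Hypothetical senses: adjective and noun readings vs. 9 target senses.
senses_epais = ["ex:epais_adj_sense", "ex:epais_noun_sense"]
senses_other = [f"ex:other_sense_{i}" for i in range(1, 10)]

links = expand_synonym_link(senses_epais, senses_other)
print(len(links))  # → 18
```

With m senses on one side and n on the other, the expansion yields m x n
triples, which is the multiplicative growth described above.
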
>
> But I don't see a better alternative either.
>
> Thank you,
> Christian
>
Received on Tuesday, 4 July 2023 14:32:28 UTC