Re: Entry with Multiple Part-of-Speech Values from Ilan Kernerman on 2025-11-04 (public-ontolex@w3.org from November 2025)

From: Ilan Kernerman <ilan@lexicala.com>
Date: Tue, 4 Nov 2025 11:24:15 +0000
To: Ana Salgado <anacastrosalgado@gmail.com>, "Passarotti Marco Carlo (marco.passarotti)" <marco.passarotti@unicatt.it>
CC: Fahad Khan <anasfkhan81@gmail.com>, "John P. McCrae" <john.mccrae@insight-centre.org>, public-ontolex <public-ontolex@w3.org>
Message-ID: <DU0PR03MB1011454FDA044112895453A9CA8C4A@DU0PR03MB10114.eurprd03.prod.outlook.co>

Hi all,

I would argue in favor of “having a single part of speech per entry”. Besides allowing language components to be categorized in more detail (for various language technology purposes), it is needed for cross-lingual purposes, since an L2 might have different equivalents for different L1 parts of speech.
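A minimal Python sketch of the cross-lingual point, assuming a hypothetical bilingual lookup keyed by (lemma, part of speech). The English-French pairs and the helper names `translate` and `translate_multi` are illustrative only and do not come from any OntoLex resource:

```python
from typing import Dict, List, Tuple

# Hypothetical bilingual table keyed by (L1 lemma, L1 part of speech).
# The same English lemma needs different French equivalents per PoS.
EQUIVALENTS: Dict[Tuple[str, str], List[str]] = {
    ("fast", "adjective"): ["rapide"],
    ("fast", "adverb"): ["vite"],
}

def translate(lemma: str, pos: str) -> List[str]:
    """Return the L2 equivalents for an L1 lemma with a single PoS."""
    return EQUIVALENTS.get((lemma, pos), [])

def translate_multi(lemma: str, pos_values: List[str]) -> Dict[str, List[str]]:
    """For an entry carrying several PoS values, equivalents still have to be
    resolved PoS by PoS; one merged list would lose which goes with which."""
    return {pos: translate(lemma, pos) for pos in pos_values}

print(translate("fast", "adverb"))
print(translate_multi("fast", ["adjective", "adverb"]))
```

Keying on the (lemma, PoS) pair is what a single-PoS entry gives for free; an entry with several PoS values forces the split to happen somewhere else.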

If there is no nice and easy solution that satisfies both current (and near-future) resources and retrodigitization, and one of them must suffer, IMHO our priority should be the former.

Thanks,
Ilan


From: Ana Salgado <anacastrosalgado@gmail.com>
Date: Tuesday, 4 November 2025 at 13:17
To: Passarotti Marco Carlo (marco.passarotti) <marco.passarotti@unicatt.it>
Cc: Fahad Khan <anasfkhan81@gmail.com>, John P. McCrae <john.mccrae@insight-centre.org>, public-ontolex <public-ontolex@w3.org>
Subject: Re: Entry with Multiple Part-of-Speech Values
Hello! I agree as well. In the Dictionary of the Lisbon Academy of Sciences, the answer would be positive, but when we look at microstructures such as those in the Dictionary of the Real Academia Española, the constraints become evident: https://dle.rae.es/capital?m=form
Have a nice day,
Ana

Passarotti Marco Carlo (marco.passarotti) <marco.passarotti@unicatt.it> wrote (Tuesday, 4/11/2025 at 11:07):
Hi all,

I support the proposal to drop the constraint of having a single PoS per entry.
Very often, dictionaries do not split a lexicographic entry into separate components per PoS. They simply report that a certain word is “adv., prep.”. In LiLa we had several issues while linking retrodigitized dictionaries that follow this practice for PoS.
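To make the retrodigitization case concrete, here is a small Python sketch that splits a combined PoS label such as “adv., prep.” into one part-of-speech IRI per value. The abbreviation table and the function name `split_pos_label` are hypothetical, and the LexInfo 3.0 namespace is an assumption about which vocabulary the target resource uses:

```python
import re
from typing import List

# Assumed target vocabulary (LexInfo 3.0); adjust to the resource actually used.
LEXINFO = "http://www.lexinfo.net/ontology/3.0/lexinfo#"

# Hypothetical mapping from a dictionary's PoS abbreviations to LexInfo names.
ABBREVIATIONS = {
    "adv": "adverb",
    "prep": "preposition",
    "adj": "adjective",
    "n": "noun",
    "v": "verb",
}

def split_pos_label(label: str) -> List[str]:
    """Split a combined PoS label like 'adv., prep.' into LexInfo IRIs.

    Commas and whitespace act as separators; trailing periods and empty
    pieces (e.g. from a doubled comma in OCR output) are ignored."""
    iris = []
    for piece in re.split(r"[,\s]+", label):
        key = piece.strip().rstrip(".").lower()
        if not key:
            continue
        if key not in ABBREVIATIONS:
            raise ValueError(f"unknown PoS abbreviation: {piece!r}")
        iris.append(LEXINFO + ABBREVIATIONS[key])
    return iris

print(split_pos_label("adv., prep."))
```

Whether the resulting list is then attached to one entry or used to create one entry per PoS is exactly the modelling question under discussion; the parsing step is the same either way.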

Best,

Marco


Prof. Marco C. Passarotti
Computational Linguistics
Index Thomisticus Treebank https://itreebank.marginalia.it/
ERC Grantee, P.I. LiLa https://lila-erc.eu/ (Grant Agreement No. 769994)
CIRCSE Research Centre https://centridiricerca.unicatt.it/circse_index.html

Università Cattolica del Sacro Cuore
Largo Gemelli, 1
20123 Milan, Italy
marco.passarotti@unicatt.it
tel. +39-02-72342380


On 4 Nov 2025, at 11:53, Fahad Khan <anasfkhan81@gmail.com> wrote:

Dear John,
IMHO the definition of Entry is too narrow (it is tied to a lexicographic source) and entails quite a complex encoding, requiring different structural and lexical components to exist and be aligned, just to capture, e.g., part-of-speech values associated with different senses (think of all the overhead for a lexicon where this is common, and the difficulty of writing SPARQL queries). The question isn't just one of providing a solution but a good one. For instance, I think David's solution of language-specific categories might make interoperability between different resources more difficult and lead to a profusion of PoS categories.
From what I understand, having a single part of speech per entry was a necessity for certain NLP tasks, but nowadays the creation of lexicons for language documentation/retrodigitisation is a much more frequent use case in LLOD. I think it makes sense to get rid of it.
Cheers,
Fahad

On Mon, 3 Nov 2025 at 17:16, John P. McCrae <john.mccrae@insight-centre.org> wrote:
Hi all,

As part of the OntoLex core model changes, we are looking into the issue of multiple part-of-speech values here:

https://github.com/ontolex/ontolex/issues/47

In particular, this problem already appears to be solved by the use of the `Entry` class from `lexicog` or, as David Lindemann suggests, by using more general or language-specific categories.
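A rough sketch of the `lexicog` route, written as Python that emits Turtle: one `lexicog:Entry` (the dictionary article) uses `lexicog:describes` to point at one `ontolex:LexicalEntry` per part of speech, so each lexical entry keeps a single PoS. The `ex:` namespace, the IRI naming scheme, the function name `entry_to_turtle` and the LexInfo 3.0 namespace are all assumptions for illustration, and headwords are assumed to be safe to use in IRI local names:

```python
from typing import List

# Namespaces: OntoLex core, lexicog module, LexInfo 3.0 (assumed), and a
# hypothetical example namespace for the generated resources.
PREFIXES = """@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexicog: <http://www.w3.org/ns/lemon/lexicog#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .
@prefix ex: <http://example.org/dict/> .
"""

def entry_to_turtle(headword: str, pos_values: List[str], lang: str = "es") -> str:
    """Emit one lexicog:Entry describing one ontolex:LexicalEntry per PoS."""
    entry = f"ex:{headword}_entry"
    lexical = [f"ex:{headword}_{pos}" for pos in pos_values]
    lines = [PREFIXES]
    # The structural (lexicographic) layer: a single dictionary article.
    lines.append(f"{entry} a lexicog:Entry ;")
    lines.append("    lexicog:describes " + " , ".join(lexical) + " .")
    # The lexical layer: each LexicalEntry carries exactly one PoS value.
    for iri, pos in zip(lexical, pos_values):
        lines.append(f"{iri} a ontolex:LexicalEntry ;")
        lines.append(f"    lexinfo:partOfSpeech lexinfo:{pos} ;")
        lines.append(f'    ontolex:canonicalForm [ ontolex:writtenRep "{headword}"@{lang} ] .')
    return "\n".join(lines)

print(entry_to_turtle("capital", ["noun", "adjective"]))
```

This also shows the overhead raised in the thread: every PoS value adds a separate lexical entry plus a `lexicog:describes` link, which queries must then traverse.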

I was wondering whether anyone has use cases that are not solved by this modelling, or any other comments.

Regards,
John

PS. I will copy/summarize replies to this email to GitHub. You may also post directly to GitHub.

Received on Tuesday, 4 November 2025 11:24:25 UTC