Re: Entry with Multiple Part-of-Speech Values from Passarotti Marco Carlo (marco.passarotti) on 2025-11-04 (public-ontolex@w3.org from November 2025)

From: Passarotti Marco Carlo (marco.passarotti) <marco.passarotti@unicatt.it>
Date: Tue, 4 Nov 2025 11:06:55 +0000
To: Fahad Khan <anasfkhan81@gmail.com>
CC: "John P. McCrae" <john.mccrae@insight-centre.org>, public-ontolex <public-ontolex@w3.org>
Message-ID: <E31774C6-933B-4D2B-B330-2D271559BE54@unicatt.it>

Hi all,

I support the proposal of getting rid of the constraint of having a single PoS per entry.
Very often, dictionaries do not split a lexicographic entry into separate components for each PoS; they simply report that a certain word is “adv., prep.”. In LiLa we ran into several issues when linking retrodigitized dictionaries that follow this practice for PoS.
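
To make the case concrete, here is a minimal sketch of what a relaxed constraint would allow: a single entry carrying several part-of-speech values. It is not the LiLa pipeline or an agreed OntoLex encoding. The abbreviation table, the function names and the example.org entry IRI are all assumptions made for illustration; only the OntoLex and LexInfo 3.0 namespaces are real.

```python
import re

LEXINFO = "http://www.lexinfo.net/ontology/3.0/lexinfo#"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"

# Hypothetical abbreviation table; each dictionary would need its own.
POS_ABBREVIATIONS = {
    "adv": "adverb",
    "prep": "preposition",
    "adj": "adjective",
    "n": "noun",
    "v": "verb",
}


def split_pos_label(label):
    """Split a combined label such as 'adv., prep.' into LexInfo local names."""
    # Commas, periods and whitespace all act as separators; empty pieces are dropped.
    tokens = [t for t in re.split(r"[,\s.]+", label.strip()) if t]
    return [POS_ABBREVIATIONS[t.lower()] for t in tokens]


def entry_to_turtle(entry_iri, label):
    """Serialise one lexical entry with every PoS value found in the label."""
    values = split_pos_label(label)
    objects = " ,\n        ".join(f"<{LEXINFO}{v}>" for v in values)
    return (
        f"<{entry_iri}> a <{ONTOLEX}LexicalEntry> ;\n"
        f"    <{LEXINFO}partOfSpeech>\n        {objects} .\n"
    )


# Hypothetical IRI for an entry that a dictionary labels as both adverb and preposition.
print(entry_to_turtle("http://example.org/lexicon/entry/ante", "adv., prep."))
```

With a single-PoS constraint, the same dictionary line would instead have to be split into two entries or wrapped in additional structural components.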

Best,

Marco


Prof. Marco C. Passarotti
Computational Linguistics
Index Thomisticus Treebank https://itreebank.marginalia.it/
ERC Grantee, P.I. LiLa https://lila-erc.eu/ (Grant Agreement No. 769994)
CIRCSE Research Centre https://centridiricerca.unicatt.it/circse_index.html

      

Università Cattolica del Sacro Cuore
Largo Gemelli, 1
20123 Milan, Italy
marco.passarotti@unicatt.it
tel. +39-02-72342380

> On 4 Nov 2025, at 11:53, Fahad Khan <anasfkhan81@gmail.com> wrote:
> 
> Dear John, 
> IMHO the definition of Entry is too narrow (it is tied to a lexicographic source), and it entails a rather complex encoding: different structural and lexical components have to exist and be aligned just to capture, e.g., part-of-speech values associated with different senses. Think of the overhead in a lexicon where this is common, and of the difficulty of writing SPARQL queries. The question is not just one of providing a solution, but of providing a good one. For instance, I think David's solution of language-specific categories might make interoperability between resources more difficult and lead to a profusion of PoS categories.
> From what I understand, having a single part of speech per entry was a requirement for certain NLP tasks, but nowadays the creation of lexicons for language documentation/retrodigitisation is a much more frequent use case in LLOD. I think it makes sense to get rid of the constraint.
> Cheers, 
> Fahad
> 
> On Mon, 3 Nov 2025 at 17:16, John P. McCrae <john.mccrae@insight-centre.org> wrote:
>> Hi all,
>> 
>> As part of the OntoLex core model changes we are looking into the issues of multiple part-of-speech values here:
>> 
>> https://github.com/ontolex/ontolex/issues/47
>> 
>> In particular, this problem already appears to be solved by the use of the `Entry` class from `lexicog` or, as David Lindemann suggests, by using more general or language-specific categories.
>> 
>> Does anyone have use cases that are not solved by this modelling, or any other comments?
>> 
>> Regards,
>> John
>> 
>> PS. I will copy/summarize replies to this email to GitHub. You may also post directly to GitHub.

Received on Tuesday, 4 November 2025 11:07:05 UTC