RE: One lexical entry with multiple POSes from Ilan Kernerman on 2023-07-04 (public-ontolex@w3.org from July 2023)

From: Ilan Kernerman <ilan@lexicala.com>
Date: Tue, 4 Jul 2023 07:04:24 +0000
To: Fahad Khan <fahad.khan@ilc.cnr.it>, Christian Chiarcos <christian.chiarcos@gmail.com>
CC: public-ontolex <public-ontolex@w3.org>
Message-ID: <AS2PR03MB90023C7AEA9E2B6733E0FCE0A82EA@AS2PR03MB9002.eurprd03.prod.outlook.com>
Dear Christian, Fahad, all,

Thank you for the comprehensive description. I support keeping the one-pos-per-entry principle – which IMHO makes Ontolex/Lexicog more thorough, consistent, open and useful, despite such constraints – and seeking solutions to specific clashes, like what you suggest.

Briefly, disambiguating pos (or senses, etc) empowers Ontolex/Lexicog, as it makes the content more “datafied”, machine-readable, interoperable. This helps, for example, when linking to another resource, aligning senses, or adding translation equivalents that differ by pos in another language.
This might clash with “real-world dictionaries”, bringing us back to asking what is the ultimate purpose of Ontolex – to provide automated 1-to-1 replications for (often) imperfect dictionaries or try to design the utmost up-to-date semantic representation of lexical data for actual use today?

Generally speaking, a one-size-fits-all model can never satisfy everything entirely (all the time), especially language-wise, and there are always bound to be exceptions to the rule, which should be welcome and accommodated with care individually unless we want a uniform, closed world.

The macro- and micro-structure of good old dictionaries has also been determined by real constraints, such as their specific media and space limits, resulting in entries with unsystematic structures (like this, otherwise beautiful, one). If the medium is the message, it will be necessary to adapt.

Moreover, I doubt that multi-pos-per-entry would enable “more efficient modelling”, and I wonder how capable existing tools are of “representing such data without requiring human re-interpretation”, unless the goal were only to mirror the original entry rather than also broaden its scope.

It usually requires extra time and manual work to deal with such cases, but maybe it’s good that not everything can be automated, yet ;)? And perhaps some amount of duplication of senses is unavoidable and, actually, the advantages exceed the drawbacks?

One way or the other, I would consider guidelines as a means, not an end, which should be open for reconsideration if they can be improved.

Regards
Ilan


From: Fahad Khan <fahad.khan@ilc.cnr.it>
Sent: Monday, July 3, 2023 7:51 PM
To: Christian Chiarcos <christian.chiarcos@gmail.com>
Cc: public-ontolex <public-ontolex@w3.org>
Subject: Re: One lexical entry with multiple POSes

Dear Christian,
The best solution would obviously be to get rid of the one POS per lexical entry constraint (and I know of no convincing reason as to why we should keep to this any longer). But since there is some reluctance to update the guidelines except to correct minor typos, this is probably not going to happen (and also, if it did, that would remove one of the big motivations for developing lexicog in the first place). However, IMO there is an ambiguity as to whether lexical entries are supposed to have exactly one POS or at most one POS. This is especially the case since, as we discussed in a previous OntoLex call, affixes are also classed as lexical entries in the model and these usually aren't associated with POSs. So a third potential solution to your modelling dilemma would indeed be to assume that a lexical entry can have zero or one POS values, and not to associate any POSs with your lexical entry using lexinfo:partOfSpeech, but rather to use some other property to specify that the categories noun and adjective are relevant to your lexical entry (this solution has the benefit that you can continue using lexical entry with its associated axioms).
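
A minimal sketch of the two readings of the constraint, using plain Python tuples as stand-ins for RDF triples. All ":"-prefixed resource names are hypothetical, and ":relevantCategory" is an invented placeholder for the unspecified "some other property"; it is not part of OntoLex or LexInfo.

```python
# Triples as (subject, predicate, object) tuples.
# ":relevantCategory" is a made-up placeholder property, and all ":"-prefixed
# resource names are hypothetical illustration IDs.
TRIPLES = [
    # Multi-POS entry: no lexinfo:partOfSpeech, categories given otherwise.
    (":innocent_entry", "rdf:type", "ontolex:LexicalEntry"),
    (":innocent_entry", ":relevantCategory", "lexinfo:noun"),
    (":innocent_entry", ":relevantCategory", "lexinfo:adjective"),
    # An affix entry, typically without any part of speech.
    (":some_affix", "rdf:type", "ontolex:LexicalEntry"),
    # An ordinary entry with a single part of speech.
    (":plain_noun", "rdf:type", "ontolex:LexicalEntry"),
    (":plain_noun", "lexinfo:partOfSpeech", "lexinfo:noun"),
]


def pos_values(entry, triples):
    """Collect the lexinfo:partOfSpeech values stated for one entry."""
    return {o for s, p, o in triples
            if s == entry and p == "lexinfo:partOfSpeech"}


def exactly_one_pos(entry, triples):
    """Strict reading: every lexical entry has exactly one POS."""
    return len(pos_values(entry, triples)) == 1


def at_most_one_pos(entry, triples):
    """Relaxed reading: a lexical entry has zero or one POS."""
    return len(pos_values(entry, triples)) <= 1


for entry in (":innocent_entry", ":some_affix", ":plain_noun"):
    print(entry, exactly_one_pos(entry, TRIPLES), at_most_one_pos(entry, TRIPLES))
```

Under the relaxed reading all three entries are acceptable; under the strict reading only the ordinary noun entry is.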
Cheers
Fahad
PS. Given the capabilities of ChatGPT I wouldn't be so sure the task you refer to couldn't be automated.

On Mon, 3 Jul 2023 at 16:52, Christian Chiarcos <christian.chiarcos@gmail.com> wrote:
Dear all,

TL;DR: How to model the multi-POS entry https://www.dhfq.org/article/innocent-innocente with OntoLex?

Long: OntoLex postulates a constraint for having one part-of-speech per lexical entry. For ontology lexicalization, this makes a lot of sense, but in the past, there also were some controversies because it partially clashes with the structure of real-world dictionaries (i.e., dictionary entries with more than one part of speech).

The lexicog module introduced a possible solution, i.e., that multiple lexical entries can be grouped together into a single lexicog:Entry. This still has some downsides (e.g., sense definitions applicable to multiple parts of speech must be duplicated, because there must not be more than one lexical entry per sense), but works pretty well, as long as the original entry is actually structured in accordance with these parts of speech.
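
The grouping described above, and the duplication it causes, can be sketched with plain Python tuples standing in for RDF triples. The resource names and the definition text are hypothetical placeholders (not taken from the dictionary), and the property names reflect one reading of the lexicog, OntoLex and LexInfo vocabularies that should be checked against the specifications.

```python
# Triples as (subject, predicate, object) tuples. Every ":"-prefixed name and
# the definition text are hypothetical placeholders.
SHARED_DEF = "placeholder definition that applies to both parts of speech"

TRIPLES = [
    # One dictionary article groups two lexical entries.
    (":innocent_article", "rdf:type", "lexicog:Entry"),
    (":innocent_article", "lexicog:describes", ":innocent_adj"),
    (":innocent_article", "lexicog:describes", ":innocent_noun"),
    # Each lexical entry carries exactly one part of speech.
    (":innocent_adj", "rdf:type", "ontolex:LexicalEntry"),
    (":innocent_adj", "lexinfo:partOfSpeech", "lexinfo:adjective"),
    (":innocent_noun", "rdf:type", "ontolex:LexicalEntry"),
    (":innocent_noun", "lexinfo:partOfSpeech", "lexinfo:noun"),
    # A sense belongs to one lexical entry, so the shared definition
    # has to be stated once per part of speech.
    (":innocent_adj", "ontolex:sense", ":innocent_adj_sense1"),
    (":innocent_adj_sense1", "skos:definition", SHARED_DEF),
    (":innocent_noun", "ontolex:sense", ":innocent_noun_sense1"),
    (":innocent_noun_sense1", "skos:definition", SHARED_DEF),
]


def senses_defined_as(text, triples):
    """Return the sorted senses whose skos:definition equals `text`."""
    return sorted(s for s, p, o in triples
                  if p == "skos:definition" and o == text)


duplicated = senses_defined_as(SHARED_DEF, TRIPLES)
print(duplicated)  # two senses carry the same definition
```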

In a course with students, we are currently exploring the applicability of OntoLex to a number of reference dictionaries from different Romance languages, and I would like to share one example that I consider critical, because it does not provide the neat partitioning of sub-entries into parts of speech, but conflates/switches between them several times. The entry "innocent-innocente" in the Dictionnaire historique du français québécois [1][2] describes both an adjective and a noun, with some portions applying to both POSes and others applying to only one or the other. While a human may be trained to disentangle them, I see no way this could be automated.

I think OntoLex, if understood as the reference vocabulary for machine-readable lexical data in RDF, should be capable of representing such data without requiring human re-interpretation. Given the vocabularies we currently have, what would be your preferences? My current approach would be to create a lexicog:Entry, to link it with forms and senses, but to *not create an ontolex:LexicalEntry* at all. This is in line with the open world assumption, and if this is a practice we can agree upon, we should probably add that as a clarification note to the Lexicog definition. Note that this entails that lexicog:Entry can also organize ontolex:Forms (this is not prohibited right now in lexicog, but also not mentioned as a possibility).
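
A rough sketch of what this approach might look like, again with plain Python tuples standing in for RDF triples. It assumes that lexicog:describes is the property used to attach forms and senses to the lexicog:Entry; the resource names and the definition text are hypothetical placeholders, and where part-of-speech information would be recorded is deliberately left open.

```python
# Triples as (subject, predicate, object) tuples. Resource names and the
# definition text are hypothetical; using lexicog:describes to attach forms
# directly to the entry is the assumption under discussion.
TRIPLES = [
    (":innocent_article", "rdf:type", "lexicog:Entry"),
    # The article points at forms and senses directly ...
    (":innocent_article", "lexicog:describes", ":innocent_form1"),
    (":innocent_article", "lexicog:describes", ":innocent_form2"),
    (":innocent_article", "lexicog:describes", ":innocent_sense1"),
    (":innocent_form1", "rdf:type", "ontolex:Form"),
    (":innocent_form1", "ontolex:writtenRep", "innocent"),
    (":innocent_form2", "rdf:type", "ontolex:Form"),
    (":innocent_form2", "ontolex:writtenRep", "innocente"),
    # ... so a definition shared across parts of speech is stated once.
    (":innocent_sense1", "rdf:type", "ontolex:LexicalSense"),
    (":innocent_sense1", "skos:definition", "placeholder shared definition"),
]


def instances_of(cls, triples):
    """Return the sorted resources typed with the given class."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


def described_by(component, triples):
    """Return the sorted resources a lexicographic component describes."""
    return sorted(o for s, p, o in triples
                  if s == component and p == "lexicog:describes")


print(instances_of("ontolex:LexicalEntry", TRIPLES))  # no lexical entry exists
print(described_by(":innocent_article", TRIPLES))
```

Nothing in this data is typed as ontolex:LexicalEntry, so the one-POS-per-entry constraint is never triggered.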

(I'm personally more in favour of lifting the one-POS-per-entry constraint, because it leads to a more efficient modelling, but that hasn't found much support so far, and using lexicog:Entrys instead of ontolex:LexicalEntries wouldn't contradict any existing vocabulary.)

(NB: TEI provides something like a solution here, by the markable <entryFree> [3], but IMHO this would not be a good idea in the context of OntoLex, because the definition "a single unstructured entry" basically states that we leave the realm of well-defined semantics ... but semantics are actually very clear here, just not expressible with OntoLex core vocabulary.)

Any ideas?

All the best,
Christian

[1] https://www.dhfq.org/article/innocent-innocente (French)
[2] English: https://www-dhfq-org.translate.goog/article/innocent-innocente?_x_tr_sl=auto&_x_tr_tl=de&_x_tr_hl=de&_x_tr_pto=wapp

[3] https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-entryFree.html
Received on Tuesday, 4 July 2023 07:04:36 UTC