One lexical entry with multiple POSes from Christian Chiarcos on 2023-07-03 (public-ontolex@w3.org from July 2023)

From: Christian Chiarcos <christian.chiarcos@gmail.com>
Date: Mon, 3 Jul 2023 16:47:26 +0200
To: public-ontolex <public-ontolex@w3.org>
Message-ID: <CAC1YGdijHjzNmi26cSxiMGzY1tAydzTcM35J5j=RAtc6wMk9pA@mail.gmail.com>

Dear all,

TL;DR: How to model the multi-POS entry
https://www.dhfq.org/article/innocent-innocente with OntoLex?

Long: OntoLex postulates a constraint of one part of speech per
lexical entry. For ontology lexicalization, this makes a lot of sense, but
it has caused some controversy in the past because it partially clashes
with the structure of real-world dictionaries (i.e., dictionary entries
with more than one part of speech).
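To make the constraint concrete, here is a minimal sketch in plain Python (no RDF library; triples are tuples) of a check that flags lexical entries carrying more than one part of speech. ontolex:LexicalEntry and lexinfo:partOfSpeech are the usual terms, but the check itself is only an illustration, not part of any official OntoLex validator, and the IRIs under http://example.org/ are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"
EX = "http://example.org/dhfq/"  # hypothetical namespace for this example

def multi_pos_entries(triples):
    """Return {entry: set_of_pos} for every ontolex:LexicalEntry with more than one POS."""
    entries = {s for s, p, o in triples
               if p == RDF_TYPE and o == ONTOLEX + "LexicalEntry"}
    pos = defaultdict(set)
    for s, p, o in triples:
        if s in entries and p == LEXINFO + "partOfSpeech":
            pos[s].add(o)
    return {e: tags for e, tags in pos.items() if len(tags) > 1}

# A naive one-to-one conversion of "innocent-innocente": one entry, two POS values.
triples = [
    (EX + "innocent", RDF_TYPE, ONTOLEX + "LexicalEntry"),
    (EX + "innocent", LEXINFO + "partOfSpeech", LEXINFO + "adjective"),
    (EX + "innocent", LEXINFO + "partOfSpeech", LEXINFO + "noun"),
]

violations = multi_pos_entries(triples)
for entry, tags in violations.items():
    # prints: http://example.org/dhfq/innocent ['adjective', 'noun']
    print(entry, sorted(t.rsplit("#", 1)[1] for t in tags))
```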

The lexicog module introduced a possible solution, namely that multiple
lexical entries can be grouped together into a single lexicog:Entry. This
still has some downsides (e.g., sense definitions applicable to multiple
parts of speech must be duplicated, because a sense must not belong to more
than one lexical entry), but it works pretty well as long as the original
entry is actually structured according to these parts of speech.
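A minimal plain-Python sketch of this lexicog pattern (triples as tuples, no RDF library): one lexicog:Entry groups an adjective entry and a noun entry, and a definition that applies to both has to be attached to two separate ontolex:LexicalSense resources. The IRIs and the definition text are hypothetical placeholders. The sketch also assumes that the lexicog:Entry links to each ontolex:LexicalEntry directly via lexicog:describes; the lexicog module additionally defines lexicographic components for finer-grained structure, which are omitted here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXICOG = "http://www.w3.org/ns/lemon/lexicog#"
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/dhfq/"  # hypothetical IRIs

def lexicog_grouping(shared_definition):
    """Build triples: one lexicog:Entry grouping one LexicalEntry per POS.

    Because a sense belongs to exactly one lexical entry, a definition that
    applies to both parts of speech is copied onto one sense per entry.
    """
    entry = EX + "entry/innocent-innocente"
    triples = [(entry, RDF_TYPE, LEXICOG + "Entry")]
    for pos in ("adjective", "noun"):
        lex = EX + "innocent-" + pos
        sense = lex + "-sense1"
        triples += [
            # assumption: direct lexicog:describes link, no sub-components
            (entry, LEXICOG + "describes", lex),
            (lex, RDF_TYPE, ONTOLEX + "LexicalEntry"),
            (lex, LEXINFO + "partOfSpeech", LEXINFO + pos),
            (lex, ONTOLEX + "sense", sense),
            (sense, RDF_TYPE, ONTOLEX + "LexicalSense"),
            (sense, SKOS + "definition", shared_definition),
        ]
    return triples

triples = lexicog_grouping("placeholder definition shared by both POS")
senses = [s for s, p, o in triples if p == RDF_TYPE and o == ONTOLEX + "LexicalSense"]
copies = [o for s, p, o in triples if p == SKOS + "definition"]
# prints: 2 senses, 2 copies of the same definition
print(len(senses), "senses,", len(copies), "copies of the same definition")
```

The duplication is visible in the output: the same definition string ends up on two distinct sense resources.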

In a course with students, we are currently exploring the applicability of
OntoLex to a number of reference dictionaries of different Romance
languages, and I would like to share one example that I consider critical
because it does not neatly partition its sub-entries by part of speech, but
conflates and switches between them several times. The entry
"innocent-innocente" in the *Dictionnaire historique du français québécois*
[1][2] describes both an adjective and a noun: some portions apply to both
POSes, others to only one of them. While a human may be trained to
disentangle them, I see no way to automate this.

I think OntoLex, if understood as the reference vocabulary for
machine-readable lexical data in RDF, should be capable of representing
such data without requiring human re-interpretation. Given the vocabularies
we currently have, what would you prefer? My current approach would be to
create a lexicog:Entry and link it with forms and senses, but *not to
create an ontolex:LexicalEntry* at all. This is in line with the open world
assumption, and if we can agree on this practice, we should probably add
it as a clarification note to the lexicog:Entry definition. Note that this
entails that a lexicog:Entry can also organize ontolex:Forms (this is not
currently prohibited in lexicog, but it is not mentioned as a possibility
either).
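The proposed approach can be sketched in plain Python as well (triples as tuples, no RDF library): a lexicog:Entry that links directly to ontolex:Form and ontolex:LexicalSense resources, with no ontolex:LexicalEntry at all, so a definition valid for both the adjective and the noun is stated only once. The IRIs and definition text are hypothetical, and using lexicog:describes for the Entry-to-Form and Entry-to-Sense links is an assumption about how the proposal might be realized, not established lexicog practice.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXICOG = "http://www.w3.org/ns/lemon/lexicog#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/dhfq/"  # hypothetical IRIs

entry = EX + "entry/innocent-innocente"
triples = [(entry, RDF_TYPE, LEXICOG + "Entry")]

# Forms are organized by the lexicog:Entry directly (assumed link: lexicog:describes).
for written in ("innocent", "innocente"):
    form = EX + "form/" + written
    triples += [
        (entry, LEXICOG + "describes", form),
        (form, RDF_TYPE, ONTOLEX + "Form"),
        (form, ONTOLEX + "writtenRep", written),
    ]

# A definition valid for both the adjective and the noun is stated only once.
sense = EX + "sense/innocent-1"
triples += [
    (entry, LEXICOG + "describes", sense),
    (sense, RDF_TYPE, ONTOLEX + "LexicalSense"),
    (sense, SKOS + "definition", "placeholder definition shared by both POS"),
]

types = {o for s, p, o in triples if p == RDF_TYPE}
# prints: ['Entry', 'Form', 'LexicalSense']  (no LexicalEntry anywhere)
print(sorted(t.rsplit("#", 1)[1] for t in types))
```

Under the open world assumption, the absence of an ontolex:LexicalEntry here makes no claim that none exists; it simply avoids asserting a single part of speech for material that mixes two.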

(I'm personally more in favour of lifting the one-POS-per-entry constraint,
because it leads to more efficient modelling, but that hasn't found much
support so far, whereas using lexicog:Entry instances instead of
ontolex:LexicalEntry instances wouldn't contradict any existing vocabulary.)

(NB: TEI provides something like a solution here with the <entryFree>
element [3], but IMHO this would not be a good idea in the context of
OntoLex, because its definition, "a single unstructured entry", basically
states that we leave the realm of well-defined semantics. The semantics
are actually very clear here; they are just not expressible with the
OntoLex core vocabulary.)

Any ideas?

All the best,
Christian

[1] https://www.dhfq.org/article/innocent-innocente (French)
[2] https://www-dhfq-org.translate.goog/article/innocent-innocente?_x_tr_sl=auto&_x_tr_tl=de&_x_tr_hl=de&_x_tr_pto=wapp (machine translation via Google Translate)
[3] https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-entryFree.html (TEI P5, <entryFree>)

Received on Monday, 3 July 2023 14:47:43 UTC