- From: Christian Chiarcos <christian.chiarcos@gmail.com>
- Date: Mon, 3 Jul 2023 16:47:26 +0200
- To: public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAC1YGdijHjzNmi26cSxiMGzY1tAydzTcM35J5j=RAtc6wMk9pA@mail.gmail.com>
Dear all,

TL;DR: How can the multi-POS entry https://www.dhfq.org/article/innocent-innocente be modelled with OntoLex?

Long: OntoLex postulates a constraint of one part of speech per lexical entry. For ontology lexicalization this makes a lot of sense, but in the past it has also caused some controversy, because it partially clashes with the structure of real-world dictionaries (i.e., dictionary entries with more than one part of speech).

The lexicog module introduced a possible solution: multiple lexical entries can be grouped together into a single lexicog:Entry. This still has some downsides (e.g., sense definitions applicable to multiple parts of speech must be duplicated, because a sense cannot belong to more than one lexical entry), but it works pretty well, as long as the original entry is actually structured according to these parts of speech.

In a course with students, we are currently exploring the applicability of OntoLex to a number of reference dictionaries of different Romance languages, and I would like to share one example that I consider critical: it does not partition its sub-entries neatly by part of speech, but conflates them and switches between them several times.

The entry "innocent-innocente" in the *Dictionnaire historique du français québécois* [1][2] describes both an adjective and a noun, with some portions applying to both POSes and others applying only to one POS or the other. While a human may be trained to disentangle them, I see no way this could be automated. I think OntoLex, if understood as the reference vocabulary for machine-readable lexical data in RDF, should be capable of representing such data without requiring human re-interpretation.

Given the vocabularies we currently have, what would be your preferences? My current approach would be to create a lexicog:Entry, to link it with forms and senses, but to *not* create an ontolex:LexicalEntry at all.
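A minimal sketch of what such an entry might look like, written in Python with only the standard library and emitting N-Triples. The base IRI http://example.org/dhfq/, the resource names, the two hypothetical sense IRIs and the use of lexicog:describes as the property linking the entry to its forms and senses are assumptions for illustration; the message itself does not fix a linking property.

```python
# Sketch of the proposal: a lexicog:Entry that directly describes ontolex:Form
# and ontolex:LexicalSense resources, with no ontolex:LexicalEntry at all.
# ASSUMPTIONS: the BASE namespace, all resource names, and lexicog:describes
# as the linking property are illustrative choices, not an agreed practice.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXICOG = "http://www.w3.org/ns/lemon/lexicog#"
BASE = "http://example.org/dhfq/"  # hypothetical namespace for the data


def iri(value: str) -> str:
    """Render an IRI as an N-Triples term."""
    return f"<{value}>"


def lit(text: str, lang: str) -> str:
    """Render a language-tagged literal (these strings need no escaping)."""
    return f'"{text}"@{lang}'


def build_entry() -> list[tuple[str, str, str]]:
    entry = BASE + "innocent-innocente"
    triples = [(iri(entry), iri(RDF_TYPE), iri(LEXICOG + "Entry"))]

    # Written forms taken from the headword "innocent-innocente".
    for written in ("innocent", "innocente"):
        form = BASE + "form_" + written
        triples += [
            (iri(form), iri(RDF_TYPE), iri(ONTOLEX + "Form")),
            (iri(form), iri(ONTOLEX + "writtenRep"), lit(written, "fr")),
            (iri(entry), iri(LEXICOG + "describes"), iri(form)),
        ]

    # Hypothetical sense IRIs; no definitions and no part of speech asserted.
    for n in (1, 2):
        sense = f"{BASE}innocent-innocente_sense{n}"
        triples += [
            (iri(sense), iri(RDF_TYPE), iri(ONTOLEX + "LexicalSense")),
            (iri(entry), iri(LEXICOG + "describes"), iri(sense)),
        ]
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(build_entry()))
```

This produces 11 triples. No resource is typed as ontolex:LexicalEntry, so the one-POS-per-entry constraint is never triggered; under the open world assumption, part-of-speech information could still be attached later wherever it is unambiguous.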
Such an entry without any ontolex:LexicalEntry is in line with the open world assumption, and if this is a practice we can agree upon, we should probably add it as a clarification note to the lexicog definition. Note that this entails that a lexicog:Entry can also organize ontolex:Form instances (this is not prohibited in lexicog right now, but it is not mentioned as a possibility either).

(Personally, I am more in favour of lifting the one-POS-per-entry constraint, because it leads to more efficient modelling, but that hasn't found much support so far, and using lexicog:Entry instances instead of ontolex:LexicalEntry instances wouldn't contradict any existing vocabulary.)

(NB: TEI provides something like a solution here, the element <entryFree> [3], but IMHO this would not be a good idea in the context of OntoLex, because its definition, "a single unstructured entry", basically states that we leave the realm of well-defined semantics ... but the semantics are actually very clear here, just not expressible with the OntoLex core vocabulary.)

Any ideas?

All the best,
Christian

[1] https://www.dhfq.org/article/innocent-innocente (French)
[2] Google Translate version: https://www-dhfq-org.translate.goog/article/innocent-innocente?_x_tr_sl=auto&_x_tr_tl=de&_x_tr_hl=de&_x_tr_pto=wapp
[3] https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-entryFree.html
Received on Monday, 3 July 2023 14:47:43 UTC