- From: Christian Chiarcos <christian.chiarcos@gmail.com>
- Date: Tue, 7 Dec 2021 16:32:48 +0100
- To: public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAC1YGdibR-ufs=o3sUsQvMLFx+JE7W0YKvCxrPdRMga5-Hf+tg@mail.gmail.com>
Dear all,
for different use cases, I have come across the need to provide one lexical entry for multiple languages. In one group of cases (esp. etymological dictionaries), this can be circumvented by using lexicog:Entry instead and pointing from there to language-specific lexical entries. (This is very inelegant, unnecessarily verbose and clearly a departure from, or obfuscation of, the original structure of the lexical resource, but technically it is a possibility.)
However, in another case (dictionaries/glossaries for cuneiform languages), we have the problem that we cannot always tell what language a text (and thus, a word) is in. There are several reasons for this: the multilingual situation of Sumerian and Akkadian during the 3rd millennium BC; the use of ideographic signs; the laziness of scribes, who often wrote just the stem of a word without its morphemes; and the habit of Akkadian and Hittite scribes of writing Sumerian (or Akkadian) words instead of words in their native tongue, because these were more established in the writing tradition. Although there are phonological or morphological complements that can reveal the language, these are not used systematically, so that we have uncertainties about the language of individual words or even entire texts.
If these texts form the basis for a glossary or dictionary, these uncertainties percolate to the glossary, especially if it is corpus-based. The electronic Pennsylvania Sumerian Dictionary (ePSD) thus does not distinguish Sumerian and Akkadian forms; it groups everything under the same head word and provides Sumerian and Akkadian readings of the same sign. (The selection of texts is such that a Sumerian reading is more likely, but this is not necessarily the case.) In some cases, the dictionary even marks doubts as to whether a word is Sumerian in the first place (http://oracc.museum.upenn.edu/epsd2/cbd/sux/o0023151.html).
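To make this kind of data concrete, here is a minimal Python sketch of one head word with readings in two languages. Everything named in it (`Form`, `LexicalEntry`, `candidate_languages`, `to_turtle`, the IRI `:lugal`) is an illustrative assumption and not part of OntoLex or of any ePSD or CDLI tooling. The sample readings (Sumerian lugal and Akkadian šarrum, both 'king') are chosen for illustration and are not copied from the dictionary. The tags `sux` and `akk` are the ISO 639-3 codes for Sumerian and Akkadian.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Form:
    """One reading of a sign; the language tag lives here, not on the entry."""
    written_rep: str
    lang: str  # language tag, e.g. "sux" (Sumerian) or "akk" (Akkadian)


@dataclass
class LexicalEntry:
    """A language-agnostic entry: there is deliberately no language field."""
    iri: str  # hypothetical prefixed name, e.g. ":lugal"
    forms: list[Form] = field(default_factory=list)

    def candidate_languages(self) -> set[str]:
        # The possible languages of the entry are derived from its forms only.
        return {f.lang for f in self.forms}

    def to_turtle(self) -> str:
        # Emits a Turtle fragment; the "ontolex:" prefix and the default
        # prefix are assumed to be declared elsewhere. Quotes inside
        # written_rep are not escaped in this sketch.
        head = f"{self.iri} a ontolex:LexicalEntry ;"
        forms = [
            f'    ontolex:lexicalForm [ ontolex:writtenRep "{f.written_rep}"@{f.lang} ]'
            for f in self.forms
        ]
        return head + "\n" + " ;\n".join(forms) + " ."


# Illustrative data only: one head word, two readings in different languages.
entry = LexicalEntry(":lugal", [Form("lugal", "sux"), Form("šarrum", "akk")])
print(entry.to_turtle())
print(sorted(entry.candidate_languages()))
```

The entry itself carries no language in this sketch; whatever can be said about language has to be read off its forms, and for this entry that yields more than one candidate.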
Such data does not allow us to create distinct lexical entries for the two (or, in the case of Hittite texts, three) languages that would simply go under the same lexicog:Entry, because we cannot decide which information (other than the possible Sumerian and Akkadian interpretations of the same cuneiform writtenRep) belongs to which lexical entry. For this reason, we are currently considering language-agnostic lexical entries for a future CDLI glossary (https://cdli.ucla.edu/), where language information is provided only at the form (or even within the writtenRep), but not at the lexical entry.
Note that there is no constraint in the OntoLex core model that requires a single language per lexical entry. What OntoLex says about language is not in the core model, but in Lime: "note that all entries in the same lexicon should be in the same language and that the language of the lexicon and entry should be consistent with the language tags used on all forms". This is a comment (in parentheses, in accompanying text and, if assumed to be relevant for the definition of ontolex:LexicalEntry, in the wrong place), formulated as a recommendation and not part of any definition. If we consider this statement to be binding nevertheless, the CDLI solution would be to create a dictionary with senses and lexicog:Entry instances, but without ontolex:LexicalEntry instances. I would prefer not to do that. (I would still prefer to avoid multilingual lexical entries in cases in which language-specific information is provided, and thus to keep the recommendation in place as is, but this is not the case here.)
Best,
Christian
Received on Wednesday, 8 December 2021 08:00:38 UTC