- From: Christian Chiarcos <christian.chiarcos@gmail.com>
- Date: Wed, 8 Dec 2021 14:53:04 +0100
- To: public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAC1YGdgBP1tXGt-r8GETbCk5KwUqibW0329xfeogq5c8u7Ptwg@mail.gmail.com>
Dear all,

just for clarification, the following is what I would like to do:

  :sze_le a ontolex:LexicalEntry;
    ontolex:canonicalForm [
      ontolex:writtenRep "𒊺";
        # or: ontolex:writtenRep "𒊺"@sux-Xsux, ontolex:writtenRep "𒊺"@akk-Xsux
      ontolex:writtenRep "sze";              # transliteration
      ontolex:writtenRep "sze"@sux-Latn;     # transcription
      ontolex:writtenRep "uţţatu"@akk-Latn   # transcription
    ];
    ontolex:sense [ rdfs:comment "unit of weight, approx 0.04 g" ].

The alternative with lexicog:Entry (and without duplicating
LexicalEntries) would be

  :sze_le a lexicog:Entry;
    lexicog:describes [
      a ontolex:Form;
      ontolex:writtenRep "𒊺";
      ontolex:writtenRep "sze";         # transliteration
      ontolex:writtenRep "sze"@sux;     # transcription
      ontolex:writtenRep "uţţatu"@akk   # transcription
      # ... IMHO different language tags should be unproblematic for forms
    ];
    lexicog:describes [
      a ontolex:LexicalSense;
      rdfs:comment "unit of weight, approx 0.04 g"
    ].

The latter way of modelling should be in line with the documentation,
but it makes large parts of OntoLex-Lemon redundant and others (e.g.,
canonicalForm) inapplicable; I would prefer to avoid that.

Best,
Christian

On Tue, 7 Dec 2021 at 16:32, Christian Chiarcos
<christian.chiarcos@gmail.com> wrote:

> Dear all,
>
> for different use cases, I came across the need to provide one lexical
> entry for multiple languages.
>
> In one group of cases (esp., etymological dictionaries), this can be
> circumvented by using lexicog:Entry instead and then pointing to
> language-specific lexical entries. (This is very inelegant,
> unnecessarily verbose and clearly a departure from/obfuscation of the
> original structure of the lexical resource, but technically, it is a
> possibility.)
>
> However, in another case (dictionaries/glossaries for cuneiform
> languages), we have the problem that we cannot always tell what
> language a text (and thus, a word) is in. This is because of the
> multilingual situation of Sumerian and Akkadian during the 3rd
> millennium BC, because of the use of ideographic signs, because
> scribes often did not bother to write morphemes, but just the stem of
> a word, and because of the habit of Akkadian and Hittite scribes to
> write Sumerian (or Akkadian) words instead of words in their native
> tongue, as these were more established in the writing tradition.
> Although there are phonological or morphological complements that can
> reveal the language, these are not used systematically, so that we
> have uncertainties about the language of individual words or even
> entire texts. If these texts form the basis for a glossary or
> dictionary, these uncertainties percolate to the glossary, especially
> if it is corpus-based. The Electronic Pennsylvania Sumerian Dictionary
> thus does not distinguish Sumerian and Akkadian forms; it groups
> everything under the same head word and provides Sumerian and Akkadian
> readings of the same sign. (The selection of texts is such that a
> Sumerian reading is more likely, but this is not necessarily the
> case.) In some cases in this dictionary, it is even marked that there
> are doubts that a word is Sumerian in the first place
> (http://oracc.museum.upenn.edu/epsd2/cbd/sux/o0023151.html).
>
> Such data do not allow us to create distinct lexical entries for both
> (or, in case of Hittite texts, three) languages that would just go
> under the same lexicog:Entry, because we cannot decide which
> information (other than the possible Sumerian and Akkadian
> interpretations of the same Cuneiform writtenRep) belongs to which
> lexical entry.
>
> For this reason, we are currently considering having language-agnostic
> lexical entries for a future CDLI glossary (https://cdli.ucla.edu/),
> where language information is provided only at the form (or even
> within the writtenRep), but not at the lexical entry. Note that there
> is no constraint in the OntoLex core model that requires a single
> language per lexical entry.
>
> What OntoLex says about language is not in the core model, but in
> Lime: "note that all entries in the same lexicon should be in the same
> language and that the language of the lexicon and entry should be
> consistent with the language tags used on all forms". This is a
> comment (in parentheses, in accompanying text, and, if assumed to be
> relevant for the definition of ontolex:LexicalEntry, in the wrong
> place), formulated as a recommendation and not part of any definition.
>
> If we consider this statement to be nevertheless binding, the CDLI
> solution would be to create a dictionary with senses and lexicog:Entry
> instances, but without ontolex:LexicalEntry instances. I would prefer
> not to. (I would still prefer to avoid multilingual lexical entries in
> cases in which language-specific information is provided, and thus to
> keep the recommendation in place, as is, but this is not the case
> here.)
>
> Best,
> Christian
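
A minimal sketch, in Python, of how a consumer could still recover
per-language readings from a language-agnostic entry when language is
recorded only in the tags of the written representations. The names
WrittenRep, split_tag and group_by_language are hypothetical and not
part of OntoLex or any library, and the tag handling is a deliberate
simplification of BCP 47 (language subtag plus an optional four-letter
script subtag). The values are the ones from the first example above.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class WrittenRep(NamedTuple):
    """One ontolex:writtenRep literal: its text and optional language tag."""
    text: str
    tag: Optional[str] = None


def split_tag(tag):
    """Split a tag such as 'akk-Latn' into (language, script).

    Untagged literals yield (None, None). Simplification: only a
    four-letter second subtag is treated as a script.
    """
    if tag is None:
        return (None, None)
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    if len(parts) > 1 and len(parts[1]) == 4:
        script = parts[1].title()
    return (language, script)


def group_by_language(reps):
    """Group the texts of written representations by language subtag."""
    grouped = defaultdict(list)
    for rep in reps:
        language, _script = split_tag(rep.tag)
        grouped[language].append(rep.text)
    return dict(grouped)


# Written representations of the canonical form of :sze_le.
reps = [
    WrittenRep("𒊺"),                   # cuneiform sign, no language claimed
    WrittenRep("sze"),                  # transliteration, no language claimed
    WrittenRep("sze", "sux-Latn"),      # Sumerian transcription
    WrittenRep("uţţatu", "akk-Latn"),   # Akkadian transcription
]

by_language = group_by_language(reps)
```

Untagged representations end up under the key None, so the sign and its
transliteration stay language-neutral while the transcriptions remain
retrievable per language.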
Received on Wednesday, 8 December 2021 13:53:29 UTC