Re: multilingual lexical entries? from Jorge Gracia del Río on 2022-01-05 (public-ontolex@w3.org from January 2022)

From: Jorge Gracia del Río <jogracia@unizar.es>
Date: Wed, 5 Jan 2022 10:38:39 +0100
To: Christian Chiarcos <christian.chiarcos@gmail.com>
Cc: public-ontolex <public-ontolex@w3.org>
Message-ID: <CAMe8T+s8cFH2Gpi8CeStOofTKY4nwmhTpMsPr+g3Yj7Coqsn8w@mail.gmail.com>
Dear Christian,

What about this alternative approach? That is, creating one
"language-agnostic" lexicog:Entry per record in the dictionary, and
then instantiating lexical entries to account for the language-specific
information:

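@prefix :        <http://example.org/cdli#> .  # illustrative namespace
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexicog: <http://www.w3.org/ns/lemon/lexicog#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

:sze_concept a ontolex:LexicalConcept;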
     skos:definition "unit of weight, approx 0.04 g" .

:sze_sux a ontolex:LexicalEntry;
    ontolex:canonicalForm [
        ontolex:writtenRep "𒊺"@sux-Xsux;
        ontolex:writtenRep "sze"@sux-Latn
    ] .

:sze_akk a ontolex:LexicalEntry;
    ontolex:canonicalForm [
        ontolex:writtenRep "𒊺"@akk-Xsux;
        ontolex:writtenRep "uţţatu"@akk-Latn
    ] .

:sze_concept ontolex:isEvokedBy :sze_sux, :sze_akk .

:sze_entry a lexicog:Entry ;
     lexicog:describes :sze_sux, :sze_akk .
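
If it helps, the two monolingual entries could additionally be placed in one
lime:Lexicon per language. This would keep them consistent with the Lime
recommendation you quote below, while the lexicog:Entry above preserves the
grouping of the original dictionary record. This is only a sketch: the
resource names :sux_lexicon, :akk_lexicon and :cdli_glossary, as well as the
example.org namespace, are hypothetical:

@prefix :        <http://example.org/cdli#> .  # hypothetical namespace
@prefix lime:    <http://www.w3.org/ns/lemon/lime#> .
@prefix lexicog: <http://www.w3.org/ns/lemon/lexicog#> .

# One monolingual lexicon per language (hypothetical names)
:sux_lexicon a lime:Lexicon ;
    lime:language "sux" ;
    lime:entry :sze_sux .

:akk_lexicon a lime:Lexicon ;
    lime:language "akk" ;
    lime:entry :sze_akk .

# The lexicographic resource keeps one entry per dictionary record
:cdli_glossary a lexicog:LexicographicResource ;
    lexicog:entry :sze_entry .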


Best regards,

Jorge

On Wed, 8 Dec 2021 at 14:54, Christian Chiarcos (<christian.chiarcos@gmail.com>) wrote:

> Dear all,
>
> just for clarification, the following is what I would like to do:
>
> :sze_le a ontolex:LexicalEntry;
>     ontolex:canonicalForm [
>         ontolex:writtenRep "𒊺";  # or: ontolex:writtenRep "𒊺"@sux-Xsux, ontolex:writtenRep "𒊺"@akk-Xsux
>         ontolex:writtenRep "sze";               # transliteration
>         ontolex:writtenRep "sze"@sux-Latn;      # transcription
>         ontolex:writtenRep "uţţatu"@akk-Latn    # transcription
>     ];
>     ontolex:sense [ rdfs:comment "unit of weight, approx 0.04 g" ] .
>
> The alternative with lexicog:Entry (and without duplicating
> LexicalEntries) would be:
>
> :sze_le a lexicog:Entry;
>     lexicog:describes [ a ontolex:Form;
>         ontolex:writtenRep "𒊺";
>         ontolex:writtenRep "sze";        # transliteration
>         ontolex:writtenRep "sze"@sux;    # transcription
>         ontolex:writtenRep "uţţatu"@akk  # transcription ... IMHO different
>                                          # language tags should be unproblematic for forms
>     ];
>     lexicog:describes [ a ontolex:LexicalSense; rdfs:comment "unit of weight, approx 0.04 g" ] .
>
> The latter way of modelling should be in line with the documentation, but
> it makes large parts of OntoLex-Lemon redundant and others (e.g.,
> canonicalForm) inapplicable. I would prefer to avoid that.
>
> Best,
> Christian
>
> On Tue, 7 Dec 2021 at 16:32, Christian Chiarcos <christian.chiarcos@gmail.com> wrote:
>
>> Dear all,
>>
>> in several use cases, I have come across the need to provide a single
>> lexical entry for multiple languages.
>>
>> In one group of cases (esp. etymological dictionaries), this can be
>> circumvented by using lexicog:Entry instead and pointing from it to
>> language-specific lexical entries. (This is very inelegant,
>> unnecessarily verbose and clearly a departure from, or obfuscation of, the
>> original structure of the lexical resource, but technically it is a
>> possibility.)
>>
>> However, in another case (dictionaries/glossaries for cuneiform
>> languages), we have the problem that we cannot always tell what language a
>> text (and thus, a word) is in. This is due to the multilingual situation
>> of Sumerian and Akkadian during the 3rd millennium BC, to the use of
>> ideographic signs, to the laziness of scribes, who often did not write
>> morphemes but just the stem of a word, and to the habit of Akkadian and
>> Hittite scribes of writing Sumerian (or Akkadian) words instead of their
>> native tongue, because these were more established in the writing
>> tradition. Although there are phonological or morphological complements
>> that can reveal the language, these are not used systematically, so that
>> we have uncertainties about the language of individual words or even
>> entire texts. If these texts form the basis for a glossary or dictionary,
>> these uncertainties percolate into the glossary, especially if it is
>> corpus-based. The Electronic Penn Sumerian Dictionary thus does not
>> distinguish Sumerian and Akkadian forms: it groups everything under the
>> same headword and provides both Sumerian and Akkadian readings of the
>> same sign. (The selection of texts is such that a Sumerian reading is more
>> likely, but it is not always certain.) In some cases, this dictionary
>> even marks doubts that a word is Sumerian in the first place
>> (http://oracc.museum.upenn.edu/epsd2/cbd/sux/o0023151.html).
>>
>> Such data does not allow us to create distinct lexical entries for both
>> (or, in the case of Hittite texts, all three) languages that would then be
>> grouped under the same lexicog:Entry, because we cannot decide which
>> information (other than the possible Sumerian and Akkadian interpretations
>> of the same cuneiform writtenRep) belongs to which lexical entry.
>>
>> For this reason, we are currently considering language-agnostic lexical
>> entries for a future CDLI glossary (https://cdli.ucla.edu/), where
>> language information is provided only at the form (or even only within
>> the writtenRep), but not at the lexical entry. Note that there is no
>> constraint in the OntoLex core model that requires a single language per
>> lexical entry.
>>
>> What OntoLex says about language is not in the core model, but in Lime:
>> "note that all entries in the same lexicon should be in the same language
>> and that the language of the lexicon and entry should be consistent with
>> the language tags used on all forms". This is a comment (in parentheses,
>> in accompanying text, and, if assumed to be relevant for the definition of
>> ontolex:LexicalEntry, in the wrong place), formulated as a recommendation
>> and not part of any definition.
>>
>> If we nevertheless consider this statement binding, the CDLI solution
>> would be to create a dictionary with senses and lexicog:Entry instances,
>> but without any ontolex:LexicalEntry instances. I would prefer not to do
>> that. (I would still prefer to avoid multilingual lexical entries in cases
>> where language-specific information is provided, and thus to keep the
>> recommendation in place as is, but that is not the case here.)
>>
>> Best,
>> Christian
>>
>
Received on Wednesday, 5 January 2022 09:40:05 UTC