- From: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
- Date: Tue, 06 Nov 2018 19:16:16 +0100
- To: "John P. McCrae" <john.mccrae@insight-centre.org>
- Cc: "public-ontolex@w3.org" <public-ontolex@w3.org>, "Hinkelmanns Peter" <peter.hinkelmanns@sbg.ac.at>
- Message-ID: <op.zr2k1evv89jat0@kitaba.rz.uni-frankfurt.de>
On .11.2018, at 16:36, John P. McCrae <john.mccrae@insight-centre.org> wrote:

> Yes, to summarize other authors, it is not expected that a sense should
> have multiple references, unless they are semantically equivalent (e.g.,
> skos:exactMatch). This seems relatively straightforward if you think
> about... if you make a distinction in your SKOS thesaurus, why wouldn't
> the same distinction be necessary in the lexicon?

It is straightforward, indeed, but only if either (1) you model thesaurus and lexicon from scratch and as a single resource (which seems to be the case here), (2) you start with an existing ontology and want to build a dictionary for it, or (3) you start with a dictionary and want to build an ontology for it. In one case, it is not: (4) you want to combine an existing dictionary and an existing ontology with each other.

There is another reason: It is possible that a thesaurus provides high-level distinctions only. Think of the Dornseiff groups for German "Wasser" (water, http://corpora.uni-leipzig.de/de/res?corpusId=deu_newscrawl_2011&word=Wasser):

  7.8    transparent
  7.61   liquid
  13.21  inorganic chemistry
  16.8   drinks, non-alcoholic

And so they occur in the Wikipedia definition: "Wasser (H2O) ist eine chemische Verbindung (=> 13.21) aus den Elementen Sauerstoff (O) und Wasserstoff (H). Wasser ist als Flüssigkeit (=> 7.61) durchsichtig (=> 7.8)" (in English: "Water (H2O) is a chemical compound of the elements oxygen (O) and hydrogen (H). As a liquid, water is transparent").

Obviously, we can create a concept "transparent liquid; inorganic; suitable for drinking", but this is not provided by Dornseiff & Quasthoff (2004). If we create it ourselves, being the creators of neither the dictionary we start with nor the thesaurus, it needs to have a different ontological status than the rest of the thesaurus, because it differs in provenance. The underlying problem with this particular thesaurus is that it focuses on feature decomposition rather than on providing a concept inventory. This is not atypical for thesauri.

My feeling in the Dornseiff case is actually that we should use not ontolex:reference, but ontolex:denotes. Its definition is somewhat vague, but that vagueness also corresponds to the rather abstract nature of the thesaurus concepts. If the MHDBDB categories are rather abstract (I remember some are), this would be an alternative there, too. No cardinality restrictions apply to ontolex:denotes.

> Similarly, different parts-of-speech necessarily have different meanings,

We have counterexamples to this, often among function words: Many English prepositions are also complementizers (subordinating conjunctions), verbal particles, and sometimes adverbs. These do not necessarily differ in meaning, but only in syntax (i.e., in the element they modify). Pronouns and determiners are another typical case, hence some tagsets just lump them together; lexinfo doesn't.

In open categories, such phenomena occur as well, but they are usually treated as "zero morphology". German adjectives can be systematically used as adverbs (again, these differ only in the element they modify).

A third case is the lexicography of historical language varieties, where parts of speech may have changed during the period the dictionary covers, and we might want to formulate different interpretations. Old High German prepositions could be used as adverbs, and many prepositional adverbs were grammaticalized into verbal particles over time. There is, however, a long transition period in which a modern lexicographer can equally regard a preposition as a verbal particle or as an adverb: in the same syntactic context, two parts of speech apply (particle or adverb), or rather, they are indistinguishable, and they could be recorded as such in a historical dictionary of German.

Data sparsity is yet another source of multiple parts of speech: For example, a word may be attested, but not in an unambiguous context.
Hypothetical parts of speech are typical for low-resource languages (see https://archive.org/details/lalanguegauloise00dottuoft/page/262, "ieuru ... datif singulier ou forme verbale" [dative singular or verb form], and later "iorebe ... verbe ou datif pluriel" [verb or dative plural]). We don't want to create one LexicalEntry per *hypothetical* part-of-speech, do we?

Best,
Christian

-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931

Received on Tuesday, 6 November 2018 18:16:45 UTC