Re: MHDBDB: Connecting Senses with multiple Concepts from John P. McCrae on 2018-11-07 (public-ontolex@w3.org from November 2018)

From: John P. McCrae <john.mccrae@insight-centre.org>
Date: Wed, 7 Nov 2018 08:19:38 +0000
To: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
Cc: "public-ontolex@w3.org" <public-ontolex@w3.org>, Hinkelmanns Peter <peter.hinkelmanns@sbg.ac.at>
Message-ID: <CAHLDFnpU+g3rTTVx5xsX0UrYUUya33Ew8bVGT3OKOFr=4tVBOw@mail.gmail.com>
Hi Christian,

Yes, in fact the OntoLex core was never intended to capture the content of
an existing dictionary as is, and many concepts in OntoLex, such as
lexical sense, do not correspond to similarly named concepts in
dictionaries. Again, the goal of the lexicography module is to close this
gap, so that we can account for entries that do not follow the restrictions
of the OntoLex core, e.g., a single part-of-speech tag, a single reference
to an ontology, etc.
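A minimal sketch (Python, no RDF library) of how the two core restrictions just mentioned could be checked: one part-of-speech tag per entry and, following the quoted discussion below, one reference per sense unless the references are declared equivalent via skos:exactMatch. This is an illustration under assumptions, not an OntoLex validator: triples are plain (subject, predicate, object) tuples of prefixed names, and the `:wasser`/`:ieuru` identifiers, the `dornseiff:` concept names and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: prefixed-name strings stand in for IRIs.
TRIPLES = [
    (":wasser", "lexinfo:partOfSpeech", "lexinfo:noun"),
    (":wasser", "ontolex:sense", ":wasser_sense1"),
    (":wasser_sense1", "ontolex:reference", "dornseiff:7.61"),
    (":wasser_sense1", "ontolex:reference", "dornseiff:16.8"),
    (":ieuru", "lexinfo:partOfSpeech", "lexinfo:noun"),
    (":ieuru", "lexinfo:partOfSpeech", "lexinfo:verb"),
]


def objects_by_subject(triples, predicate):
    """Collect the objects of `predicate`, grouped by subject."""
    grouped = defaultdict(set)
    for subj, pred, obj in triples:
        if pred == predicate:
            grouped[subj].add(obj)
    return grouped


def declared_equivalent(triples, a, b):
    """True if a skos:exactMatch link between a and b is asserted in either direction."""
    return (a, "skos:exactMatch", b) in triples or (b, "skos:exactMatch", a) in triples


def core_violations(triples):
    """List (subject, problem) pairs that fall outside the two restrictions."""
    problems = []
    for entry, tags in sorted(objects_by_subject(triples, "lexinfo:partOfSpeech").items()):
        if len(tags) > 1:
            problems.append((entry, "multiple parts of speech"))
    for sense, refs in sorted(objects_by_subject(triples, "ontolex:reference").items()):
        refs = sorted(refs)
        # Several references are tolerated only if every pair is declared equivalent.
        if any(not declared_equivalent(triples, a, b)
               for i, a in enumerate(refs) for b in refs[i + 1:]):
            problems.append((sense, "non-equivalent references"))
    return problems


print(core_violations(TRIPLES))
# [(':ieuru', 'multiple parts of speech'), (':wasser_sense1', 'non-equivalent references')]
```

Entries flagged this way are the kind of dictionary content that the core alone does not accommodate and that the lexicography module is meant to cover.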

Regards,
John

On 6 November 2018 at 18:16, Christian Chiarcos <
chiarcos@informatik.uni-frankfurt.de> wrote:

> On .11.2018, 16:36, John P. McCrae <
> john.mccrae@insight-centre.org> wrote:
>
> Yes, to summarize other authors, it is not expected that a sense should
> have multiple references, unless they are semantically equivalent (e.g.,
> skos:exactMatch). This seems relatively straightforward if you think
> about it: if you make a distinction in your SKOS thesaurus, why wouldn't the
> same distinction be necessary in the lexicon?
>
>
> It is straightforward, indeed, but only if either
> (1) you model thesaurus and lexicon from scratch and as a single resource
> (which seems to be the case here),
> (2) you start with an existing ontology and want to build a dictionary for
> it, or
> (3) you start with a dictionary and want to build an ontology for it.
>
> In one case, it is not:
> (4) you want to combine an existing dictionary and an existing ontology
> with each other.
>
> There is another reason: a thesaurus may provide only high-level
> distinctions. Think of the Dornseiff groups for German "Wasser"
> (water, http://corpora.uni-leipzig.de/de/res?corpusId=deu_newscrawl_2011&word=Wasser):
>
> 7.8 transparent
> 7.61 liquid
> 13.21 inorganic chemistry
> 16.8 drinks, non-alcoholic
>
> They also occur in the Wikipedia definition:
> "Water (H2O) is a chemical compound (=> 13.21) of the elements
> oxygen (O) and hydrogen (H). As a liquid (=> 7.61), water is
> transparent (=> 7.8)".
>
> Obviously, we can create a concept "transparent liquid; inorganic;
> suitable for drinking", but this is not provided by Dornseiff & Quasthoff
> (2004), and if we (being the creators of neither the dictionary we start
> with nor the thesaurus) create it, it needs to have a different
> ontological status than the rest of the thesaurus, because it differs in
> provenance.
>
> The underlying problem with this particular thesaurus is that it focuses
> on feature decomposition rather than on providing a concept inventory. This
> is not atypical for thesauri. In the Dornseiff case, my feeling is actually
> that we should use ontolex:denotes rather than ontolex:reference. Its
> definition is somewhat vague, but that matches the rather abstract
> nature of the thesaurus concepts. If the MHDBDB categories are rather
> abstract (I remember some are), this would be an alternative there,
> too. No cardinality restrictions apply to ontolex:denotes.
>
> Similarly, different parts-of-speech necessarily have different meanings,
>
>
> We have counterexamples to this, often among function words: many English
> prepositions are also complementizers (subordinating conjunctions), verbal
> particles, and sometimes adverbs; these need not differ in meaning, only
> in syntax (i.e., in the element they modify). Pronouns and determiners are
> another typical case, which is why some tagsets simply lump them together
> (lexinfo doesn't).
>
> Such phenomena occur in open categories as well, but there they are usually
> treated as "zero morphology": German adjectives can systematically be used
> as adverbs (again, differing only in the element they modify).
>
> A third case arises in the lexicography of historical language varieties,
> where parts of speech may have changed during the period the dictionary
> covers, and we might want to record different interpretations. Old High
> German prepositions could be used as adverbs, and many prepositional
> adverbs were grammaticalized into verbal particles over time, but there is
> a long transition period in which a modern lexicographer can equally regard
> such a preposition as a verbal particle or as an adverb: in the same
> syntactic context, two parts of speech apply (particle or adverb), or
> rather, they are indistinguishable, and a historical dictionary of German
> could record them as such.
>
> Data sparsity is yet another source of multiple parts of speech: for
> example, a word may be attested, but never in an unambiguous context.
> Hypothetical parts of speech are typical for low-resource languages (see
> https://archive.org/details/lalanguegauloise00dottuoft/page/262, where
> "ieuru" is glossed as "datif singulier ou forme verbale" [dative singular
> or verbal form], and later "iorebe" as "verbe ou datif pluriel" [verb or
> dative plural]). We don't want to create one LexicalEntry per
> *hypothetical* part of speech, do we?
>
> Best,
> Christian
> --
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
>
> office: Robert-Mayer-Str. 10, #401b
> mail: chiarcos@informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28931
>
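The ontolex:denotes alternative suggested above for abstract thesaurus groups can be sketched as a small Python function that serializes one lexical entry pointing at several Dornseiff groups in Turtle. This is a sketch under assumptions: the example.org namespaces for the lexicon and for the Dornseiff groups, the use of group numbers as IRI local names, and the function name are hypothetical; only the ontolex: and lexinfo: terms come from the published vocabularies.

```python
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"
# Hypothetical namespaces for the example lexicon and the thesaurus groups.
LEXICON = "http://example.org/lexicon/"
DORNSEIFF = "http://example.org/dornseiff/"


def entry_with_denotations(local_name, lemma, lang, pos, groups):
    """Return Turtle for one ontolex:LexicalEntry that denotes every group in `groups`.

    Per the message above, ontolex:denotes carries no cardinality restriction,
    so one entry may point to several abstract groups without a composite concept.
    """
    if not groups:
        raise ValueError("at least one thesaurus group is required")
    escaped = lemma.replace("\\", "\\\\").replace('"', '\\"')
    targets = ", ".join(f"dornseiff:{g}" for g in groups)
    lines = [
        f"@prefix ontolex: <{ONTOLEX}> .",
        f"@prefix lexinfo: <{LEXINFO}> .",
        f"@prefix dornseiff: <{DORNSEIFF}> .",
        f"@prefix : <{LEXICON}> .",
        "",
        f":{local_name} a ontolex:LexicalEntry ;",
        f"    lexinfo:partOfSpeech lexinfo:{pos} ;",
        f'    ontolex:canonicalForm [ ontolex:writtenRep "{escaped}"@{lang} ] ;',
        f"    ontolex:denotes {targets} .",
    ]
    return "\n".join(lines)


print(entry_with_denotations("wasser", "Wasser", "de", "noun",
                             ["7.8", "7.61", "13.21", "16.8"]))
```

A single entry-level link keeps the thesaurus groups as they are, instead of creating a new concept whose provenance differs from the rest of the thesaurus.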
Received on Wednesday, 7 November 2018 08:20:03 UTC