Re: MHDBDB: Connecting Senses with multiple Concepts from Sander Stolk on 2018-11-07 (public-ontolex@w3.org from November 2018)

From: Sander Stolk <ssstolk@gmail.com>
Date: Wed, 7 Nov 2018 10:20:49 +0100
To: John McCrae <john.mccrae@insight-centre.org>
Cc: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>, public-ontolex <public-ontolex@w3.org>, peter.hinkelmanns@sbg.ac.at
Message-ID: <CAJurLCyvxrSBRMhntmdbVZmyqqFdA5k=b=cSiCE0jHS+2rKA4g@mail.gmail.com>
Yes indeed, the lexicography module may help to some extent. Onomasiological
orderings (as also found in thesauri) have not been the focus of this
module, however. In fact, that is why lemon-tree was developed alongside the
lexicography module.

Lemon-tree offers a small set of additional terminology for onomasiological
orderings. Next to ontolex:isLexicalizedSenseOf, which requires the related
concept to completely define the sense, lemon-tree offers the relation
:isSenseInConcept. This latter relation allows senses to be related to
multiple concepts, categorizing them under a concept even if they do not
fully lexicalize it. This would help in Peter's case. (In fact, it would
require very few changes to his current work.)
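
As a rough illustration of the difference between the two relations, here is
a minimal Python sketch that stores triples as plain tuples. The tree: prefix,
all ex: identifiers, and the helper concepts_of are assumptions invented for
this example; only the two relation names come from the description above.
The lemon-tree documentation remains the authority on the actual namespace
and usage.

```python
# Triples as (subject, predicate, object) tuples; no RDF library needed.
# All ex: identifiers are hypothetical, invented for illustration only.
IS_LEXICALIZED_SENSE_OF = "ontolex:isLexicalizedSenseOf"
IS_SENSE_IN_CONCEPT = "tree:isSenseInConcept"  # prefix is an assumption

triples = [
    # One sense categorized under two concepts, without claiming that
    # either concept completely defines the sense.
    ("ex:sense_1", IS_SENSE_IN_CONCEPT, "ex:concept_A"),
    ("ex:sense_1", IS_SENSE_IN_CONCEPT, "ex:concept_B"),
    # Another sense that a single concept does completely define.
    ("ex:sense_2", IS_LEXICALIZED_SENSE_OF, "ex:concept_C"),
]


def concepts_of(sense, predicate, data):
    """Return the concepts linked to `sense` via `predicate`, sorted."""
    return sorted(o for s, p, o in data if s == sense and p == predicate)


categorized = concepts_of("ex:sense_1", IS_SENSE_IN_CONCEPT, triples)
print(categorized)
```

In this sketch, ex:sense_1 sits under two concepts at once, which is the
situation the stricter relation is not meant to express.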

The lemon-tree documentation should provide a good overview of this
approach and of why it is helpful in the current case. It also includes
examples from existing thesauri that show it in action [incl.
Shakespeare Thesaurus, Scots Thesaurus, Historical Thesaurus of English].
See http://w3id.org/lemon-tree .

If there are any questions about the subject, I'm sure we can be of further
assistance.

Kind regards,
Sander

P.S.
An additional benefit is that there is already some tooling in the works
that supports viewing data according to this kind of modelling. I've been
working on the software Evoke, for which a demonstration (incl. video) can
be found at http://evoke.ullet.net


On Wed, 7 Nov 2018 at 09:21, John P. McCrae <john.mccrae@insight-centre.org>
wrote:

> Hi Christian,
>
> Yes, the OntoLex core was never intended to capture the content of an
> existing dictionary as is, and many concepts in OntoLex, such as lexical
> sense, do not correspond to similarly named concepts in dictionaries.
> Again, the goal of the lexicography module is to close this gap so that
> we can account for entries that do not follow the restrictions of the
> OntoLex core, e.g., a single part-of-speech tag, a single reference to an
> ontology, etc.
>
> Regards,
> John
>
> On 6 November 2018 at 18:16, Christian Chiarcos <
> chiarcos@informatik.uni-frankfurt.de> wrote:
>
>> On .11.2018, at 16:36, John P. McCrae <
>> john.mccrae@insight-centre.org> wrote:
>>
>> Yes, to summarize other authors, it is not expected that a sense should
>> have multiple references, unless they are semantically equivalent (e.g.,
>> skos:exactMatch). This seems relatively straightforward if you think
>> about... if you make a distinction in your SKOS thesaurus, why wouldn't the
>> same distinction be necessary in the lexicon?
>>
>>
>> It is straightforward, indeed, but only if either
>> (1) you model thesaurus and lexicon from scratch and as a single resource
>> (which seems to be the case here),
>> (2) you start with an existing ontology and want to build a dictionary
>> for it, or
>> (3) you start with a dictionary and want to build an ontology for it.
>>
>> In one case, it is not:
>> (4) you want to combine an existing dictionary and an existing ontology
>> with each other.
>>
>> There is another reason: It is possible that a thesaurus provides
>> high-level distinctions only. Think of Dornseiff groups for German "Wasser"
>> (water,
>> http://corpora.uni-leipzig.de/de/res?corpusId=deu_newscrawl_2011&word=Wasser
>> ):
>>
>> 7.8 transparent
>> 7.61 liquid
>> 13.21 inorganic chemistry
>> 16.8 drinks, non-alcoholic
>>
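>> And so they occur in the Wikipedia definition:
>> "Wasser (H2O) ist eine chemische Verbindung (=> 13.21) aus den Elementen
>> Sauerstoff (O) und Wasserstoff (H). Wasser ist als Flüssigkeit (=> 7.61)
>> durchsichtig (=> 7.8)".
>> [In English: "Water (H2O) is a chemical compound of the elements oxygen
>> (O) and hydrogen (H). As a liquid, water is transparent."]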
>>
>> Obviously, we can create a concept "transparent liquid; inorganic;
>> suitable for drinking", but this is not provided by Dornseiff & Quasthoff
>> (2004), and if we (being the creators of neither the dictionary we start
>> with nor the thesaurus) create it, it needs to have a different
>> ontological status than the rest of the thesaurus -- because it differs in
>> provenance.
>>
>> The underlying problem of this particular thesaurus is that it focuses
>> on feature decomposition rather than on providing a concept inventory.
>> This is not unusual for thesauri. My feeling in the Dornseiff case is
>> actually that we should not use ontolex:reference, but ontolex:denotes.
>> This is somewhat vague in its definition, but it also corresponds to the
>> rather abstract nature of the thesaurus concepts. If the MHDBDB categories
>> are rather abstract (I remember some are), this would be an alternative
>> there, too. No cardinality restrictions apply to ontolex:denotes.
>>
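
The contrast between the two properties can be sketched in Python with
triples as plain tuples. The group numbers are those listed above for
"Wasser"; the ex: identifiers, the sample sense, and the helper
subjects_with_multiple are hypothetical names made up for this illustration,
not part of OntoLex or of the Dornseiff data.

```python
from collections import defaultdict

REFERENCE = "ontolex:reference"
DENOTES = "ontolex:denotes"

# Group numbers follow the "Wasser" list above; ex: names are hypothetical.
triples = [
    ("ex:Wasser", DENOTES, "ex:dornseiff_7.8"),
    ("ex:Wasser", DENOTES, "ex:dornseiff_7.61"),
    ("ex:Wasser", DENOTES, "ex:dornseiff_13.21"),
    ("ex:Wasser", DENOTES, "ex:dornseiff_16.8"),
    # A sense pointing at two groups via ontolex:reference, which is the
    # pattern the discussion treats as problematic.
    ("ex:Wasser_sense", REFERENCE, "ex:dornseiff_7.61"),
    ("ex:Wasser_sense", REFERENCE, "ex:dornseiff_16.8"),
]


def subjects_with_multiple(predicate, data):
    """Return subjects that have more than one distinct object for
    `predicate`, sorted."""
    objects = defaultdict(set)
    for s, p, o in data:
        if p == predicate:
            objects[s].add(o)
    return sorted(s for s, objs in objects.items() if len(objs) > 1)


# Only ontolex:reference is checked; ontolex:denotes is left unrestricted.
flagged = subjects_with_multiple(REFERENCE, triples)
print(flagged)
```

Under these assumptions the check flags only the sense with two references,
while the entry with four ontolex:denotes links passes untouched.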
>> Similarly, different parts-of-speech necessarily have different meanings,
>>
>>
>> We have counterexamples to this, often among function words: Many English
>> prepositions are also complementizers (subordinating conjunctions), verbal
>> particles, and sometimes adverbs -- these do not necessarily differ in
>> meaning, but only in syntax (i.e., the element they modify). Pronouns and
>> determiners are another typical case, hence some tagsets just lump them
>> together -- lexinfo doesn't.
>>
>> In open categories, such phenomena occur as well, but they are
>> usually treated as "zero morphology". German adjectives can be
>> systematically used as adverbs (again, these differ only by the element
>> they modify).
>>
>> A third case is in lexicography of historical language varieties, where
>> parts of speech may have changed during the time the dictionary covers, and
>> we might want to formulate different interpretations. Old High German
>> prepositions could be used as adverbs, and many prepositional adverbs were
>> grammaticalized into verbal particles over time, but there is a long
>> transition period where a preposition can be equally regarded (by a modern
>> lexicographer) as a verbal particle or an adverb -- in the same syntactic
>> context, two parts of speech apply (particle or adverb), or rather, they
>> are indistinguishable, and they could be recorded as such in a historical
>> dictionary of German.
>>
>> Data sparsity is yet another source of multiple parts of speech: For
>> example, a word may be attested, but not in an unambiguous
>> context. Hypothetical parts of speech are typical for low-resource
>> languages (see
>> https://archive.org/details/lalanguegauloise00dottuoft/page/262, "ieuru
>> ... datif singulier ou forme verbale", and later "iorebe ... verbe ou datif
>> pluriel"). We don't want to create one LexicalEntry per *hypothetical*
>> part-of-speech, do we?
>>
>> Best,
>> Christian
>> --
>> Prof. Dr. Christian Chiarcos
>> Applied Computational Linguistics
>> Johann Wolfgang Goethe Universität Frankfurt a. M.
>> 60054 Frankfurt am Main, Germany
>>
>> office: Robert-Mayer-Str. 10, #401b
>> mail: chiarcos@informatik.uni-frankfurt.de
>> web: http://acoli.cs.uni-frankfurt.de
>> tel: +49-(0)69-798-22463
>> fax: +49-(0)69-798-28931
>>
>
>

-- 
Sander Stolk, MSc MA
Received on Wednesday, 7 November 2018 09:21:23 UTC