- From: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
- Date: Wed, 20 Mar 2019 20:05:42 +0100
- To: "Julia Bosque Gil" <jbosque@fi.upm.es>
- Cc: peikert.katrin@web.de, public-ontolex <public-ontolex@w3.org>
- Message-ID: <op.zyysnsvp89jat0@kitaba>
Dear Julia,

> Thank you for your comments, this is leading to a nice discussion indeed
> :)

;)

> So, without knowing to which entry each form belongs, I only see three
> options (now revisited after your e-mail):
>> (a) All lexical entries receive the forms, triplicating the list of
>> forms ("gal" in Sumerian has 198 attested forms without information
>> about the sense, so this option should probably be reconsidered...!)
> (b) The entries share the forms by interpreting "a" in the definition of
> ontolex:Form as existential quantifier. What worries me here is that I
> am not sure about a "realisation" being a realisation of more than one
> entry at the same time.

I understand your hesitation here; from a technical perspective, I'm possibly too pragmatic ;) Any input from a lexicographer?

>> (c) Only one lexical entry is linked to the forms. For the other
>> lexical entries...either they inherit from the first entry with your
>> new property, or we would need to access the forms through lexicog
>> mechanisms. Regarding this new property that you suggest, it makes a
>> lot of sense to me if you know beforehand that there is an "original"
>> LexicalEntry (or one you want to treat as "original") which does occur
>> in a series of forms, and the other lexical entries are realized with
>> the same grammatical properties.

I think that would be the case here, because the EPSD provides the "dictionary-entry"-level POS tag more prominently than the sense-specific POS tags. In dictionaries, a similar interpretation can be given to the sequential order of parts of speech.
I.e., if a lexicographer puts one particular part of speech first, he does so for one of three reasons: because it is more "prototypical" or "natural" for a potential reader, because he knows (and assumes his reader to know) about the origin of a "zero-derived" form, or because he follows a general pattern that would probably implement the intuition that verbs and nouns are somewhat more "fundamental" than adjectives or adverbs, or even function words. Either way, there would be a first, i.e., most prominently represented one, and we can just *define* it as the "original". However, this may be something different from the direction of morphological derivation (which, diachronically, may have been a morphological process, like the derivation of German adverbs from adjectives [Middle High German added an -e here, which was then lost by apocope]), and this is why I'm struggling a bit with the name of the property.

> It would be a nice solution to the problem of the adjective-adverb issue
> in German we discussed in some calls on the lexicog module, as you
> mentioned. But, for the example of the Sumerian data, I might have
> missed something or got lost in the process: how do we know which forms
> are linked to the "original" LexicalEntry on the first place, if there
> is no way to know from the data which of the 198 forms of "gal" are
> connected to which lexical entry (v, n, or adj)? In other words, are
> we preventing a wrong scope of form attestations in any of the ways of
> implementing option (c)?

We can, at least: if we state that a "derived" LexicalEntry inherits (resp., extends) the forms and senses of the "original" *unless explicitly given*, then explicitly giving form and/or sense information entails that they are *not* inherited by (extended to) that particular derived entry. This is a bit like overriding an inherited variable in an object-oriented programming language. However, this subtle difference can only be maintained if we adopt the closed world assumption for OntoLex data.
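
To make the "inherit unless explicitly given" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class, the `original` link and the method name are hypothetical and not part of OntoLex, the choice of the verb as the "original" entry is arbitrary, and the form strings are placeholders rather than attested Sumerian forms.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexicalEntry:
    """Toy stand-in for an ontolex:LexicalEntry (not the real OntoLex model)."""
    label: str
    pos: str
    # Forms asserted explicitly for this entry (think: ontolex:lexicalForm).
    forms: List[str] = field(default_factory=list)
    # Hypothetical link to the entry treated as "original"; the property
    # name is an assumption, the thread has not settled on one.
    original: Optional["LexicalEntry"] = None

    def effective_forms(self) -> List[str]:
        """Return own forms if any were given, else those of the original.

        This is a closed-world reading: an empty list is taken to mean
        "no forms were asserted", not "forms unknown".
        """
        if self.forms or self.original is None:
            return list(self.forms)
        return self.original.effective_forms()


# Placeholder forms; which POS counts as "original" is assumed here.
gal_v = LexicalEntry("gal", "v", forms=["form-1", "form-2", "form-3"])
gal_n = LexicalEntry("gal", "n", original=gal_v)                      # inherits
gal_adj = LexicalEntry("gal", "adj", forms=["form-2"], original=gal_v)  # overrides

print(gal_n.effective_forms())
print(gal_adj.effective_forms())
```

Note that the fallback in `effective_forms` is only sound if the absence of explicit forms really means that none exist. Under an open world reading, the empty list for the noun entry could equally mean that the relevant statements are simply missing.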
Without the closed world assumption, we might accidentally have lost just the decisive ontolex:lexicalForm property that would have helped us to decide about the scope. This would be a *BIG* design decision to make (but one I would tentatively support -- I am probably missing some of its consequences, though).

This problem is deeply rooted in RDF semantics (and one of the aspects where it differs from, say, the default semantics of graph databases). If we adopt the Open World Assumption, we *cannot* express (in any of these modelling choices) that something is *un*ambiguous -- because it could always be that we just lost the triple that connects the form with another lexical entry. This issue persists for the other modelling choices, as well.

I just created an issue on https://github.com/w3c/EasierRDF requesting a compact means to assert CWA or OWA interpretations for an RDF graph.

Best,
Christian

--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 11-15, #107
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28334
Received on Wednesday, 20 March 2019 19:08:21 UTC