Re: A possible topic for OntoLex-Lemon and modules from Christian Chiarcos on 2020-06-24 (public-ontolex@w3.org from June 2020)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Wed, 24 Jun 2020 19:32:09 +0200
To: public-ontolex <public-ontolex@w3.org>, "Thierry Declerck" <declerck@dfki.de>
Cc: "Bajcetic, Lenka" <Lenka.Bajcetic@oeaw.ac.at>, "chiarcos@informatik.uni-frankfurt.de" <chiarcos@informatik.uni-frankfurt.de>
Message-ID: <op.0mp8bvwnbr5td5@kitaba>
Hi Thierry, dear Lenka, dear all,

actually, the semantics of function words are an extremely complex area --
at least the moment you go beyond Germanic and Romance, and there is a
lot of related work in discourse studies, semantics, psycholinguistics,
and (pre-neural) computational linguistics that has accumulated over decades.

Computational approaches to function word semantics include a hierarchy of
preposition supersenses (see https://github.com/nert-nlp/streusle --
prepositions only), efforts to align (and decompose) various theories of
discourse relations (http://www.textlink.ii.metu.edu.tr/ -- this is for
adverbs and conjunctions), etc. The trouble is that the complexity of these
approaches exceeds even the complexity of resources such as FrameNet
(because they build on that), and that the cross-linguistic dimensions
have barely been tackled so far.

As for the (formal) semantics of determiners and pronouns, these are a  
*very* complicated matter as a unified theory of reference and deixis has  
not even emerged at a theoretical level (with promising approaches in  
Levelt 1989, Ariel 1991, Gundel et al. 1993, Grosz et al. 1995 -- and a  
drop of interest from the computational/semantic side since the  
mid-2000s). In the 1990s, von Heusinger developed an appealing approach by  
formalising definiteness in terms of a selection function that operates  
over an underlying salience metric that basically provides a contextually  
determined ranking of possible antecedent candidates. The point here is,  
however, that this ranking is contextually determined, so it is very hard  
to formalize it in a context-independent fashion (we would need to agree  
on *one* metric and a threshold, or another selection criterion). The
semantics of determiners are, however, not limited to definiteness; they
are also sensitive to distance (e.g., in Macedonian, if I'm not mistaken)
and specificity (rather than definiteness, this is what Farsi determiners
mark). And of course you'll find all kinds of grammaticalizations where
the underlying semantic or discourse function is barely recognizable
(think of Slavic full and reduced adjective inflections, e.g., horosho
vs. horoshee in Russian; these originate from a clitic determiner as
still preserved in Lithuanian, but this is not their function anymore).
And then there are languages that just mark different things, e.g., topic
and focus markers in African or Asian languages, classifiers, reference
tracking in languages without "proper" pronouns. In essence, the
enterprise of formalizing the meaning of function words (they are
semantically bleached, so this is not actually lexical semantics) would
amount to the development of a "universal" computational theory of grammar.
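
To make the selection-function idea above a bit more concrete, here is a
minimal Python sketch. It is only an illustration of the general shape
(pick the most salient matching candidate from a contextual ranking), not
von Heusinger's actual formalization. All names (Candidate,
select_definite, the toy salience scores, the threshold) are assumptions
made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A possible antecedent: an identifier plus the noun that describes it."""
    label: str
    noun: str


def select_definite(
    noun: str,
    candidates: Sequence[Candidate],
    salience: Callable[[Candidate], float],
    threshold: float = 0.0,
) -> Optional[Candidate]:
    """Pick the most salient candidate matching `noun`, or None.

    `salience` stands in for the contextually determined metric;
    `threshold` is the extra selection criterion one would have to agree on.
    Both are hypothetical placeholders.
    """
    matching = [c for c in candidates if c.noun == noun]
    if not matching:
        return None
    best = max(matching, key=salience)
    return best if salience(best) >= threshold else None


# Two toy contexts that rank the same candidates differently.
dog_a = Candidate("dog_a", "dog")
dog_b = Candidate("dog_b", "dog")
context_1 = {dog_a: 0.9, dog_b: 0.2}
context_2 = {dog_a: 0.1, dog_b: 0.7}

referent_1 = select_definite("dog", [dog_a, dog_b], context_1.__getitem__)
referent_2 = select_definite("dog", [dog_a, dog_b], context_2.__getitem__)
print(referent_1.label, referent_2.label)
```

The same expression ("the dog") resolves to different referents in the two
contexts, which is exactly why the ranking cannot simply be stored in a
context-independent lexical entry.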

I would very much welcome an open discussion and would participate in it --
in parts at least -- and I am working on this (this brought me into linked
data because linguistic research / language technology on that complex
level requires a level of language resource interoperability that no other
technology seems capable of establishing). In fact, the OLiA Discourse
Extensions
(http://www.acoli.informatik.uni-frankfurt.de/resources/discourse/) were
a proposal to develop something in this direction, but they still
remain too shallow and focused solely on existing annotation schemes
rather than on formalizing the underlying functions.

On the practical side, I am, however, very much convinced that this
goes *WAAY* beyond the scope of OntoLex (as the phenomena involved
exceed the scope of lexical semantics and include grammar and context). I
think the most likely place to discuss this would be in the context of the  
SemAF specifications, but this is ISO and not an open discussion. The ACL  
SIGs SIGDial and SIGSEM would be less restricted. There are workshops and  
shared tasks, SemEval, StarSem, etc. These could be a starting point for an
open discussion, maybe with a position paper [actually, I would like to
work on something like that ...]. The trouble is that there are some  
fundamental issues that still need to be solved (and are being worked on  
by several communities) before the modelling aspect can even be thought  
about.

Best,
Christian

On .06.2020, at 14:51, Thierry Declerck <declerck@dfki.de> wrote:

> Dear All,
>
> Lenka (in cc, in case she is not yet included in the mailing list) and
> I recently had a discussion on entries of the so-called "closed
> classes", by which we mean determiners, prepositions, auxiliary verbs,
> pronouns, ...
>
> We were wondering if there were already discussions on how to encode
> those in OntoLex-Lemon. The question is mainly about "semantics by
> reference". It *seems* straightforward to encode the semantics of nouns
> this way in OntoLex, but what about determiners and pronouns (and the
> like)? Are there some pointers on this topic, and do you think it would
> be worth opening a discussion on this, if there are not already
> solutions that I am not aware of (in which case, sorry for that)?
>
> Thanks
>
> Thierry
Received on Wednesday, 24 June 2020 17:32:23 UTC