Re: Something on grammar

Hi Dave,

On .01.2021, 12:39, Dave Raggett <dsr@w3.org> wrote:

> ... humans don’t actually use logic when reasoning, and instead think in  
> terms of mental models of examples, along with the use of metaphors and  
> analogies. See the work by Philip Johnson-Laird, e.g. “How We Reason”,  
> Philip Johnson-Laird, 2012, Oxford University Press, 
> https://doi.org/10.1093/acprof:oso/9780199551330.001.0001
>
>>> DRT / SDRT  
>>> (https://plato.stanford.edu/entries/discourse-representation-theory/).
>
> ... can be represented in chunks as a graph with symbols that stand for  
> hypothetical instances of some class, along with quantifiers such as: at  
> least one, some, most, all and none.  
> These can be interpreted by rulesets and graph algorithms that model  
> different kinds of human reasoning.

In fact, (S)DRT is the formal model that probably comes closest to  
Johnson-Laird's mental models. On the one hand, discourse representation  
segments formalize context; on the other hand, DRT was actually designed  
to deal with quantifier scope. Eijck and Kamp (2010) claim that  
"A theory of representation of discourse in context holds a particular  
promise for the treatment of belief because the representation structures  
themselves could be viewed as a kind of mental representation language;  
thus a belief relation could typically be modelled as a relation between a  
subject and a representation structure (Asher [3])." There are parallels  
with models from psycholinguistics, especially when it comes to simplified  
representations of context (their appendix A), which basically aims to  
unify their formal model with psycholinguistic/cognitive-linguistic  
theories of salience and reference (think of Chafe 1994, as an example).

Eijck, J. v. and H. Kamp (2010). Discourse representation in context. In  
J. v. Benthem and A. ter Meulen (Eds.), Handbook of Logic and Language,  
pp. 181–252. Elsevier,  
https://staff.fnwi.uva.nl/d.j.n.vaneijck2/papers/10/pdfs/dric.pdf.

Chafe, W. (1994). Discourse, consciousness, and time: The flow and  
displacement of conscious experience in speaking and writing. University  
of Chicago Press.
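
To make the connection a bit more concrete, here is how I would sketch a  
DRS in, say, Python (a purely illustrative encoding with made-up symbols,  
not Boxer's actual output format): a set of discourse referents plus a set  
of conditions over them, with later sentences adding to the same structure.

# DRS for "A farmer owns a donkey. He beats it." (illustrative encoding)
drs = {
    # discourse referents introduced by the indefinites in sentence 1
    "referents": ["x", "y"],
    # conditions accumulated over the whole discourse
    "conditions": [
        ("farmer", "x"),
        ("donkey", "y"),
        ("owns", "x", "y"),
        # sentence 2: "he"/"it" resolved against the accessible referents
        ("beats", "x", "y"),
    ],
}

The pronouns in the second sentence can only be resolved because the  
structure built for the first sentence is still accessible -- exactly the  
kind of context a sentence-by-sentence shallow parse does not maintain.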

In any case, real-world DRT parsing doesn't actually seem to build these  
representations directly; rather, they are derived from shallower parsing  
techniques. (This is how I understand Boxer  
[https://gmb.let.rug.nl/software.php] to work, and this is precisely why  
the resulting parses do not actually have the context-awareness of  
"proper" DRT parses even though they use DRT as their representation  
formalism.) At least for SDRT, I presume that this is because it does not  
seem to be feasible to construct or to bootstrap the necessary lexical  
data (Asher and Lascarides' "glue logic", the rule sets, if you will) at  
scale.

Nicholas Asher & Alex Lascarides (2011), Reasoning dynamically about what  
one says. Synthese, volume 183, pages 5–31,  
https://link.springer.com/article/10.1007/s11229-011-0016-4

Regardless of the underlying formalisms, the essential question for any  
symbolic or rule-based approach to natural language understanding is how  
to address that knowledge gap or how to avoid the problem. As far as I can  
see, there are basically four possible strategies:

(a) develop a methodology for bootstrapping such information from  
unannotated data,
(b) repurpose existing resources, esp. lexical data, rule sets or  
complete tools,
(c) restrict yourself to a closed domain, or
(d) integrate approaches to representation learning or self-supervision.

Modern NLP primarily goes in the latter direction (d), hence the dominance  
of neural methods nowadays. Traditional, symbolic NLP focused very much  
on (c), but that largely withered away during the 1990s. During the era of  
statistical NLP, bootstrapping (a) was a popular strategy, but it's hard  
to arrive at high-quality data/rules/dictionaries in this way. Most  
practical applications of NLP technology do (b), i.e., they do not bother  
with building NLP tools in the first place, but just customize existing  
tools or their output.

NB: As for UCCA and AMR, I'm not actually suggesting using them as is,  
but maybe some of the data or tools developed for their parsing can be  
repurposed.

> My demo on smart homes includes an example of default reasoning relating  
> to the lighting and heating of a room, taking into account the  
> preferences of who is in that room, see:  
> https://www.w3.org/Data/demos/chunks/home/
>
> I look forward to other demos that implement the kind of reasoning  
> described by Johnson-Laird. These will be easier to implement if we have  
> a working implementation of natural language for end to end  
> communication of meaning. Such demos could build upon a limited subset  
> of language, along with manually developed declarative and procedural  
> knowledge. In other words, we don’t need to solve all of language to  
> build useful demos.

Ok, so the overall goal would be to develop closed-domain solutions? That  
should be possible.

>
>> Complexity of symbolic parsing. Notoriously slow when it comes to  
>> larger dictionaries
>
> Can you please expand on that as it isn’t obvious to me. Perhaps this is  
> something to do with the kind of parsers they’ve used?

Kind of. There are linear-time (O(n)) parsers, but achieving linear time  
means limiting context awareness and ignoring (or postponing) the  
resolution of ambiguities. Shift-reduce parsers with backtracking can be  
exponential. In between, everything is possible, but realistically, the  
more expressive grammar formalisms are mildly context-sensitive and can be  
parsed in polynomial time (e.g., Öttl et al. 2015). So we're talking  
about something between O(n^2) or O(n^3) and O(n^6) time complexity. Here,  
n is the length of the sentence, but in practice, the size of the ruleset  
(grammar + lexicon) has an immense impact, too (think of it as a constant  
factor multiplying the polynomial in n). Large coverage requires large  
rulesets, so this will work nicely for closed-domain applications, but  
beyond that you have a tradeoff between scalability and coverage -- or,  
if you take a more restricted parsing formalism, context-awareness.

Birgit Öttl, Gerhard Jäger, Barbara Kaup (2015), Does Formal Complexity  
Reflect Cognitive Complexity? Investigating Aspects of the Chomsky  
Hierarchy in an Artificial Language Learning Study, PLOS ONE,  
https://doi.org/10.1371/journal.pone.0123059
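
To make the role of the ruleset size a bit more concrete, here is a  
minimal CKY-style recognizer sketch in Python (toy CNF grammar with  
made-up symbols, not any particular implementation): the three nested  
loops over spans give the O(n^3) part, and the innermost loop over the  
binary rules multiplies in the size of the grammar, so the total is  
roughly O(|G| * n^3).

lexical = {"dogs": {"NP"}, "cats": {"NP"}, "saw": {"V"}}
binary = [("S", "NP", "VP"), ("VP", "V", "NP")]

def cky_recognize(words, start="S"):
    n = len(words)
    # chart[i][j] holds the nonterminals that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, set()))
    for span in range(2, n + 1):              # O(n) span lengths
        for i in range(0, n - span + 1):      # O(n) start positions
            j = i + span
            for k in range(i + 1, j):         # O(n) split points
                for lhs, b, c in binary:      # O(|G|) binary rules
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(lhs)
    return start in chart[0][n]

print(cky_recognize("dogs saw cats".split()))   # True

With two binary rules this is instant; with a broad-coverage grammar and  
lexicon, that rule loop (and the ambiguity it puts into the chart) is what  
dominates.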

Neural parsing circumvents the issue by reducing parsing to computations  
over a *finite* set of embeddings at every point in time (no lexicon!) --  
but only to the extent that context information is faithfully maintained  
in these embeddings. Theoretically, that can be achieved with RNNs (for  
example), but in reality, the degree to which they preserve information  
about earlier states is limited by the numerical precision (floats or  
half-floats). The effect cannot be quantified exactly; it depends on the  
architecture of the network and the characteristics of the data. And you  
end up with a black box, of course, not rules.
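
Just to illustrate the precision point with a toy example (plain IEEE  
floats, nothing RNN-specific): once the running state is large relative to  
a new contribution, that contribution is rounded away and no later  
computation can recover it.

import numpy as np

state = np.float32(1.0)
signal = np.float32(1e-8)          # "information" carried by an earlier input
print(state + signal == state)     # True: the contribution is rounded away

# float32 resolves only ~7 decimal digits, so anything smaller than about
# eps/2 relative to the state cannot show up in the sum at all
print(np.finfo(np.float32).eps)    # ~1.19e-07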

> For human processing, you can measure the time someone takes to read an  
> utterance (e.g. with eye tracking), and see how the time changes with  
> different kinds of utterance. I don’t have any pointers to such work to  
> hand, but expect that it would show effects on the level of embedding  
> and the complexity of references within the utterance. Made up words  
> can be used to explore the reasoning involved in dealing with previously  
> unknown words.
>
>> Coverage of symbolic parsing. The best HPSG grammars for English cover  
>> maybe 85% of the input tokens
>
> Perhaps the grammars are too prescriptive?

I think it's mostly out-of-vocabulary words. A(lmost a) non-issue in  
closed domains, but basically prohibitive for open-domain applications. I  
don't think there are any commercial applications around anymore based on  
symbolic parsing.

Overall, the history of symbolic parsing points to the existence of two  
major challenges:
- How to acquire the necessary knowledge (rules, etc.)?
- How to make processing performant?

Regardless of whether CogAI solutions are based on these earlier lines of  
research or developed from scratch, these are the challenges to be  
expected. For closed domains and demos, all of them can work nicely. For  
anything beyond that, we need to think about how to address them. The  
approach of NLP has been to largely abandon symbolic parsing, but this  
development is less driven by scientific insight than by the prospect of  
trading human expertise for computing power (remember "Every time I fire  
a linguist ...").

In any case, I'm not sure I fully understand the chunk mechanism, so it's  
likely I'm missing something obvious.

> Cognitive parsers should be able to make some sense of incomplete or  
> ungrammatical utterances. This also relates to the potential for  
> learning new grammar.

That would be ideal. How would that work in practice?

Best,
Christian

Received on Tuesday, 19 January 2021 17:29:59 UTC