- From: Christian Chiarcos <christian.chiarcos@web.de>
- Date: Tue, 19 Jan 2021 18:28:54 +0100
- To: "Dave Raggett" <dsr@w3.org>
- Cc: public-cogai <public-cogai@w3.org>
- Message-ID: <op.0xg9ikncbr5td5@kitaba>
Hi Dave,

On .01.2021, 12:39, Dave Raggett <dsr@w3.org> wrote:

> ... humans don't actually use logic when reasoning, and instead think in
> terms of mental models of examples, along with the use of metaphors and
> analogies. See the work by Philip Johnson-Laird, e.g. "How We Reason",
> Philip Johnson-Laird, 2012, Oxford University Press,
> https://doi.org/10.1093/acprof:oso/9780199551330.001.0001
>
>>> DRT / SDRT
>>> (https://plato.stanford.edu/entries/discourse-representation-theory/).
>
> ... can be represented in chunks as a graph with symbols that stand for
> hypothetical instances of some class, along with quantifiers such as: at
> least one, some, most, all and none.
> These can be interpreted by rulesets and graph algorithms that model
> different kinds of human reasoning.

In fact, (S)DRT is the formal model that probably comes closest to Johnson-Laird's mental models. On the one hand, discourse representation segments formalize context; on the other hand, DRT was actually designed to deal with quantifier scope. Eijck and Kamp (2010) claim that "A theory of representation of discourse in context holds a particular promise for the treatment of belief because the representation structures themselves could be viewed as a kind of mental representation language; thus a belief relation could typically be modelled as a relation between a subject and a representation structure (Asher [3])." There are parallels with models from psycholinguistics, especially when it comes to simplified representations of context (their appendix A), which basically aims to unify their formal model with psycholinguistic/cognitive-linguistic theories of salience and reference (think of Chafe 1994, as an example).

Eijck, J. v. and H. Kamp (2010). Discourse representation in context. In J. v. Benthem and A. ter Meulen (Eds.), Handbook of Logic and Language, pp. 181–252. Elsevier, https://staff.fnwi.uva.nl/d.j.n.vaneijck2/papers/10/pdfs/dric.pdf

Chafe, W. (1994). Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. University of Chicago Press.

In any case, real-world DRT parsing doesn't actually seem to use these representations; rather, they are derived from shallower parsing techniques. (This is how I understand Boxer [https://gmb.let.rug.nl/software.php] to work, and this is precisely why the resulting parses do not actually have the context-awareness of "proper" DRT parses, even though they use DRT as a representation formalism.) At least for SDRT, I presume this is because it does not seem to be feasible to construct or to bootstrap the necessary lexical data (Asher and Lascarides' "glue logic", the rule sets, if you will) at scale.

Nicholas Asher & Alex Lascarides (2011), Reasoning dynamically about what one says. Synthese, volume 183, pages 5–31, https://link.springer.com/article/10.1007/s11229-011-0016-4

Regardless of the underlying formalisms, the essential question for any symbolic or rule-based approach to natural language understanding is how to address that knowledge gap, or how to avoid the problem. As far as I can see, there are basically four possible strategies:

(a) develop a methodology for bootstrapping such information from unannotated data,
(b) repurpose existing resources, esp. lexical data, rule sets or complete tools,
(c) restrict yourself to a closed domain, or
(d) integrate approaches to representation learning or self-supervision.

Modern NLP primarily goes in the latter direction, hence the dominance of neural methods nowadays.
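Just to make the kind of representation under discussion a bit more concrete, here is a minimal sketch of a DRS-like structure as plain data: discourse referents as hypothetical instances of a class, conditions over them, and the quantifiers you list. Every name in it is made up for illustration; it is neither Boxer's output format nor the chunks syntax, just one possible rendering of the idea.

```python
# A minimal, illustrative DRS-like structure. All names are hypothetical;
# this is not Boxer's output format and not the chunks syntax.

from dataclasses import dataclass, field

@dataclass
class Referent:
    """A discourse referent: a hypothetical instance of some class."""
    name: str                  # e.g. "x1"
    isa: str                   # e.g. "farmer"
    quantifier: str = "some"   # "at-least-one" | "some" | "most" | "all" | "none"

@dataclass
class Condition:
    """A relation holding between referents, e.g. own(x1, x2)."""
    predicate: str
    args: tuple

@dataclass
class DRS:
    """A box: referents introduced in this context plus conditions over them.
    Nested sub-boxes are where quantifier scope and embedded contexts live."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)
    sub: list = field(default_factory=list)

# "Most farmers own a donkey."
example = DRS(
    referents=[Referent("x1", "farmer", quantifier="most"),
               Referent("x2", "donkey", quantifier="some")],
    conditions=[Condition("own", ("x1", "x2"))],
)
```

A belief report in the sense of the Eijck and Kamp quote would then simply be a condition whose argument is itself a sub-box, i.e., the subject stands in a relation to a representation structure.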
Traditional, symbolic NLP focused very much on (c), but that largely withered away during the 1990s. During the era of statistical NLP, bootstrapping (a) was a popular strategy, but it is hard to arrive at high-quality data, rules or dictionaries in this way. Most practical applications of NLP technology do (b), i.e., they do not bother with building NLP tools in the first place, but just customize existing tools or their output.

NB: As for UCCA and AMR, I'm not actually suggesting to use them as is, but maybe some of the data or tools developed for parsing them can be repurposed.

> My demo on smart homes includes an example of default reasoning relating
> to the lighting and heating of a room, taking into account the
> preferences of who is in that room, see:
> https://www.w3.org/Data/demos/chunks/home/
>
> I look forward to other demos that implement the kind of reasoning
> described by Johnson-Laird. These will be easier to implement if we have
> a working implementation of natural language for end-to-end
> communication of meaning. Such demos could build upon a limited subset
> of language, along with manually developed declarative and procedural
> knowledge. In other words, we don't need to solve all of language to
> build useful demos.

Ok, so the overall goal would be to develop closed-domain solutions? That should be possible.

>> Complexity of symbolic parsing. Notoriously slow when it comes to
>> larger dictionaries
>
> Can you please expand on that as it isn't obvious to me. Perhaps this is
> something to do with the kind of parsers they've used?

Kind of. There are linear-time (O(n)) parsers, but achieving linear time means limiting context awareness and ignoring (or postponing) the resolution of ambiguities. Shift-reduce parsers with backtracking can be exponential. In between, everything is possible, but realistically, the more expressive grammar formalisms are mildly context-sensitive and can be parsed in polynomial time (e.g., Öttl et al. 2015). So we're talking about something between O(n^2) or O(n^3) and O(n^6) time complexity. Here, n is the length of the sentence, but in practice, the size of the ruleset (grammar + lexicon) has an immense impact, too (think of it as a large constant factor in front of the polynomial in n; the toy sketch further below illustrates this). Large coverage requires large rulesets, so this will work nicely for closed-domain applications, but beyond that you have a tradeoff between scalability and coverage -- or, if you take a more restricted parsing formalism, context-awareness.

Birgit Öttl, Gerhard Jäger, Barbara Kaup (2015), Does Formal Complexity Reflect Cognitive Complexity? Investigating Aspects of the Chomsky Hierarchy in an Artificial Language Learning Study, PLOS ONE, https://doi.org/10.1371/journal.pone.0123059

Neural parsing circumvents the issue by reducing parsing to computations over a *finite* set of embeddings at every point in time (no lexicon!) -- but only to the extent that context information is faithfully maintained in these embeddings. Theoretically, that can be achieved with RNNs (for example), but in reality, the degree to which they preserve information about earlier states is limited by numerical precision (that would be floats or half-floats). It is not possible to quantify the effect exactly, and it depends on the architecture of the network and the characteristics of the data. And you end up with a black box, of course, not rules.
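Coming back to the complexity point for a moment, here is a toy CKY recognizer, only meant to show where the n^3 and the grammar-size factor come from: three nested loops over span boundaries, plus an inner loop over the rule set. The grammar and the sentence are invented for illustration and have nothing to do with any actual HPSG or DRT resource.

```python
# Toy CKY recognizer over a made-up CNF grammar, purely for illustrating
# the O(|G| * n^3) behaviour discussed above.

RULES = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "Det", "N")]
LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}

def cky_recognize(words):
    n = len(words)
    # chart[i][j] = set of categories that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(w, set()))
    for length in range(2, n + 1):             # O(n) span lengths
        for i in range(n - length + 1):        # O(n) start positions
            j = i + length
            for k in range(i + 1, j):          # O(n) split points
                for lhs, r1, r2 in RULES:      # O(|G|) rules: the "constant"
                    if r1 in chart[i][k] and r2 in chart[k][j]:
                        chart[i][j].add(lhs)
    return "S" in chart[0][n]

print(cky_recognize("the dog chased the cat".split()))  # True
```

With realistic coverage, the rule set and lexicon grow by orders of magnitude, which is exactly where the tradeoff between coverage and scalability bites.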
> For human processing, you can measure the time someone takes to read an
> utterance (e.g. with eye tracking), and see how the time changes with
> different kinds of utterance. I don't have any pointers to such work to
> hand, but expect that it would show effects on the level of embedding
> and the complexity of references within the utterance. Made up words
> can be used to explore the reasoning involved in dealing with previously
> unknown words.
>
>> Coverage of symbolic parsing. The best HPSG grammars for English cover
>> maybe 85% of the input tokens
>
> Perhaps the grammars are too prescriptive?

I think it's mostly out-of-vocabulary words. A(lmost a) non-issue in closed domains, but basically prohibitive for open-domain applications. I don't think there are any commercial applications based on symbolic parsing around anymore.

Overall, the history of symbolic parsing points to two major challenges:

- How to acquire the necessary knowledge (rules, etc.)?
- How to make processing performant?

Regardless of whether CogAI solutions are based on these earlier lines of research or are developed from scratch, these are the challenges to be expected. For closed domains and demos, everything can work nicely. For anything beyond that, we need to think about how to address them. The approach of NLP has been to largely abandon symbolic parsing, but this development is less driven by scientific insight than by the prospect of trading human expertise for computing power (remember "Every time I fire a linguist ...").

In any case, I'm not sure I fully understand the chunk mechanism, so it's likely I'm missing something obvious.

> Cognitive parsers should be able to make some sense of incomplete or
> ungrammatical utterances. This also relates to the potential for
> learning new grammar.

That would be ideal. How would that work in practice?

Best,
Christian
Received on Tuesday, 19 January 2021 17:29:59 UTC