Re: Something on grammar

Hi Christian,

Many thanks for your thoughtful comments and pointers.

> On 19 Jan 2021, at 17:28, Christian Chiarcos <christian.chiarcos@web.de> wrote:
> 
> Hi Dave,
> 
> Nicholas Asher & Alex Lascarides (2011), Reasoning dynamically about what one says. Synthese, volume 183, pages 5–31, https://link.springer.com/article/10.1007/s11229-011-0016-4
> Regardless of the underlying formalisms, the essential question for any symbolic or rule-based approach to natural language understanding is how to address that knowledge gap or how to avoid the problem. As far as I can see, there are basically four possible strategies:
> 
> (a) develop a methodology for bootstrapping such information from unannotated data,
> (b) repurpose existing resources, esp., lexical data, rule sets or complete tools,
> (c) restrict yourself to a closed domain, or
> (d) integrate approaches on representation learning or self-supervision
> 
> Modern NLP primarily goes in the latter direction. Hence the dominance of neural methods nowadays. Traditional, symbolic NLP focused very much on (c), but that largely withered away during the 1990s. During the era of statistical NLP, bootstrapping (a) was a popular strategy, but it's hard to arrive at high-quality data/rules/dictionaries in this way. Most practical applications of NLP technology do (b), i.e., they don't bother with building NLP tools in the first place, but just customize existing tools or their output.

Thanks - I am mainly interested in meaning and the communication of meaning through language. This involves bootstrapping from a manually programmed core and then learning from experience in a way that mimics how children learn. For an accessible account, see:

“How Children Learn Language”, William O’Grady, 2005, Cambridge University Press.
https://doi.org/10.1017/CBO9780511791192

O’Grady notes that there is still a great deal to learn about how children are able to acquire language so easily. He provides a collection of techniques, but doesn’t describe a computational account, which is something I want to explore. In principle, cognitive agents could be taught lessons using a relatively simple subset of language, and language competence could then be acquired over the course of such lessons.

As far as I am aware, current approaches to applying artificial neural networks (e.g. BERT, GPT-3) focus on syntax, with meaning only indirectly expressed in terms of statistics over the co-occurrence of words. That doesn’t get us very far when it comes to reasoning about meanings.

> 
>>  Complexity of symbolic parsing. Notoriously slow when it comes to larger dictionaries
> 
> Can you please expand on that, as it isn’t obvious to me? Perhaps this is something to do with the kind of parsers they’ve used?
> 
> Kind of. There are linear-time (O(n)) parsers, but achieving linear time means limiting context awareness and ignoring (or postponing) the resolution of ambiguities. Shift-reduce parsers with backtracking can be exponential. In between, everything is possible, but realistically, the more expressive grammar formalisms are mildly context-sensitive and can be parsed in polynomial time (e.g., Öttl et al. 2015). So, we're talking about something between O(n^2) or O(n^3) and O(n^6) time complexity. Here, n is the length of the sentence, but in practice the size of the ruleset (grammar+lexicon) has an immense impact, too (think of it as a constant multiplied with the base n). Large coverage requires large rulesets, so this will work nicely for closed-domain applications, but beyond that you have a tradeoff between scalability and coverage -- or, if you take a more restricted parsing formalism, context-awareness.

I think you’re describing the family of statistical phrase structure parsers, which make heavy use of backtracking and use statistics to rank the different parse trees arising from the syntactic ambiguities that are so prevalent in human languages. This backtracking can be avoided by incremental concurrent processing of syntax and semantics, where syntax guides semantics and vice versa. This is something I am trying to realise.

As an example, one step is to identify the part of speech and word sense for the next word, given the current state of processing the utterance. This can take statistics for part-of-speech sequences into account, along with the meaning of the previous words and semantic models of meaning based on spreading activation, as originally proposed by Quillian.
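To make that concrete, here is a rough Python sketch of the kind of scoring I have in mind, combining part-of-speech bigram statistics with a one-step spreading activation score. The lexicon, statistics, activation values and weights are purely illustrative and are not taken from ACT-R or the papers cited below.

# Minimal sketch: choose the part of speech and word sense for the next word
# by combining (a) bigram statistics over part-of-speech sequences with
# (b) a spreading-activation score from the meanings of the preceding words.
# All lexicon entries, probabilities and activation values are illustrative.

# Hypothetical lexicon: word -> list of (part of speech, word sense)
LEXICON = {
    "bank": [("noun", "bank#river-side"), ("noun", "bank#financial"), ("verb", "bank#tilt")],
}

# Hypothetical bigram statistics: P(pos | previous pos)
POS_BIGRAMS = {
    ("det", "noun"): 0.6, ("det", "verb"): 0.05,
    ("pron", "verb"): 0.5, ("pron", "noun"): 0.2,
}

# Hypothetical Quillian-style semantic network used for spreading activation
SEMANTIC_LINKS = {
    "bank#river-side": {"river", "water", "fishing"},
    "bank#financial":  {"money", "loan", "account"},
    "bank#tilt":       {"aircraft", "turn"},
}

def activation(sense, context_concepts, decay=0.5):
    """One-step spreading activation: overlap between a sense's neighbours
    and the concepts already activated by the preceding words."""
    return decay * len(SEMANTIC_LINKS.get(sense, set()) & context_concepts)

def disambiguate(word, prev_pos, context_concepts, w_syntax=1.0, w_semantics=1.0):
    """Rank the candidate (pos, sense) pairs for the next word."""
    scored = []
    for pos, sense in LEXICON.get(word, []):
        syntactic = POS_BIGRAMS.get((prev_pos, pos), 0.01)
        semantic = activation(sense, context_concepts)
        scored.append((w_syntax * syntactic + w_semantics * semantic, pos, sense))
    return max(scored) if scored else None

# "We walked along the river to the bank" -> the river context favours bank#river-side
print(disambiguate("bank", prev_pos="det", context_concepts={"river", "water", "walk"}))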

Here is an example of previous work on this using ACT-R:

“A cognitive approach to word sense disambiguation”, Sudakshina Dutta & Anupam Basu
https://link.springer.com/chapter/10.1007/978-3-642-28604-9_18

And here is an account of top-down syntactic parsing using ACT-R:

“The Basics of Syntactic Parsing in ACT-R”, Adrian Brasoveanu & Jakub Dotlačil
https://link.springer.com/chapter/10.1007/978-3-030-31846-8_3

I instead use a bottom-up approach based on shift-reduce parsing that incrementally processes an utterance, word by word, to construct a graph of chunks, where each chunk represents a verb phrase, a noun phrase, etc. (see the sketch below). This year I want to work on integrating incremental semantic processing to construct the meaning of the utterance and to guide parsing, e.g. the attachment of prepositional phrases. I also need to work on the lexicon and the integration of spreading activation.
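To give a flavour of the bottom-up behaviour, here is a toy Python sketch of incremental shift-reduce chunking. The reduce rules, tags and chunk format are simplified placeholders rather than the actual rule notation I am using.

# Toy sketch of incremental bottom-up shift-reduce chunking: each word is
# shifted onto the stack with its part of speech, and reduce rules collapse
# the top of the stack into chunks (NP, PP, VP, ...).

REDUCE_RULES = [
    (("det", "noun"), "NP"),
    (("adj", "noun"), "noun"),      # an adjective folds into the noun
    (("prep", "NP"), "PP"),
    (("verb", "NP"), "VP"),
    (("NP", "VP"), "S"),
]

def parse(tagged_words):
    """tagged_words: list of (word, pos) pairs. Returns the stack of chunks."""
    stack = []
    for word, pos in tagged_words:
        stack.append({"type": pos, "parts": [word]})      # shift
        reduced = True
        while reduced:                                    # reduce while any rule applies
            reduced = False
            for pattern, label in REDUCE_RULES:
                n = len(pattern)
                if tuple(c["type"] for c in stack[-n:]) == pattern:
                    parts = stack[-n:]
                    del stack[-n:]
                    stack.append({"type": label, "parts": parts})
                    reduced = True
                    break
    return stack

print(parse([("the", "det"), ("cat", "noun"), ("chased", "verb"),
             ("a", "det"), ("mouse", "noun")]))   # reduces to a single S chunk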

Processing time will depend on the extent of semantic ambiguities, but in a much more manageable way than for purely syntactic parsing. Humans don’t seem to have much of a problem with long sentences provided that the meaning is clear and the structure isn’t deeply nested. I suspect that the latter is due to constraints of working memory.

> Perhaps the grammars are too prescriptive?  
> 
> I think it's mostly out-of-vocabulary words. A(lmost a) non-issue in closed domains, but basically prohibitive for open-domain applications. I don't think there are any commercial applications around anymore based on symbolic parsing.

Microsoft Tay comes to mind as an example of how self-learning applications can go badly wrong with out-of-vocabulary words, and start spouting racist or misogynistic nonsense.

I want to support out-of-vocabulary words in a way that reflects how children are seen to deal with them, using the syntactic and semantic context. This is where combined symbolic+statistical approaches will be helpful, along with the means to trigger reasoning by pushing chunks to the model of the cortico-basal ganglia circuit. This offers the choice of modelling both conscious and subconscious reasoning with respect to syntax and semantics.
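As a very rough illustration of the first step, here is a Python sketch that guesses a provisional part of speech for an unknown word from the surrounding part-of-speech context and records a tentative lexicon entry for later revision. The statistics and entry format are purely hypothetical.

# Sketch: when an unknown word is encountered, guess a provisional part of
# speech from the preceding tag and add a tentative lexicon entry that later
# experience can confirm or revise. The statistics here are illustrative only.

POS_BIGRAMS = {("det", "noun"): 0.6, ("det", "adj"): 0.2, ("verb", "noun"): 0.3}

def guess_entry(unknown_word, prev_pos, lexicon):
    """Pick the most likely part of speech given the previous tag and record
    it as a low-confidence entry for later revision."""
    candidates = {pos: p for (prev, pos), p in POS_BIGRAMS.items() if prev == prev_pos}
    if not candidates:
        return None
    pos = max(candidates, key=candidates.get)
    lexicon[unknown_word] = {"pos": pos, "sense": None, "confidence": "provisional"}
    return lexicon[unknown_word]

lexicon = {}
print(guess_entry("wug", prev_pos="det", lexicon=lexicon))   # guesses a noun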

> Overall, the history of symbolic parsing points to the existence of two major challenges:
> - How to acquire the necessary knowledge (rules, etc.) ?
> - How to make processing performant?

Applications for modest closed domains are quite practical using manually developed knowledge, but to go beyond that we need to model how humans learn in terms of a toolkit of techniques, including metacognition.

Processing speed is a lesser challenge. Concurrent syntactic and semantic processing makes parsing relatively easy. Graph algorithms can be speeded up using hardware acceleration, and rule conditions can be compiled into discrimination networks that mimic sub-cortical regions of the brain.
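As an illustration of the last point, here is a minimal Python sketch of indexing rule conditions so that matching a chunk only considers rules whose tests it actually hits, rather than scanning the whole ruleset. A serious implementation (e.g. a Rete-style network) shares tests across rules far more aggressively; the rule and attribute names here are hypothetical.

# Minimal sketch of compiling rule conditions into a discrimination network:
# rules are indexed by the (chunk type, attribute, value) tests in their
# conditions, so matching a chunk only touches the rules whose tests it hits.

from collections import defaultdict

class DiscriminationNet:
    def __init__(self):
        self.index = defaultdict(set)   # (type, attribute, value) -> rule ids
        self.rules = {}

    def add_rule(self, rule_id, conditions, action):
        """conditions: list of (chunk type, attribute, value) tests."""
        self.rules[rule_id] = (conditions, action)
        for test in conditions:
            self.index[test].add(rule_id)

    def candidates(self, chunk_type, bindings):
        """Count satisfied tests per rule via the index; a rule is a candidate
        when all of its conditions are satisfied by the chunk."""
        hits = defaultdict(int)
        for attribute, value in bindings.items():
            for rule_id in self.index.get((chunk_type, attribute, value), ()):
                hits[rule_id] += 1
        return {r for r, n in hits.items() if n == len(self.rules[r][0])}

net = DiscriminationNet()
net.add_rule("attach-pp", [("PP", "prep", "with")], action="attach PP to the preceding NP")
print(net.candidates("PP", {"prep": "with", "object": "telescope"}))   # {'attach-pp'}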

> Regardless of whether CogAI solutions are based on these earlier lines of research or are developed from scratch, these are the challenges to be expected. For closed domains and demos, all can work nicely. For anything beyond that, we need to think about how to address them. The approach of NLP has been to largely abandon symbolic parsing, but this development is less driven by scientific insight than by the prospect of trading human expertise for computing power (remember "Every time I fire a linguist ...").

We have seen symbolic approaches replaced by statistical approaches that are free of semantics, and now we need to figure out how to focus on human-style semantic reasoning and to combine symbolic and statistical approaches. This is where there is a great deal to gain from the work across the cognitive sciences.

> In any case, I'm not sure I fully understand the chunk mechanism, so it's likely I miss something obvious. 

I’d be happy to answer any questions you have. If you have the interest and time, there is an extensive body of literature on ACT-R.

> 
> Cognitive parsers should be able to make some sense of incomplete or ungrammatical utterances. This also relates to the potential for learning new grammar.
> 
> That would be ideal. How would that work in practice?

The shift-reduce parser is essentially bottom-up and the grammar is implicit in the shift-reduce rules. The syntax-semantics mapping rules try to make sense of whatever phrase structure is constructed. This in turn is subject to higher-level reasoning in the context of the agent’s current goals.
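To illustrate what I mean by a syntax-semantics mapping, here is a Python sketch that maps a sentence chunk of the form [NP, VP], as produced by the shift-reduce sketch earlier, into a simple predicate-argument frame. The chunk format, head heuristic and role names are placeholders for illustration only.

def head(chunk):
    """Heuristically take the last word in a chunk as its head."""
    if isinstance(chunk, str):
        return chunk
    return head(chunk["parts"][-1])

def interpret(sentence_chunk):
    """Map an S chunk of the form [NP, VP] to a predicate-argument frame."""
    np, vp = sentence_chunk["parts"]
    verb, obj = vp["parts"]
    return {"predicate": head(verb), "agent": head(np), "patient": head(obj)}

# The S chunk built by the shift-reduce sketch for "the cat chased a mouse":
s = {"type": "S", "parts": [
    {"type": "NP", "parts": [{"type": "det",  "parts": ["the"]},
                             {"type": "noun", "parts": ["cat"]}]},
    {"type": "VP", "parts": [{"type": "verb", "parts": ["chased"]},
                             {"type": "NP",   "parts": [{"type": "det",  "parts": ["a"]},
                                                        {"type": "noun", "parts": ["mouse"]}]}]}]}
print(interpret(s))   # {'predicate': 'chased', 'agent': 'cat', 'patient': 'mouse'}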

What’s much less clear is how the agent learns new grammar, as this involves extensions to the lexicon, the shift-reduce rules and the syntax-semantics rules. I am looking for ways to apply weakly supervised learning of new rules from even single examples, along with means to generalise or specialise competing hypotheses. The way that children learn should provide rich insights as to what is needed, along with a suite of heuristics to apply.
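As a toy illustration of one such heuristic, the sketch below generalises two hypotheses, each derived from a single example, by keeping the features they agree on and replacing the ones that differ with a variable, in the spirit of least-general generalisation. A more specific competing hypothesis can be kept alongside it and pruned when counter-examples arrive. The feature names are purely illustrative.

def generalise(rule_a, rule_b):
    """Least-general generalisation over flat feature dictionaries: shared
    features with differing values become variables (written as "?")."""
    general = {}
    for key in rule_a:
        if key in rule_b:
            general[key] = rule_a[key] if rule_a[key] == rule_b[key] else "?"
    return general

# Two single-example hypotheses about when "give" can take a dative construction:
example_1 = {"verb": "give", "object": "book",  "recipient": "animate"}
example_2 = {"verb": "give", "object": "apple", "recipient": "animate"}
print(generalise(example_1, example_2))
# -> {'verb': 'give', 'object': '?', 'recipient': 'animate'}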

Kind regards,

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of Things
