- From: Dave Raggett <dsr@w3.org>
- Date: Tue, 20 Aug 2024 10:51:17 +0100
- To: public-cogai <public-cogai@w3.org>
- Message-Id: <96FCA95D-3D9F-4F63-A50C-337CD228E2A0@w3.org>
I’ve been carrying on my studies on continual learning with the aim of identifying new building blocks for AI that can take us closer to AGI. Back propagation with stochastic gradient descent is efficient, but forces a clear division between training and running AI models. I’ve therefore been reading about Hebbian learning and the idea of local learning rules that are closer to what we know about biological neurons. One paper that caught my eye is the following:

The combination of Hebbian and predictive plasticity learns invariant object representations in deep sensory networks <https://www.nature.com/articles/s41593-023-01460-y>, 2023

In essence, synaptic weights are adjusted based upon the neuron’s output, the synapse’s input (from another neuron), and terms relating to rolling averages of the output’s value, slope and variance. These terms fix problems with the original form of Hebbian learning, which is unstable because correlated activity drives unbounded weight growth. Lateral competition can then be used to ensure that different neurons respond to different features, see, e.g.

Competitive Hebbian Learning: algorithm and demonstrations <https://sci-hub.se/10.1016/s0893-6080(05)80024-3>, 1991

A further approach supports learning on different time scales, inspired by the millisecond timescale for chemical changes in synaptic connections versus the very much longer timescales for growing new synaptic connections. Sparse connections can help to reduce task interference, where learning a new task reduces performance on previously learned tasks.

Local learning rules are harder to apply to networks with many layers than back propagation is, given that the learning signal needs to percolate through the layers. A way around that is to generate a local learning signal for each layer. This assumes continual prediction, where previous data is used to predict the next value for comparison with the actual value. The error is then used in a local learning rule drawing inspiration from Hebbian learning.

Existing language models have a deep stack of layers, with data propagating up the stack: the prompt is input to the bottom-most layer and the response is output from the top-most layer. Continual learning would enable a different architecture where input and output are shared functions of the same layer. As you learn from listening to others talking, you can directly apply that to your own speech generation. In this approach, prediction at a given layer is influenced by the layers immediately above and below it. The stack of layers now has input and output at the bottom-most layer, and latent semantics at the top-most layer.

I am also interested in how to support sequential cognition on top of this. That corresponds to a neural network that updates the latent semantics in a stepwise operation akin to a production rule engine. This requires a means to implement reinforcement learning that propagates backwards through the chain of steps taken to reach a given goal, e.g. using episodic memory. That suggests the need for work on neural network architectures for declarative and episodic memory.

To test these ideas, I need to look at how to apply continual prediction to simple sequences which can be generated algorithmically.

Dave Raggett <dsr@w3.org>
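
[Editorial sketch] To make the flavour of a local Hebbian rule with rolling-average terms and lateral competition concrete, here is a minimal sketch in Python/NumPy. It is an illustration under assumed details, not the exact rule from the Nature Neuroscience paper: the particular form of the running-average normalisation and the names (LocalHebbianLayer, eta, tau) are invented for the example.

import numpy as np

class LocalHebbianLayer:
    """A single layer trained with a local, Hebbian-style rule.

    Weight changes depend only on the presynaptic input, the neuron's
    own output, and running (exponential moving) averages of that
    output; no error signal arrives from other layers.
    """

    def __init__(self, n_in, n_out, eta=1e-3, tau=100.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_out, n_in))
        self.eta = eta                  # learning rate
        self.alpha = 1.0 / tau          # smoothing factor for running averages
        self.mean = np.zeros(n_out)     # running average of each neuron's output
        self.var = np.ones(n_out)       # running variance of each neuron's output

    def step(self, x):
        y = self.W @ x                  # post-synaptic activity

        # Lateral competition: only the most active neurons keep their
        # output, so different neurons come to respond to different features.
        k = max(1, len(y) // 10)
        winners = np.argsort(y)[-k:]
        y_comp = np.zeros_like(y)
        y_comp[winners] = y[winners]

        # Update running averages (value and variance) of the output.
        self.mean += self.alpha * (y_comp - self.mean)
        self.var += self.alpha * ((y_comp - self.mean) ** 2 - self.var)

        # Hebbian update using the deviation from the running average,
        # normalised by the running variance to keep the rule stable.
        post = (y_comp - self.mean) / np.sqrt(self.var + 1e-8)
        self.W += self.eta * np.outer(post, x)

        # Keep weight rows bounded (a simple stand-in for homeostatic scaling).
        self.W /= np.maximum(np.linalg.norm(self.W, axis=1, keepdims=True), 1.0)
        return y_comp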
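
[Editorial sketch] The per-layer learning signal based on continual prediction could be prototyped along the following lines. Again this is a rough sketch under assumed details (PredictiveLayer, W_enc and W_pred are hypothetical names), not a reconstruction of any published architecture: each layer predicts its next input, compares the prediction with what actually arrives, and uses that error locally, so nothing has to be propagated through the whole stack. The closing loop follows the suggestion at the end of the message, applying continual prediction to a simple algorithmically generated sequence.

import numpy as np

class PredictiveLayer:
    """One layer of a continually learning stack.

    The layer predicts its next input from its current state; the
    prediction error is used locally, so no global back-propagation
    through the stack is required.
    """

    def __init__(self, n_in, n_hidden, eta=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(n_hidden, n_in))   # input -> state
        self.W_pred = rng.normal(scale=0.1, size=(n_in, n_hidden))  # state -> predicted next input
        self.eta = eta
        self.state = np.zeros(n_hidden)

    def step(self, x):
        # Compare the prediction made on the previous step with the
        # input that has actually arrived now.
        prediction = self.W_pred @ self.state
        error = x - prediction

        # Local updates: the prediction weights are nudged to reduce the
        # error, and the encoding weights are nudged Hebbian-style by the
        # same error signal; both use only locally available quantities.
        self.W_pred += self.eta * np.outer(error, self.state)
        self.state = np.tanh(self.W_enc @ x)
        self.W_enc += self.eta * np.outer(self.state, error)

        # Pass the state up as the next layer's input, and report how
        # surprised this layer was by the new input.
        return self.state, float(np.mean(error ** 2))


if __name__ == "__main__":
    # A simple algorithmically generated sequence: a sine wave sampled
    # into an 8-dimensional input vector.
    layer = PredictiveLayer(n_in=8, n_hidden=16)
    t = np.linspace(0, 20 * np.pi, 2000)
    for step_idx, phase in enumerate(t):
        x = np.sin(phase + np.arange(8) * 0.5)
        _, surprise = layer.step(x)
        if step_idx % 500 == 0:
            print(f"step {step_idx:5d}  prediction error {surprise:.4f}")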
Received on Tuesday, 20 August 2024 09:51:30 UTC