Different kinds of memory

Unfortunately, our current AI technology doesn’t support continual learning, limiting large language models to the datasets they were trained on. An LLM trained back in 2023 won’t know what’s happened in 2024, and retraining is very expensive. There are workarounds, e.g. retrieval-augmented generation (RAG), where the LLM is prompted with information retrieved from a database that matches the user’s request. However, this mechanism has its limitations.

For the next generation of AI we would like to support continual learning, so that AI systems can remain up to date and, moreover, learn new skills as needed for different applications through a process of observation, instruction and experience. To better understand what’s needed, it is worth looking at the different kinds of human memory.

Sensory memory is short-lived, e.g. the phonological loop is limited to about one to two seconds. This is what allows us to replay in our heads what someone just said to us. Short-term memory is said to last up to around 30 seconds and has limited capacity. Long-term memory is indefinite in duration and capacity. Humans are also good at learning from single observations or episodes. How can all this be realised with artificial neural networks?

Generative AI relies on backpropagation for gradient descent, but this is slow, as can be seen from the typically small learning rates. It certainly won’t be effective for one-shot learning. Moreover, it doesn’t apply to sparse spiking neural networks, which aren’t differentiable. Alternative approaches use local learning rules, e.g. variations on Hebbian learning, where the synaptic weights are updated based upon correlations between a neuron’s inputs and its output.
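To make the contrast with backpropagation concrete, here is a tiny sketch in Python/NumPy of one such local rule, Oja’s variant of Hebbian learning for a single linear neuron; the dimensions, learning rate and synthetic inputs are just illustrative choices on my part:

    import numpy as np

    # Oja's rule: a local Hebbian-style update for a single linear neuron.
    # The weight change depends only on the neuron's input x and output y,
    # so no error signal has to be propagated back through the network.
    def oja_update(w, x, eta=0.01):
        y = w @ x                          # neuron output
        return w + eta * y * (x - y * w)   # Hebbian term y*x plus a decay term y*y*w

    rng = np.random.default_rng(0)
    base = rng.standard_normal(8)
    base /= np.linalg.norm(base)           # unit direction for correlated inputs
    w = 0.1 * rng.standard_normal(8)
    for _ in range(2000):
        x = base * rng.standard_normal() + 0.1 * rng.standard_normal(8)
        w = oja_update(w, x)
    # w settles near a unit vector along the principal direction of the inputs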

One approach to implementing a model of the phonological loop is as a shared vector space where items from a given vocabulary are encoded together with their temporal position, which can also be used as a cue for recall. Memory traces fade with time unless reinforced by replay. In essence, this treats memory as a sum over traces, where each trace is the circular convolution of an item and its temporal position. The vectors for temporal positions should be orthogonal. Trace retrieval will be noisy, but that can be addressed by selecting the strongest-matching vocabulary item. The result can also be viewed as a vector representing a probability distribution over vocabulary items.
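Here is a rough sketch in Python/NumPy of what that could look like; the vector dimension, the toy vocabulary and the use of FFT-based circular convolution and correlation are illustrative assumptions rather than a fixed design:

    import numpy as np

    D = 1024                              # vector dimension (illustrative)
    rng = np.random.default_rng(0)

    def rand_vec():
        v = rng.standard_normal(D)
        return v / np.linalg.norm(v)      # random unit vector

    def cconv(a, b):                      # circular convolution = binding
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

    def ccorr(a, b):                      # circular correlation = approximate unbinding
        return np.fft.irfft(np.conj(np.fft.rfft(a)) * np.fft.rfft(b), n=D)

    vocab = {w: rand_vec() for w in ["the", "cat", "sat", "down"]}
    positions = [rand_vec() for _ in range(4)]   # near-orthogonal position cues

    # memory = sum over traces, each trace = item bound to its temporal position
    memory = np.zeros(D)
    for pos, word in zip(positions, ["the", "cat", "sat", "down"]):
        memory += cconv(vocab[word], pos)

    # recall: unbind with the position cue, then clean up to the best-matching item
    def recall(pos):
        noisy = ccorr(pos, memory)
        return max(vocab, key=lambda w: np.dot(vocab[w], noisy))

    print([recall(p) for p in positions])   # should print ['the', 'cat', 'sat', 'down']

Passing the match scores through a softmax would turn them into the probability distribution over vocabulary items mentioned above.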

A modified Hebbian learning rule can be used to update the synaptic weights so that, on each cycle, the update pays more attention to new information than to old. Over successive cycles, old traces become weaker and harder to recall, unless boosted by replay. This requires a means to generate an orthogonal sequence of temporal position vectors. The sequence would repeat at an interval much longer than the duration of the phonological loop.
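Continuing the sketch, one way to get both effects is a recency-weighted trace update together with cyclic shifts of a single random vector as the position sequence; the mixing factor and the shift-based positions are my own assumptions:

    import numpy as np

    D = 1024
    rng = np.random.default_rng(1)
    base = rng.standard_normal(D)
    base /= np.linalg.norm(base)

    def position(t):
        # cyclic shifts of a random vector are nearly orthogonal to one another
        # and only repeat after D steps, far longer than the loop itself
        return np.roll(base, t % D)

    def cconv(a, b):                      # circular convolution = binding
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

    alpha = 0.6                           # weight given to new information (assumption)

    def write(memory, item_vec, t):
        trace = cconv(item_vec, position(t))
        # recency-weighted update: the new trace gets more weight than the
        # accumulated old ones, so earlier traces fade unless rewritten by replay
        return (1 - alpha) * memory + alpha * trace

    memory = np.zeros(D)
    for t in range(5):
        memory = write(memory, rng.standard_normal(D) / np.sqrt(D), t)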

The next challenge is to generalise this to short- and long-term memory stores. A key difference from the phonological loop is that we can remember many sequences. This implies a combination of context and temporal sequence. Transferring a sequence from sensory memory (the phonological loop) to short- and long-term memory will involve re-encoding memory traces with the context and a local time sequence.
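As a speculative sketch of that re-encoding step, reusing the same binding machinery, each trace could be the circular convolution of a context vector, the item and its local position, so that several sequences share one store and are recalled by unbinding first with the context and then with the position; the triple binding and the toy data are my own guesses at one way this could work:

    import numpy as np

    D = 1024
    rng = np.random.default_rng(2)
    unit = lambda v: v / np.linalg.norm(v)

    def cconv(a, b):                      # circular convolution = binding
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

    def ccorr(a, b):                      # circular correlation = approximate unbinding
        return np.fft.irfft(np.conj(np.fft.rfft(a)) * np.fft.rfft(b), n=D)

    vocab    = {w: unit(rng.standard_normal(D)) for w in ["red", "green", "blue"]}
    contexts = {c: unit(rng.standard_normal(D)) for c in ["list A", "list B"]}
    pos      = [unit(rng.standard_normal(D)) for _ in range(3)]

    # one store holds two different orderings of the same items,
    # each trace re-encoded as context (*) item (*) local position
    store = np.zeros(D)
    for ctx, order in [("list A", ["red", "green", "blue"]),
                       ("list B", ["blue", "red", "green"])]:
        for t, w in enumerate(order):
            store += cconv(contexts[ctx], cconv(vocab[w], pos[t]))

    def recall(ctx, t):
        noisy = ccorr(pos[t], ccorr(contexts[ctx], store))
        return max(vocab, key=lambda w: np.dot(vocab[w], noisy))

    print([recall("list A", t) for t in range(3)])   # should recover red, green, blue
    print([recall("list B", t) for t in range(3)])   # should recover blue, red, green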

This leaves many questions. What determines the context?  How can memories be recalled? How are sequences bounded? How can sequences be compressed in terms of sub-sequences?  How can sequences be generalised to support language processing?  How does this relate more generally to episodic memory as the memory of everyday events?

I now hope to get a concrete feel for some of these challenges, starting by implementing a simple model of the phonological loop. If anyone wants to help, please get in touch. I am hoping to develop this as a web-based demo that runs in the browser.

Best regards,

Dave Raggett <dsr@w3.org>
