Re: Beyond Transformers ...

Generative AI assumes that models are trained on a representative dataset before being deployed, so each model reflects a frozen moment in time. By contrast, humans and other animals learn continually, and this is thought to rest on continual prediction. With respect to language, this amounts to predicting the next word from the preceding words.

Transformer-based language models use an explicit context containing many thousands of preceding words. A promising alternative is to hold the context in an associative memory that maps cues to data. My hunch is that each layer in the abstraction stack can use its own associative memory for attention, together with local learning rules based upon continual prediction within each layer, avoiding the need for biologically implausible backpropagation across layers.
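To make that a little more concrete, here is a minimal sketch in Python/NumPy of what I mean by a per-layer associative memory. It is my own illustration rather than a worked proposal: cue/value pairs are written with a local Hebbian outer-product rule, so nothing has to be propagated between layers, and retrieval can be either a linear associator readout or a softmax attention over the stored cues. The class name, parameters and choice of readouts are just assumptions for the sketch.

import numpy as np

class LayerAssociativeMemory:
    """Toy per-layer associative memory: cue/value pairs are written with a
    local Hebbian (outer-product) rule, so no error signal has to cross
    layers; retrieval is either a linear associator readout or a softmax
    attention over the stored cues."""

    def __init__(self, dim, beta=1.0):
        self.dim = dim
        self.beta = beta                  # sharpness of the attention readout
        self.W = np.zeros((dim, dim))     # Hebbian cue -> value associator
        self.keys, self.values = [], []

    def store(self, cue, value):
        # Local learning rule: strengthen this cue/value association in place.
        self.W += np.outer(value, cue)
        self.keys.append(cue)
        self.values.append(value)

    def recall_linear(self, cue):
        # Classic linear associator readout from the Hebbian matrix.
        return self.W @ cue

    def recall_attention(self, cue):
        # Softmax attention over the stored cues (sharper readout).
        K = np.stack(self.keys)
        V = np.stack(self.values)
        scores = self.beta * (K @ cue)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V

# A noisy version of a stored cue still recalls the associated value.
rng = np.random.default_rng(0)
mem = LayerAssociativeMemory(dim=64)
cues = rng.standard_normal((5, 64))
vals = rng.standard_normal((5, 64))
for c, v in zip(cues, vals):
    mem.store(c, v)
noisy = cues[2] + 0.1 * rng.standard_normal(64)
print(np.allclose(mem.recall_attention(noisy), vals[2], atol=0.3))  # True

The point of the demo is only that storage is local to the layer and retrieval tolerates a noisy cue; how the cues themselves would be learned from continual prediction is the open question.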

Associative memory is ubiquitous in the brain, yet we still don't have a full understanding of how it is implemented. In principle, one or more layers could map a cue to a probability distribution over the associated data vectors, with argmax then selecting the index into a table of data vectors. That reduces retrieval to a one-hot encoding, i.e. each data vector is selected by a single neuron, which sounds error-prone and biologically implausible.
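As a toy illustration of that point (Python/NumPy, with an untrained random cue-to-logit mapping standing in for whatever layers would learn it), the same circuit can be read out either with argmax, where a single winning unit picks the table entry, or with a probability-weighted blend that avoids the single-neuron bottleneck:

import numpy as np

rng = np.random.default_rng(1)
dim, n_slots = 32, 10
table = rng.standard_normal((n_slots, dim))    # table of stored data vectors
W = 0.1 * rng.standard_normal((n_slots, dim))  # cue -> logit mapping (untrained, illustrative)

def retrieve(cue, hard=True, temperature=1.0):
    # Map the cue to a probability distribution over table entries.
    logits = W @ cue / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if hard:
        # argmax readout: a single winning unit (one-hot) selects the data
        # vector, so one flipped winner changes the result completely.
        return table[np.argmax(probs)], probs
    # Soft readout: probability-weighted blend, no single-neuron bottleneck.
    return probs @ table, probs

cue = rng.standard_normal(dim)
hard_vec, probs = retrieve(cue, hard=True)
soft_vec, _ = retrieve(cue, hard=False)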

Some interesting papers on this are:

Biological constraints on neural network models of cognitive function (2021), Pulvermüller et al.
 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7612527/pdf/EMS142176.pdf

Recurrent predictive coding models for associative memory employing covariance learning (2023), Tang et al. 
 https://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1010719

I am looking for the means to enable associative memory to support the following (a toy sketch combining these appears after the list):

Retrieval with degraded or noisy cues
Stochastic selection: if there are multiple memories with very similar cues, the probability of retrieving any one of them depends on its level of activation
Single-shot storage rather than requiring repeated presentations of each cue/value pair
Short- and long-term memory with a model for boosting and decay
Minimisation of interference to ensure effective use of memory capacity, e.g. using sparse coding
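For what it is worth, here is a toy Python/NumPy sketch that combines these requirements in the crudest possible way: single-shot storage of sparse-coded cues, retrieval from noisy cues, stochastic selection in proportion to activation, and per-item strengths that decay over time and are boosted on retrieval. All names, parameters and the particular formulas are my own placeholders, not a claim about how the brain does it.

import numpy as np

class EpisodicAssociativeMemory:
    """Toy sketch: single-shot storage, noisy-cue retrieval, stochastic
    selection weighted by activation, and per-item strengths that decay
    over time and are boosted on retrieval. Sparse coding of cues is used
    to reduce interference."""

    def __init__(self, dim, sparsity=0.1, decay=0.99, boost=0.5, beta=5.0, seed=0):
        self.dim = dim
        self.sparsity = sparsity      # fraction of cue components kept
        self.decay = decay            # passive decay per time step
        self.boost = boost            # strength added on successful retrieval
        self.beta = beta              # how sharply activation favours the best match
        self.rng = np.random.default_rng(seed)
        self.keys, self.values, self.strengths = [], [], []

    def _sparsify(self, cue):
        # Crude sparse coding: keep only the largest-magnitude components.
        k = max(1, int(self.sparsity * self.dim))
        out = np.zeros_like(cue)
        idx = np.argsort(np.abs(cue))[-k:]
        out[idx] = cue[idx]
        return out

    def store(self, key, value):
        # Single-shot storage: one presentation of the cue/value pair.
        self.keys.append(self._sparsify(key))
        self.values.append(np.asarray(value))
        self.strengths.append(1.0)

    def step(self):
        # Passive decay of all strengths (short- vs long-term dynamics).
        self.strengths = [s * self.decay for s in self.strengths]

    def retrieve(self, cue):
        # Activation = cue/key similarity combined with current strength.
        cue = self._sparsify(cue)
        K = np.stack(self.keys)
        sims = K @ cue / (np.linalg.norm(K, axis=1) * np.linalg.norm(cue) + 1e-9)
        act = self.beta * sims + np.log(np.asarray(self.strengths) + 1e-9)
        probs = np.exp(act - act.max())
        probs /= probs.sum()
        # Stochastic selection: similar cues compete in proportion to activation.
        i = self.rng.choice(len(probs), p=probs)
        self.strengths[i] += self.boost   # retrieval boosts the winner
        return self.values[i], probs

rng = np.random.default_rng(2)
mem = EpisodicAssociativeMemory(dim=64)
cues = rng.standard_normal((3, 64))
for i, c in enumerate(cues):
    mem.store(c, np.eye(3)[i])            # one presentation per pair
mem.step()                                # strengths decay a little
value, probs = mem.retrieve(cues[1] + 0.2 * rng.standard_normal(64))

It plainly doesn't settle how any of this could be realised with biologically plausible learning rules, which is what I am hoping the literature can shed light on.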

What other papers should we be looking at?

Dave Raggett <dsr@w3.org>

Received on Monday, 16 September 2024 09:04:14 UTC