Re: What's next after self-attention?

> On 10 Jan 2025, at 13:40, Ronald Reck <rreck@rrecktek.com> wrote:
> 
> I am wondering if our best approach forward is based on modeling the human brain's approach or focusing on solving the problems piecemeal based on defined requirements.

Lots of people are working on incremental extensions to current approaches to Generative AI, but it makes sense to look further out for new ideas where it should be easier to make a strong contribution. There is a great deal to be learned from the brain and the hundreds of millions of years of evolution behind it. The knowledge we gain will also inspire us to play around with different ideas, including the means to develop small AI systems that are a better fit to their intended applications than the latest large language models.

A starting point is to explore how memory can replace the need for large context windows for sequence learning. You can imagine this as an engine that processes tokens one by one. On each step, a cue is used to retrieve data from memory for use in the next step. The next cue is generated by a transformation of the current memory output. This could use a similar approach to Transformers, i.e. a multi-headed attention mechanism followed by an MLP.  The memory operations combine query and update.
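As a rough sketch of what I have in mind, here is an illustrative PyTorch fragment (the class name, slot count and dimensions are placeholders of mine, not a worked design):

import torch
import torch.nn as nn

class MemoryEngine(nn.Module):
    def __init__(self, dim=64, slots=128, heads=4):
        super().__init__()
        # learnable memory: a set of slot vectors queried and updated per token
        self.memory = nn.Parameter(torch.randn(1, slots, dim) * 0.02)
        self.attend = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.to_cue = nn.Linear(dim, dim)     # derives the next cue
        self.to_update = nn.Linear(dim, dim)  # derives the write vector

    def step(self, token, cue, memory):
        # query: retrieve from memory using the current cue plus the token
        q = (cue + token).unsqueeze(1)                    # [batch, 1, dim]
        retrieved, attn = self.attend(q, memory, memory)
        hidden = self.mlp(retrieved).squeeze(1)           # transform the retrieval
        next_cue = self.to_cue(hidden)                    # cue for the next step
        # update: softly blend a write vector into the attended slots
        write = self.to_update(hidden).unsqueeze(1)
        memory = memory + attn.transpose(1, 2) @ write
        return next_cue, memory

engine = MemoryEngine()
memory = engine.memory.expand(8, -1, -1)       # batch of 8 sequences
cue = torch.zeros(8, 64)
for token in torch.randn(16, 8, 64):           # 16 tokens, one step each
    cue, memory = engine.step(token, cue, memory)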

This single-layer engine needs generalising to support multiple layers, along with a means to generate an error signal local to each layer, based on comparing a retained prediction of the next state with the actual state.  Feed backward connections could be used to enable learned abstractions to support context dependencies, complementing memories at the same level of abstraction.
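Purely as an illustration of what a local error signal could look like (again, the names are mine and the recurrent cell is just a stand-in for the attention plus MLP transform):

import torch
import torch.nn as nn

class PredictiveLayer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.transform = nn.GRUCell(dim, dim)  # stand-in for attention + MLP
        self.predict = nn.Linear(dim, dim)     # prediction of the next state
        self.prev_prediction = None

    def forward(self, x, state):
        new_state = self.transform(x, state)
        local_loss = torch.tensor(0.0)
        if self.prev_prediction is not None:
            # error local to this layer: retained prediction vs actual state
            local_loss = nn.functional.mse_loss(self.prev_prediction,
                                                new_state.detach())
        # retain a prediction of the next state for use at the next step
        self.prev_prediction = self.predict(new_state)
        return new_state, local_loss

Each layer then has its own loss, without needing a single error signal back-propagated from the top of the stack.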

A further refinement would be to repurpose the input layers when you want to use the model to generate output.  Conventional large language models rely on feed forward connections. Imagine folding the top half of the transformer stack back on itself.  This necessitates feed backward connections. Continual learning then enables the model to mimic the input statistics when running in generation mode, akin to children quickly picking up the informal language patterns of their peers.
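To make the folding idea a little more concrete, here is a speculative sketch (placeholders of mine; the feed backward path has its own weights):

import torch
import torch.nn as nn

class FoldedStack(nn.Module):
    def __init__(self, dim=64, depth=4):
        super().__init__()
        # feed forward path used when understanding input
        self.up = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        # feed backward path used when generating output
        self.down = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def understand(self, token):
        h = token
        for layer in self.up:              # tokens -> abstract latent state
            h = torch.tanh(layer(h))
        return h

    def generate(self, latent):
        h = latent
        for layer in reversed(self.down):  # abstract latent -> token-level output
            h = torch.tanh(layer(h))
        return h

Continual learning on the feed backward path is what would let the model mimic the statistics of the input it has recently seen.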

Recent work on large language models has shown the potential for quiet thinking, i.e. thinking a while on a problem rather than responding immediately.  In principle, this should produce better results and reduce the likelihood of hallucinations.  For humans it amounts to the difference between working something out step by step versus making a wild guess under pressure when asked a question.

Quiet thinking corresponds to applying a sequence of transformations to the latent semantics across multiple layers in a neural network.  As such, it is similar to the processing needed for both sequence understanding and generation.  Can we design a neural network to support quiet thinking in addition to sequence learning, understanding and generation?
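One way to picture that, as a toy sketch (placeholders of mine), is an internal loop that refines the latent state a number of times before anything is emitted:

import torch
import torch.nn as nn

class QuietThinker(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.think = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                   nn.Linear(dim, dim))
        self.respond = nn.Linear(dim, dim)

    def forward(self, latent, steps=8):
        # refine the latent semantics over several internal steps
        for _ in range(steps):
            latent = latent + self.think(latent)   # residual refinement
        return self.respond(latent)                # only then produce output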

The main difference is the need to support reinforcement learning over multiple steps. This is where we need episodic memory. However, the details are far from clear.  Can we use the same memory for reinforcement learning and sequence learning?  How is the task reward propagated backward through time given the transformer-inspired model of cognition?
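One tentative possibility, sketched below with placeholders of mine, is to record each step of an episode in the episodic memory and then propagate the task reward backward through time as discounted returns:

from dataclasses import dataclass, field

@dataclass
class EpisodicMemory:
    gamma: float = 0.95                          # discount factor
    steps: list = field(default_factory=list)    # one entry per time step

    def record(self, state, action, reward=0.0):
        self.steps.append((state, action, reward))

    def returns(self):
        # discounted return for each step, computed from the end backwards
        g, out = 0.0, []
        for _, _, reward in reversed(self.steps):
            g = reward + self.gamma * g
            out.append(g)
        return list(reversed(out))

The per-step returns could then weight the updates to whichever parameters produced the corresponding actions, but as I said, the details are far from clear.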

What do you think?

Dave Raggett <dsr@w3.org>

Received on Tuesday, 14 January 2025 15:39:27 UTC