What's next after self-attention?

I trust everyone has had a pleasant break. This is a short email to let you know what I am working on with respect to the Cognitive AI CG.

The previous work we did on chunks and rules has proved to be relevant to low-code applications for real-time control of digital twins, see: https://w3c.github.io/cogai/chunks-and-rules.html

This exploits explicit facts and rules as a basis for describing concurrent threads of behaviour. Looking to the future, we should expect to see such explicit programming replaced by the ability to explain and show cognitive agents what you want, just as you would when describing a task to a human colleague. Cognitive agents will be helpful when it comes to curating use cases, including the problematic edge cases that plague conventional programming. Such agents will also be able to manage the regression testing needed to ensure that any changes don't mess things up.
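For anyone who hasn't looked at the specification, here is a rough Python analogue of the idea, purely illustrative and not the chunks-and-rules notation itself (see the link above for the real syntax): facts are typed bundles of properties, and rules match facts and update them, with each module able to run its own thread of behaviour.

    # Toy analogue of explicit facts and rules (illustrative only)
    facts = [
        {"type": "door", "id": "door1", "state": "closed"},
        {"type": "goal", "id": "goal1", "task": "open", "door": "door1"},
    ]

    # a rule pairs a condition over the facts with an action that updates them
    def open_door_rule(facts):
        goals = [f for f in facts if f["type"] == "goal" and f["task"] == "open"]
        for goal in goals:
            for door in (f for f in facts if f["type"] == "door" and f["id"] == goal["door"]):
                if door["state"] == "closed":
                    door["state"] = "open"      # the action updates the fact base
                    print("opened", door["id"])

    open_door_rule(facts)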

There is a lot of current interest in how Generative AI can be applied to agents, but I want to look further out to what I am calling Sentient AI. This features continual learning and reasoning, making use of continual prediction just like human cognition. Generative AI, by contrast, requires a clear division between training and run-time. Moreover, it is challenging to update previously trained knowledge, and training on new tasks leads to catastrophic forgetting of previously trained tasks. Sentient AI, as the name suggests, enables agents that are aware of their environment, goals and performance, including the means to reflect on their past experiences.

Human cognition is very different from today's Generative AI. Human vision, for instance, is limited to a field of view of around two degrees for high acuity, corresponding to the width of a thumbnail at a typical reading distance, yet this region accounts for over 50% of the visual cortex. When it comes to reading, we can only see a small number of characters at any one time, in stark contrast to large language models with context windows of tens of thousands of tokens. This means that human cognition must rely on memory in lieu of direct access to sensory data.

Generative AI is dependent on Transformers, in which each layer combines self-attention with feed-forward processing. Self-attention requires direct access to all of the positions in the context window and scales as the square of the context width. How can human cognition work without direct access to sensory data? The answer must involve the means to query and update short-term memory, along with the equivalent of transformers to map between different levels of abstraction.
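To make the scaling concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention (my own illustration, not any particular implementation); the (n, n) score matrix is what grows with the square of the context width.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X holds n token vectors; every position attends to every other position
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n) score matrix
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the whole window
        return weights @ V                               # cost and memory grow as n**2

    n, d = 1024, 64
    X = np.random.randn(n, d)
    Wq = Wk = Wv = np.random.randn(d, d) / np.sqrt(d)
    out = self_attention(X, Wq, Wk, Wv)   # doubling n quadruples the score matrix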

My hunch is that we are essentially using continual prediction in each layer to provide a local training signal, given that stochastic gradient descent is very implausible as a model of the brain. I am now trying to explore the design space for sensory cognition, including sequence learning. Here are just a few of the questions that arise (a couple of rough sketches follow the list):

Does memory need to resolve to a single trace, or is a superposition of traces a better fit to the requirements? The former could involve Hopfield networks with an iteration that finds the best matching trace as a minimum of the Lagrangian-based energy (see the retrieval sketch after this list).
Is the memory specific to each layer or shared across layers?
Are transformations an integral part of memory or a complementary system?
How are transformations expressed?
How are queries expressed and used to identify the transformations to apply?
Can multiple complementary transformations be applied in parallel?
How are slot fillers recorded so that something is only used once as a filler?
How can we use local-learning rules as an alternative to gradient descent?
Can feedback connections provide a basis for context dependencies?
How can we split learning into single-shot memorisation and learning-to-learn?
Is reinforcement learning required, and if so, how do we back-propagate the reward/penalty?
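On the role of continual prediction as a local training signal, here is a minimal sketch of the kind of layer-local rule I have in mind (an assumption of mine, not a worked-out proposal): each layer continually predicts its next input and nudges its weights with a delta rule driven only by its own prediction error, with no gradients propagated across layers.

    import numpy as np

    class PredictiveLayer:
        # toy layer that continually predicts its next input and learns locally
        def __init__(self, dim, lr=0.01):
            self.W = np.zeros((dim, dim))   # linear predictor of the next input
            self.prev = None
            self.lr = lr

        def step(self, x):
            error = None
            if self.prev is not None:
                prediction = self.W @ self.prev
                error = x - prediction                            # local prediction error
                self.W += self.lr * np.outer(error, self.prev)    # delta rule, no backprop
            self.prev = x
            return error

    rng = np.random.default_rng(0)
    layer = PredictiveLayer(dim=8)
    seq = np.zeros((200, 8))
    for t in range(1, 200):                   # a stream with simple temporal structure
        seq[t] = 0.9 * seq[t - 1] + 0.3 * rng.standard_normal(8)
    for x in seq:
        layer.step(x)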

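On the first question above, here is a rough sketch of the kind of iteration I have in mind, based on my reading of modern Hopfield networks (the sharpness parameter is simply a knob for contrasting the two options): retrieval starts from a cue and repeatedly moves it towards the stored traces; a high sharpness settles on the single best-matching trace, while a low sharpness leaves something closer to a superposition of traces.

    import numpy as np

    def hopfield_retrieve(traces, cue, beta=8.0, steps=5):
        # traces: (m, d) stored memory traces; cue: (d,) query vector
        # beta: sharpness; large -> single best match, small -> blended superposition
        x = cue
        for _ in range(steps):
            scores = beta * traces @ x
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()          # softmax over the stored traces
            x = traces.T @ weights            # move the state towards an energy minimum
        return x

    rng = np.random.default_rng(1)
    traces = rng.standard_normal((10, 32))
    noisy_cue = traces[3] + 0.5 * rng.standard_normal(32)
    recalled = hopfield_retrieve(traces, noisy_cue)   # close to traces[3] for large beta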
I will provide status reports as I progress. Your help is welcomed!

Dave Raggett <dsr@w3.org>
