Re: What's next after self-attention?

On Tue, 14 Jan 2025 15:39:15 +0000, Dave Raggett <dsr@w3.org> wrote:

>  including the means to develop small AI systems that are a better fit to their intended applications as compared to the latest large language models.

I agree. If we focus on a specific problem (or a few problems), we are likely to build more directed models that do not require as many resources.

> 
> A starting point is to explore how memory can replace the need for large context windows for sequence learning. You can imagine this as an engine that processes tokens one by one. On each step a cue is provided to retrieve data from memory for use in the next step. The next cue is generated by a transformation of the current memory output. This could use a similar approach to Transformers, i.e. a multi-headed attention mechanism followed by an MLP.  The memory operations combine query and update.
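To check my own reading of this, here is a rough PyTorch sketch of how I picture that engine: each step queries memory with a cue, transforms the retrieved content with attention plus an MLP, and that transform yields the next cue while also updating memory. The module names, sizes, and the additive update rule are my own assumptions for illustration, not anything from your note.

# Sketch of a token-by-token engine with an external memory.
import torch
import torch.nn as nn

class MemoryEngine(nn.Module):
    def __init__(self, dim=64, slots=128, heads=4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(slots, dim) * 0.02)  # learnable memory slots
        self.read = nn.MultiheadAttention(dim, heads, batch_first=True)  # cue queries memory
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.write_gate = nn.Linear(2 * dim, dim)  # combines the read with the new cue

    def step(self, cue, token_emb, memory):
        # Query memory with the current cue (memory serves as keys and values).
        read, _ = self.read(cue.unsqueeze(1), memory.unsqueeze(0), memory.unsqueeze(0))
        read = read.squeeze(1)
        # Transform the retrieved content plus the current token to get the next cue.
        next_cue = self.mlp(read + token_emb)
        # Toy additive memory update driven by the read and the new cue.
        update = torch.tanh(self.write_gate(torch.cat([read, next_cue], dim=-1)))
        memory = memory + update.mean(dim=0, keepdim=True)
        return next_cue, memory

    def forward(self, token_embs):
        # token_embs: (seq_len, dim) -- tokens are processed one by one.
        memory = self.memory
        cue = torch.zeros(1, token_embs.size(-1))
        outputs = []
        for t in range(token_embs.size(0)):
            cue, memory = self.step(cue, token_embs[t].unsqueeze(0), memory)
            outputs.append(cue)
        return torch.stack(outputs, dim=0)

engine = MemoryEngine()
out = engine(torch.randn(10, 64))  # ten tokens, one at a time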


> A further refinement would be to repurpose the input layers when you want to use the model to generate output.  Conventional large language models rely on feed forward connections. Imagine folding the top half of the transformer stack back on itself.  This necessitates feed backward connections. Continual learning then enables the model to mimic the input statistics when running in generation mode, akin to children quickly picking up the informal language patterns of their peers.

Can you help me better understand what you mean here about folding the stack back on itself? I can imagine
feed-backward approaches like multi-shot prompting.


> 
> Recent work on large language models has shown the potential for quiet thinking, i.e. thinking a while on a problem rather than responding immediately.  In principle, this should produce better results and reduce the likelihood of hallucinations.  For humans it amounts to the difference between working something out step by step versus making a wild guess under pressure when asked a question.

Personally I think this is based on expectations. As LLM excitement transitions into agentic aspirations, people will expect more but also be more tolerant of iteration to arrive at answers with higher fidelity. I am also mulling over what a hallucination really means: hallucinations are inherent to generative models because of how they generate responses.
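To make that last point concrete with a toy example: generation is sampling from a next-token distribution, so a fluent-but-wrong continuation with non-trivial probability will sometimes be emitted even by a well-trained model. The vocabulary and probabilities below are invented purely for illustration.

# Toy illustration of why sampling-based generation can "hallucinate".
import random

next_token_probs = {
    "1926": 0.55,   # correct continuation
    "1925": 0.25,   # plausible but wrong
    "1936": 0.15,   # plausible but wrong
    "zebra": 0.05,  # incoherent
}

def sample(probs):
    # Draw one token according to the given probabilities.
    r = random.random()
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if r < cumulative:
            return token
    return token

samples = [sample(next_token_probs) for _ in range(10_000)]
wrong = sum(1 for s in samples if s != "1926")
print(f"wrong continuations: {wrong / len(samples):.1%}")  # roughly 45%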

> 
> Quiet thinking corresponds to applying a sequence of transformations to the latent semantics across multiple layers in a neural network.  As such it is similar to the processing needed for both sequence understanding and generation.  Can we design a neural network to support quiet thinking in addition to sequence learning, understanding and generation?

Can you help me understand what you mean by latent semantics here? I think of it as meaning context, like how I know whether you mean Jaguar the car manufacturer or the animal based on context, or even the capital "J" for the manufacturer.
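Here is how I would test my own reading of it, using the Hugging Face transformers library: the internal vector for "jaguar" comes out different depending on the surrounding sentence, which is roughly what I mean by context. The model choice and the pooling over subword tokens are my assumptions.

# Contextual vectors for the same surface word in different sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    # Average the hidden states of the subword tokens covering `word`.
    enc = tok(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    start = sentence.lower().index(word)
    end = start + len(word)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    idx = [i for i, (s, e) in enumerate(offsets.tolist()) if s >= start and e <= end and e > s]
    return hidden[idx].mean(dim=0)

animal = word_vector("The jaguar stalked its prey through the jungle.", "jaguar")
car = word_vector("Jaguar unveiled a new electric sedan in Coventry.", "jaguar")
control = word_vector("A jaguar rested in the shade of the trees.", "jaguar")

cos = torch.nn.functional.cosine_similarity
print("animal vs car:   ", cos(animal, car, dim=0).item())
print("animal vs animal:", cos(animal, control, dim=0).item())  # expected to be higher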

> 
> The main difference is the need to support reinforcement learning over multiple steps. This is where we need episodic memory. However the details are far from clear.  

When I read this, I immediately think of named graphs for episodic memory and techniques I can use to include or exclude them in a query.
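Concretely, I am imagining something like the following with rdflib: each episode's triples live in their own named graph, and the GRAPH clause in a SPARQL query decides which episodes retrieval draws on. The URIs and triples are invented examples.

# Episodes as named graphs; the query is scoped to one chosen episode.
from rdflib import Dataset, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")
ds = Dataset()

# Two episodes, each in its own named graph.
ep1 = ds.graph(URIRef("http://example.org/episode/1"))
ep1.add((EX.task, EX.outcome, Literal("success")))
ep2 = ds.graph(URIRef("http://example.org/episode/2"))
ep2.add((EX.task, EX.outcome, Literal("failure")))

# Query only episode 1 -- the GRAPH clause includes or excludes episodes.
results = ds.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?outcome WHERE {
        GRAPH <http://example.org/episode/1> { ex:task ex:outcome ?outcome }
    }
""")
for row in results:
    print(row.outcome)  # prints: success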


> Can we use the same memory for reinforcement learning and sequence learning?  How is the task reward propagated backward through time given the transformer-inspired model of cognition?

I am not sure I follow completely. What comes to mind is that if I regard something as successful, the "training" examples I create are put into named graphs to which I add a +1 credibility / veracity score.
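A toy sketch of how I picture reward flowing backward through the steps of an episode, each of which would sit in its own named graph: later steps get more credit, earlier ones a discounted share. The discount factor and the scores are my assumptions, not a claim about your model.

# Walk backward over an episode's steps and assign discounted credit.
def propagate_reward(step_ids, final_reward, gamma=0.9):
    """Return {step_id: credit}, walking backward from the final step."""
    credits = {}
    running = final_reward
    for step_id in reversed(step_ids):
        credits[step_id] = running
        running *= gamma  # earlier steps receive a discounted share
    return credits

episode = ["ep42/step1", "ep42/step2", "ep42/step3"]
scores = propagate_reward(episode, final_reward=1.0)
for step, credit in scores.items():
    # In my setup these would become graded annotations on the step's
    # named graph rather than a flat +1.
    print(step, round(credit, 3))
# ep42/step3 1.0, ep42/step2 0.9, ep42/step1 0.81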


Ronald P. Reck

http://www.rrecktek.com - http://www.ronaldreck.com

Received on Wednesday, 15 January 2025 14:55:15 UTC