Re: What's next after self-attention?

On Wed, 8 Jan 2025 16:23:08 +0000, Dave Raggett <dsr@w3.org> wrote:

> My hunch is that we are essentially using continual prediction in each layer to provide a local training signal given that stochastic gradient descent is very implausible as a model of the brain. I am now trying to explore the design space for sensory cognition, including sequence learning. Here are just a few of the questions that arise:

I am wondering whether the best way forward is to model the human brain's approach, or to
solve the problems piecemeal against defined requirements.
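
On the quoted point about continual prediction supplying a local training signal, here is a
purely illustrative sketch of what "local" could mean: each layer predicts its own next input
and updates from its own prediction error, with no gradients flowing between layers. The class
name, the delta-rule update, and the tanh nonlinearity are my assumptions for the sketch, not
anything proposed in the original post.

    # Illustrative only: a per-layer prediction error used as a local learning signal.
    import numpy as np

    class PredictiveLayer:
        def __init__(self, dim, lr=0.01, rng=None):
            rng = rng or np.random.default_rng()
            self.W = rng.normal(scale=0.1, size=(dim, dim))  # local predictor weights
            self.lr = lr
            self.prev = None                                  # previous input to this layer

        def step(self, x):
            if self.prev is not None:
                pred = self.W @ self.prev                     # predict current input from the last one
                err = x - pred                                # local prediction error
                self.W += self.lr * np.outer(err, self.prev)  # delta rule; nothing crosses layers
            self.prev = x
            return np.tanh(self.W @ x)                        # output handed to the next layer

    # A small stack: every layer learns only from its own error signal.
    layers = [PredictiveLayer(16) for _ in range(3)]
    rng = np.random.default_rng(1)
    for t in range(100):
        h = rng.normal(size=16)        # stand-in for the next sensory input
        for layer in layers:
            h = layer.step(h)
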

> 
> Does memory need to resolve to a single trace or is a superposition of traces a better fit to the requirements? The former could involve Hopfield networks with an iteration to find the best matching trace as a minimum in the Lagrangian energy space.


These are very thoughtful questions. A rough sketch of the Hopfield retrieval idea is included below.
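
To make the quoted retrieval idea concrete, here is a minimal sketch of classical Hopfield
recall as energy descent: binary traces stored with Hebbian weights, and a noisy cue iterated
until it settles into the nearest minimum. All names are illustrative, and this is only one
point in the design space the questions describe.

    import numpy as np

    def store(patterns):
        # Hebbian outer-product weights over +/-1 traces, zero diagonal.
        n = patterns.shape[1]
        W = patterns.T @ patterns / n
        np.fill_diagonal(W, 0.0)
        return W

    def energy(W, s):
        return -0.5 * s @ W @ s

    def retrieve(W, cue, steps=50):
        s = cue.copy()
        for _ in range(steps):
            prev = s.copy()
            for i in np.random.permutation(len(s)):     # asynchronous updates
                s[i] = 1.0 if W[i] @ s >= 0 else -1.0
            if np.array_equal(s, prev):                 # settled into an energy minimum
                break
        return s

    rng = np.random.default_rng(0)
    traces = rng.choice([-1.0, 1.0], size=(3, 64))      # stored memory traces
    W = store(traces)
    noisy = traces[0] * rng.choice([1.0, -1.0], size=64, p=[0.9, 0.1])
    recalled = retrieve(W, noisy)
    print(np.mean(recalled == traces[0]), energy(W, recalled))
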

> Is the memory specific to each layer or shared across layers?

I would venture that memory is not shared across layers, since the representations at each layer may be different.

> Are transformations an integral part of memory or a complementary system?


> How are transformations expressed?
> How are queries expressed and used to identify the transformations to apply?

I personally would initially assume the transformations are expressed differently at each layer.

> Can multiple complementary transformations be applied in parallel?

Yes

> How are slot fillers recorded so that something is only used once as a filler?

Wow, what a great question; this is especially relevant if the transformations are applied in parallel.
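
One naive way to picture the bookkeeping the question asks about (purely illustrative, and my
own assumption rather than a claim about how the brain does it): mark each filler as consumed
the moment it is bound, so bindings proposed in parallel cannot reuse it.

    def bind_fillers(slots, candidates):
        used = set()              # fillers that have already been bound
        bindings = {}
        for slot, filler in zip(slots, candidates):
            if filler in used:
                continue          # filler already consumed by an earlier binding
            bindings[slot] = filler
            used.add(filler)
        return bindings

    # "mary" is bound only once even though two slots propose her in parallel.
    print(bind_fillers(["agent", "patient"], ["mary", "mary"]))   # {'agent': 'mary'}
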



Ronald P. Reck

http://www.rrecktek.com - http://www.ronaldreck.com

Received on Friday, 10 January 2025 13:40:43 UTC