- From: Dave Raggett <dsr@w3.org>
- Date: Sat, 8 Jun 2024 09:59:47 +0100
- To: paoladimaio10@googlemail.com
- Cc: W3C AIKR CG <public-aikr@w3.org>
- Message-Id: <87A401CB-CF1C-443F-8A9D-4A55CE181A7E@w3.org>
Training foundation models for LLMs is a bit like getting them to learn about everything all at once, all mixed up. This works thanks to the magic of gradient descent and back propagation, and addresses the challenge that understanding everyday sentences requires a good grasp of common sense knowledge, creating a chicken-and-egg problem.

Back propagation is very slow (look at typical values for the learning rate; a toy illustration of this appears after the quoted message below) and very different from how humans learn. We are able to learn from single examples, and get by on a tiny fraction of the data that LLMs require for foundation models. Chomsky referred to this as the "poverty of the stimulus". During childhood, our schooling introduces knowledge in a carefully organised way, with new knowledge layered on top of previously learnt knowledge. Our grasp of common sense comes from a blend of everyday experience and what we are taught in school.

In the last ten years AI has come a long way, but we have yet to figure out how to mimic the economies of human learning. I am searching for ways for neural networks to memorise and generalise from sequences using single-shot learning. This means stepping away from back propagation to consider other, more biologically plausible approaches. One paper that caught my eye combines slow learning for learning to learn with fast learning for single-shot learning. In essence, this trains the network to learn quickly for a limited set of tasks (a sketch of that idea also follows below).

Tomorrow's AI will be very different from today's as we gradually master quick learning and deliberative (Type 2) reasoning. Moreover, it will use a fraction of the power consumed by today's energy-hungry GPUs/TPUs. Sparse spiking neural networks implemented with neuromorphic hardware will mimic the efficiency of the brain. This is also likely to trigger a move away from back propagation.

There is a lot to look forward to.

Cheers,

Dave

> On 8 Jun 2024, at 06:03, Paola Di Maio <paola.dimaio@gmail.com> wrote:
>
> Okay, folks, I have been a bit AWOL, got lost in the dense forest of understanding following the AI KR path.
> In related discussions, what are foundation models?
>
> If you ask Google (exercise) the answer points to FM in ML, starting with Stanford in 2018, etc.
> https://hai.stanford.edu/news/what-foundation-model-explainer-non-experts
> Great resources to be found online, all pointing to ML, and nobody actually showing you what the FM
> is in a tangible form (I remember this happened a lot with SW).
> Apparently FM are actually not a tangible thing, they are not there at all;
> they are more like a dynamic neural network architecture (no wonder they have been slippery all along) which is built by ingesting
> data on the internet.
>
> Foundation models are massive neural network-based architectures designed to process and generate human-like text. They are pre-trained on a substantial corpus of text data from the internet, allowing them to learn the intricacies of language, grammar, context, and patterns.
>
> They are made of layers, heads and parameters.
>
> Coming from systems engineering, you know, with a bit of an existential background, I am making the case
> that foundation models without an ontological basis are actually the cause of much risk in AI.
>
> In case you people were wondering what I am up to, and would like to contribute to this work,
> please pitch in.
>
> Paola

Dave Raggett <dsr@w3.org>
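
A rough, purely illustrative aside on the learning-rate point above: the snippet below (plain Python, with made-up numbers rather than anything taken from a real training setup) shows how many tiny updates a learning rate around 1e-4 implies, even for a single scalar weight.

    # Purely illustrative: one scalar weight trained by gradient descent on a
    # squared-error loss, with a learning rate in the range often quoted for
    # large models (e.g. 1e-4). The value 1e-4 and the 0.01 tolerance are
    # assumptions chosen only for this illustration.
    w, target, lr = 0.0, 1.0, 1e-4

    steps = 0
    while abs(w - target) > 0.01:
        grad = 2 * (w - target)   # d/dw of (w - target)**2
        w -= lr * grad            # one gradient-descent update
        steps += 1

    print(steps, "updates to get within 0.01 of the target")
    # With lr = 1e-4 this takes roughly 23,000 updates for a single scalar,
    # which is one way of seeing why so many examples are needed.

In a real network each such update nudges billions of weights by a comparably tiny amount, which is the contrast with one-shot human learning that the message draws.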
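
On the slow/fast combination: the message does not name the paper, so the sketch below assumes a MAML-style inner/outer loop (model-agnostic meta-learning is one common formulation of "learning to learn"). The toy task, constants, and function names are invented for illustration; the slow outer loop only learns a good starting point so that a single fast gradient step from one example adapts to a new task.

    # A minimal first-order sketch of "slow learning to learn, fast single-shot
    # learning", assuming a MAML-like setup. Toy tasks: y = a * x for a
    # task-specific slope a; the model is y = w * x. All names and constants
    # are illustrative.
    import random

    INNER_LR = 0.7    # fast: adapt from a single example
    OUTER_LR = 0.01   # slow: learning to learn across tasks

    def loss_grad(w, x, y):
        # gradient of 0.5 * (w * x - y)**2 with respect to w
        return (w * x - y) * x

    w_slow = 0.0
    for step in range(2000):
        a = random.uniform(0.5, 2.0)      # sample a task (its true slope)
        x1, y1 = 1.0, a * 1.0             # the task's single training example
        w_fast = w_slow - INNER_LR * loss_grad(w_slow, x1, y1)  # one fast step

        x2, y2 = 2.0, a * 2.0             # held-out example from the same task
        # slow update on the post-adaptation loss (first-order approximation)
        w_slow -= OUTER_LR * loss_grad(w_fast, x2, y2)

    # After slow training, w_slow sits near the centre of the task family, so a
    # single fast step from one example covers most of the gap to a new task.
    a_new = 1.7
    w_adapted = w_slow - INNER_LR * loss_grad(w_slow, 1.0, a_new * 1.0)
    print(round(w_slow, 2), round(w_adapted, 2))

Whether the paper referred to above works exactly this way is an open question; the point of the sketch is only the division of labour: the outer loop changes slowly and is expensive, while the inner loop is a single cheap update from one example.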
Received on Saturday, 8 June 2024 08:59:59 UTC