- From: Dave Raggett <dsr@w3.org>
- Date: Wed, 3 Nov 2021 11:40:58 +0000
- To: public-cogai <public-cogai@w3.org>
- Message-Id: <EEE42EB7-C3A6-4427-ACFF-739F4A26E29E@w3.org>
I am interested in your comments on the following overview article on deep learning for AI by Yoshua Bengio, Yann LeCun, and Geoffrey Hinton, Communications of the ACM, July 2021, Vol. 64, No. 7.

> Comparing human learning abilities with current AI suggests several directions for improvement:
>
> • Supervised learning requires too much labeled data and model-free reinforcement learning requires far too many trials. Humans seem to be able to generalize well with far less experience.
> • Current systems are not as robust to changes in distribution as humans, who can quickly adapt to such changes with very few examples.
> • Current deep learning is most successful at perception tasks and generally what are called system 1 tasks. Using deep learning for system 2 tasks that require a deliberate sequence of steps is an exciting area that is still in its infancy.

https://cacm.acm.org/magazines/2021/7/253464-deep-learning-for-ai/fulltext

It provides an insider’s perspective on progress and trends, but doesn’t say much about the flaws as seen by outsiders, nor about ethical challenges such as dealing with bias and explainability. It also fails to cite existing work on combining symbolic and sub-symbolic approaches, including work on System 2, e.g. ACT-R.

In my opinion, there is a lot of potential for relating symbolic representations to vector representations, and this could provide valuable insights for richer neural network architectures, especially with respect to System 2 (see the sketches after the quotes below).

Some points that caught my eye:

> How can we design future machine learning systems with the ability to generalize better or adapt faster to out-of-distribution data?
>
> Evidence from neuroscience suggests that groups of nearby neurons (forming what is called a hyper-column) are tightly connected and might represent a kind of higher-level vector-valued unit able to send not just a scalar quantity but rather a set of coordinated values.
>
> Most neural nets only have two timescales: the weights adapt slowly over many examples and the activities adapt rapidly, changing with each new input. Adding an overlay of rapidly adapting and rapidly decaying "fast weights" introduces interesting new computational abilities. … Multiple time scales of adaption also arise in learning to learn, or meta-learning.
>
> When thinking about a new challenge, such as driving in a city with unusual traffic rules, or even imagining driving a vehicle on the moon, we can take advantage of pieces of knowledge and generic skills we have already mastered and recombine them dynamically in new ways. This form of systematic generalization allows humans to generalize fairly well in contexts that are very unlikely under their training distribution. We can then further improve with practice, fine-tuning and compiling these new skills so they do not need conscious attention anymore. How could we endow neural networks with the ability to adapt quickly to new settings by mostly reusing already known pieces of knowledge, thus avoiding interference with known skills? How should we structure and train neural nets so they can capture these underlying causal properties of the world?
>
> The ability of young children to perform causal discovery suggests this may be a basic property of the human brain, and recent work suggests that optimizing out-of-distribution generalization under interventional changes can be used to train neural networks to discover causal dependencies or causal variables.
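To make the "fast weights" passage concrete, here is a minimal numpy sketch of the two-timescale idea, loosely in the spirit of Ba et al.'s fast weights; the Hebbian update rule, dimensions, and constants are illustrative assumptions on my part, not anything from the article:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                        # hidden size (toy)
    W_slow = rng.normal(scale=0.1, size=(d, d))  # slow weights: change only via gradient descent
    A = np.zeros((d, d))                         # fast weights: change every step, decay quickly
    lam, eta = 0.9, 0.5                          # decay and learning rates for the fast weights

    h = np.zeros(d)
    for t in range(20):
        x = rng.normal(size=d)                   # a new input at the fast timescale
        h = np.tanh(W_slow @ h + x + A @ h)      # activities driven by both weight sets
        A = lam * A + eta * np.outer(h, h)       # Hebbian fast-weight update, decaying by lam

The fast weights act as a short-lived associative memory of recent hidden states, sitting between the two timescales the article mentions.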
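On the causal point, one concrete line of work in this vein is the invariant risk minimization (IRMv1) penalty of Arjovsky et al., which penalizes a model whose classifier is not simultaneously optimal across training environments (i.e. interventional changes). A toy PyTorch sketch, with a hypothetical model and made-up data:

    import torch
    import torch.nn.functional as F

    def irm_penalty(logits, y):
        # Gradient of the per-environment risk w.r.t. a fixed "dummy" scale;
        # a small norm means the features are predictive in the same way
        # across environments (the IRMv1 surrogate for invariance).
        scale = torch.tensor(1.0, requires_grad=True)
        loss = F.binary_cross_entropy_with_logits(logits * scale, y)
        grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
        return grad.pow(2)

    # Two toy "environments" standing in for different interventions.
    net = torch.nn.Linear(4, 1)
    envs = [(torch.randn(32, 4), torch.randint(0, 2, (32, 1)).float())
            for _ in range(2)]
    total = 0.0
    for x, y in envs:
        logits = net(x)
        total = total + F.binary_cross_entropy_with_logits(logits, y) \
                      + 10.0 * irm_penalty(logits, y)
    total.backward()   # gradients now include the invariance penalty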
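Returning to my point about relating symbolic representations to vector representations: as a very simple illustration, knowledge-graph embeddings such as TransE (Bordes et al.) score a symbolic triple well when head + relation ≈ tail in vector space. A toy sketch, where the vocabulary, dimensions, and triples are entirely made up:

    import numpy as np

    rng = np.random.default_rng(1)
    symbols = ["cat", "mammal", "dog", "isa"]        # toy symbolic vocabulary
    emb = {s: rng.normal(size=16) for s in symbols}  # each symbol gets a vector

    def score(head, relation, tail):
        # Higher (less negative) means more plausible; the relation
        # acts as a translation from head to tail in embedding space.
        return -np.linalg.norm(emb[head] + emb[relation] - emb[tail])

    # Training would adjust emb so that score("cat", "isa", "mammal")
    # exceeds scores of corrupted triples such as ("cat", "isa", "dog").

This only scratches the surface, but it hints at how symbolic structure could inform richer neural architectures, including for System 2 style deliberation.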
What do you think?

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things