Cognitive AI, sparse neural networks and mimicry

I came across some interesting papers from California-based Numenta, who see more accurate models of the brain as the key to advancing machine intelligence.

They show that sparsely connected neural networks can run dramatically more efficiently than densely connected ones (the paper linked below puts the speed-up at 50x). Their implementation uses field programmable gate arrays (FPGAs), and NVIDIA is said to be experimenting with architectural changes to GPUs to better support sparse networks.

Numenta describe an approach that combines sparse connections with sparse activation. You can think of this in terms of a sequence of feed-forward layers, each containing a set of nodes with inputs from the previous layer and outputs to the next layer. For sparse connections, the output of each node in layer n is connected to the inputs of a small, randomly chosen subset of the nodes in layer n+1. For sparse activation, the outputs of the nodes in a layer are zeroed, apart from those of the k most active nodes (top-k). They use backpropagation for training. For more details, see:

    https://numenta.com/assets/pdf/research-publications/papers/Sparsity-Enables-50x-Performance-Acceleration-Deep-Learning-Networks.pdf
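
A minimal Python sketch of the two kinds of sparsity described above. It is a simplification for illustration, not Numenta's implementation: the function names (make_sparse_mask, top_k, sparse_layer), the layer sizes and the random weights are all hypothetical, and training by backpropagation is not shown.

```python
import random

def make_sparse_mask(n_in, n_out, fan_out, rng):
    """Sparse connections: each node in layer n feeds only `fan_out`
    randomly chosen nodes in layer n+1 (1 = connected, 0 = not)."""
    mask = [[0] * n_out for _ in range(n_in)]
    for i in range(n_in):
        for j in rng.sample(range(n_out), fan_out):
            mask[i][j] = 1
    return mask

def top_k(values, k):
    """Sparse activation: zero every output except the k largest
    (ties are broken in favour of the lower index)."""
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

def sparse_layer(x, weights, mask, k):
    """One feed-forward layer: masked weighted sum, then top-k."""
    n_out = len(mask[0])
    pre = [
        sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
        for j in range(n_out)
    ]
    return top_k(pre, k)

# Illustrative sizes only (assumptions, not taken from the paper).
rng = random.Random(0)
n_in, n_out = 8, 6
mask = make_sparse_mask(n_in, n_out, fan_out=2, rng=rng)
weights = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
x = [rng.random() for _ in range(n_in)]
y = sparse_layer(x, weights, mask, k=2)
```

The dense multiply here still visits every weight, so the sketch shows the structure only. The efficiency gains come from hardware or kernels that skip the zeroed connections and activations.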

A second paper models columns of cells in the neocortex, and discusses the role of grid cells and place cells. A grid cell fires at regular intervals as an animal moves across the lines of an invisible grid, based on sensory-motor input. Different grid cells have different grids, i.e. different spacing and alignment. Several such grid cells taken together define a spatial coordinate system. Place cells fire in recognition of specific locations, e.g. landmarks in the local environment. The coordinate system is unique to each environment according to how grid cells “anchor” to that environment.
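
To see how a handful of grids with different spacing can jointly act as a coordinate system, here is a deliberately simplified one-dimensional sketch. The integer spacings, the offsets standing in for environment-specific "anchoring", and the function name grid_code are assumptions made for illustration. Real grid cells tile two-dimensional space, and the paper's model is considerably richer.

```python
def grid_code(position, spacings, offsets):
    """Phase of `position` within each grid: one small number per grid.
    `offsets` stands in for how the grids anchor to an environment."""
    return tuple((position + o) % s for s, o in zip(spacings, offsets))

spacings = (3, 4, 5)   # three grids with different spacing (hypothetical)
env_a = (0, 0, 0)      # anchoring in environment A (hypothetical)
env_b = (1, 3, 2)      # a different anchoring in environment B

# No single grid distinguishes more than 5 positions, but because the
# spacings share no common factors the combined code is unique for
# 3 * 4 * 5 = 60 consecutive positions.
codes = {grid_code(p, spacings, env_a) for p in range(60)}

code_a = grid_code(7, spacings, env_a)
code_b = grid_code(7, spacings, env_b)
```

The same physical position gets a different code in each environment, which is one way to read the remark that the coordinate system is unique to each environment.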

Numenta propose that neocortical areas learn models of objects in a similar way to how grid cells and place cells learn models of the environment. Compositional models of physical objects can be expressed in terms of mappings between locations in different spaces, corresponding to the different components of an object, e.g. the torso and arms of a person. The paper talks about sets of displacement cells that together represent displacement vectors, and their application to modelling the structural behaviour of objects, e.g. how an arm can move relative to the torso.

They go on to present what they call the “thousand brains theory of intelligence”. This holds that the brain creates many models of objects using different subsets of sensory arrays, and across different sensory modalities. Every cortical column learns models of complete objects. They achieve this by combining input with a grid cell-derived location, and then integrating over movements. Long range connections across multiple columns act to select the most likely model given all the uncertainties.

> For example, there is no single model of a coffee cup that includes what a cup feels like and looks like. Instead there are 100s of models of a cup. Each model is based on a unique subset of sensory input within different sensory modalities. There will be multiple models based on visual input and multiple models based on somatosensory input. Each model can infer the cup on its own by observing input over movements of its associated sensors. However, long-range non-hierarchical connections allow the models to rapidly reach a consensus of the identity of the underlying object, often in a single sensation.
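
The consensus step in the quote can be caricatured as voting. In the sketch below each "column" holds the set of objects consistent with what its own sensor has observed so far, and the objects supported by the most columns win. The function name consensus and the example objects are hypothetical, and simple vote counting is an assumption that stands in for the long-range connections described in the paper.

```python
from collections import Counter

def consensus(column_candidates):
    """Return the object(s) supported by the largest number of columns."""
    votes = Counter()
    for candidates in column_candidates:
        votes.update(candidates)
    if not votes:
        return []
    best = max(votes.values())
    return sorted(obj for obj, n in votes.items() if n == best)

# Each set is one column's current hypotheses (illustrative data only).
columns = [
    {"cup", "bowl", "can"},  # touch: a curved surface
    {"cup", "mug"},          # touch: a handle
    {"cup", "bowl"},         # vision: an open rim
]
winner = consensus(columns)
```

No single column can identify the object from one sensation, but together the three settle on "cup" straight away.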


They are hoping to extend this work to support continuous learning, and to apply it to transformer models, as used in natural language processing. For more details, see:

    https://www.frontiersin.org/articles/10.3389/fncir.2018.00121/full

I am wondering how all this relates to functional models of the brain and integration with symbolic approaches. Chris Eliasmith’s concept of “semantic pointers” involves the use of circular convolution as a means to represent chunks as vectors. That should be compatible with sparse connectivity and displacement vectors. Long-range non-hierarchical connections are also consistent with Sharon Thompson-Schill’s ideas of hub-and-spoke models for how the anterior temporal lobe integrates unimodal information from different cortical regions. This points to ideas on using streaming protocols between cognitive databases for efficient distributed graph algorithms.
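
For readers unfamiliar with circular convolution as a binding operation, here is a small pure-Python sketch. It binds a "role" vector to a "filler" vector and then approximately recovers the filler. The vector size, the random vectors and the helper names (cconv, involution, random_vector, cosine) are assumptions for illustration. This is not code from Eliasmith's work, which uses far more efficient FFT-based implementations.

```python
import math
import random

def cconv(a, b):
    """Circular convolution: c[i] = sum_j a[j] * b[(i - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def involution(a):
    """Approximate inverse under circular convolution: a'[i] = a[-i mod n]."""
    n = len(a)
    return [a[-i % n] for i in range(n)]

def random_vector(n, rng):
    """Random vector with elements drawn from N(0, 1/n)."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

rng = random.Random(42)
n = 256
role, filler, other = (random_vector(n, rng) for _ in range(3))

bound = cconv(role, filler)                    # one vector for the pair
recovered = cconv(bound, involution(role))     # noisy copy of the filler

sim_filler = cosine(recovered, filler)   # expected to be clearly positive
sim_other = cosine(recovered, other)     # expected to be close to zero
```

The bound vector has the same dimensionality as its parts, which is what lets chunks be nested. Recovery is only approximate, so in practice the result is cleaned up by comparing it against a set of known vectors.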

Taking this further, we can begin to consider how to model social mimicry: How do babies learn to smile from interacting with their mothers?  How do children learn to mimic the speech sounds of their peers (including regional accents)?  How do we learn to speak complex utterances from listening to others? Further examples include copying the dance movements of others on the dance floor, and playing a piece of music on a piano or guitar after listening to it.

These all have a common underlying architecture. First, an internal model has to be learned from lower level sensory data. Second, you have to learn how to map this internal model to a lower level model for motor control, for execution by the cerebellum. The statistics for recognition of patterns are shared with generation, e.g. shared across natural language understanding and generation, and may involve multiple stages in a pipeline.

In principle, this could be implemented using Transformer models and deep learning, but that would necessitate huge amounts of training data. Humans, however, get by with little data, so a different approach is called for, one that is incremental and involves only weak supervision. This is where the “thousand brains theory of intelligence” seems promising. Can we find a means to generate many models and allow them to compete to reach a consensus, implemented as concurrent asynchronous processes?

I am hoping to explore this in practical detail next year with respect to demos on inductive learning and natural language processing. Contact me directly if you are interested in helping with that.

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things 

Received on Saturday, 14 November 2020 12:28:55 UTC