- From: Dave Raggett <dsr@w3.org>
- Date: Mon, 11 Mar 2024 16:46:36 +0000
- To: public-cogai <public-cogai@w3.org>
- Message-Id: <A7937E99-1C3F-42CE-BDE4-D6062BF3CA19@w3.org>
This is a progress report on work I am doing on cognitive agents as collections of cognitive modules.

Today's large language models involve a stack of Transformers wedged between outer layers that deal with embeddings for tokens and positional information on the input side, and predicting the next token on the output side, where tokens are words, characters or some intermediary. The brain is highly modular, and it makes sense to explore a modular approach to artificial neural networks.

The modules all operate on working memory, which I will treat as a vector that holds the latent semantics. This can be extended to a matrix for visual concepts. A vector is pretty flexible in that it can represent a single axis in a basis set, e.g. a given word; a superposition of states, e.g. locations in a three dimensional space; a chunk of name/value pairs; or labelled directed edges in a graph. The richer representations yield noisy results when accessed, necessitating denoising. I don't understand how this works in practice in current language models! However, the effectiveness of large language models and text-to-image models is evidence that it works well enough. Transformers lack the expressive power to properly handle checks on parity, matching nested brackets, etc., but given enough layers they manage to do a good enough job on human language and programming scripts.

The modules are as follows:

Encoder
This takes a sequence of tokens and constructs the latent semantics they imply. This involves self-attention and transformation.

Decoder
This generates a sequence of tokens from the latent semantics. The decoder updates the latent semantics with positional information as each token is generated. This involves self-attention and transformation.

Reasoner
This is a feedforward network that takes the latent semantics as its input and provides an update to the latent semantics as its output. This is equivalent to a production rule engine for rules with a conjunction of conditions and a sequence of actions. Actions can also invoke functions on memory and external modules, so actions need to be able to describe which module they apply to, e.g. to trigger the decoder to output some text.

Memory
This is a vector database that uses a vector as a query, and then updates the latent semantics based upon the best match found in the database. Recall is stochastic, based upon similarity and activation levels, where the level decays over time but is boosted upon access. The module further supports updates to existing vectors, as well as adding and deleting vectors, so the query needs to be accompanied by the requested operation. A rough sketch of how I picture this module is given below.

Generative language models learn the semantics in order to predict the next word; simply learning to regenerate the input text will fail to learn the semantics. However, the generative approach on natural language also requires the model to have lots of everyday knowledge, which necessitates a very large training dataset. Is there another way? It should be feasible to train a system that integrates the above modules on a relatively modest dataset with restricted language and semantics. The idea is to synthesise a dataset with taxonomic knowledge, basic logic and sets, causal knowledge and temporal relations, as well as simple arithmetic. I am working on how to create the dataset using a script.
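To make the Memory module more concrete, here is a rough Python sketch of how I currently picture it. The class name, the decay/boost/temperature parameters, and the way similarity and activation are combined (simple multiplication) are placeholders of my own rather than a settled design.

import numpy as np

class VectorMemory:
    def __init__(self, dim, decay=0.99, boost=1.0, temperature=0.1, seed=0):
        self.dim = dim
        self.decay = decay              # per-step decay of activation levels
        self.boost = boost              # added to activation on each access
        self.temperature = temperature  # how sharply recall favours the best match
        self.vectors = []               # stored latent vectors
        self.activations = []           # one activation level per vector
        self.rng = np.random.default_rng(seed)

    def _similarities(self, query):
        m = np.stack(self.vectors)                                 # (n, dim)
        q = query / (np.linalg.norm(query) + 1e-9)
        m = m / (np.linalg.norm(m, axis=1, keepdims=True) + 1e-9)
        return m @ q                                               # cosine similarity

    def step(self):
        # Activation levels decay over time.
        self.activations = [a * self.decay for a in self.activations]

    def add(self, vector):
        self.vectors.append(np.asarray(vector, dtype=float))
        self.activations.append(1.0)

    def update(self, index, vector):
        self.vectors[index] = np.asarray(vector, dtype=float)
        self.activations[index] += self.boost

    def delete(self, index):
        del self.vectors[index]
        del self.activations[index]

    def recall(self, query):
        # Stochastic recall: score each entry by similarity times activation,
        # then sample one entry via a softmax over the scores.
        if not self.vectors:
            return None
        scores = self._similarities(query) * np.asarray(self.activations)
        probs = np.exp((scores - scores.max()) / self.temperature)
        probs /= probs.sum()
        index = int(self.rng.choice(len(self.vectors), p=probs))
        self.activations[index] += self.boost    # access boosts activation
        return self.vectors[index]

In this reading, the Reasoner's actions would name the target module and the requested operation (recall, add, update or delete), and the recalled vector would then be used to update the working memory vector.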
p.s. one of the challenges I am seeking help with is a means to collapse a superposition of states to a single state when the vocabulary is not predetermined. This would allow a language model to generate a sequence of words, concurrently with another module that maps these words into characters or phonemes.
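For a vocabulary that is fixed in advance, the collapse itself is straightforward, e.g. score the working memory vector against a codebook with one vector per state and sample a single winner, as in the rough sketch below (the codebook and temperature are stand-ins, not part of any design). The open question is what plays the role of the codebook when the set of states is not predetermined.

import numpy as np

def collapse(latent, codebook, temperature=0.5, rng=None):
    """Collapse a superposition held in `latent` to one of the states in
    `codebook` (shape (n_states, dim)), sampling in proportion to similarity."""
    rng = rng or np.random.default_rng()
    latent = latent / (np.linalg.norm(latent) + 1e-9)
    codebook = codebook / (np.linalg.norm(codebook, axis=1, keepdims=True) + 1e-9)
    scores = codebook @ latent                        # cosine similarity per state
    probs = np.exp((scores - scores.max()) / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(codebook), p=probs))    # index of the chosen state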
Dave Raggett <dsr@w3.org>

Received on Monday, 11 March 2024 16:46:49 UTC