Vector embeddings for production rules

I’ve been studying ideas for implementing Type 1 & 2 cognition based upon vector embeddings for production rules. This is inspired by the role of production rules in symbolic cognitive architectures such as ACT-R and SOAR, as well as this community group's work on chunks & rules.

Some key papers include:

1) Neural machine translation by jointly learning to align and translate, 2015, Bahdanau, Cho and Bengio, see: https://arxiv.org/abs/1409.0473

2) Attention is all you need, 2017, Ashish Vaswani et al., see: https://arxiv.org/abs/1706.03762
  
3) Neural Production Systems, March 2022, Aniket Didolkar et al., see: https://arxiv.org/pdf/2103.01937.pdf

A production rule system determines which rules match the current state of working memory, stochastically selects the best matching rule, and applies it to update working memory. Rules include variables as a basis for generalisation. An artificial neural network can be designed to learn rules through reinforcement learning.
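To make this concrete, here is a minimal sketch of the recognise-act cycle in plain Python. The rule format, variable convention ("?x") and helper names are purely illustrative, not an existing implementation:

import random

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(condition, fact, bindings):
    # Unify one condition against one fact, extending the variable bindings.
    if len(condition) != len(fact):
        return None
    bindings = dict(bindings)
    for c, f in zip(condition, fact):
        if is_var(c):
            if bindings.setdefault(c, f) != f:
                return None
        elif c != f:
            return None
    return bindings

def matching_rules(rules, memory):
    # Yield (rule, bindings) pairs whose conditions all match facts in memory.
    for rule in rules:
        candidates = [{}]
        for cond in rule["if"]:
            candidates = [b2 for b in candidates for fact in memory
                          if (b2 := match(cond, fact, b)) is not None]
        for bindings in candidates:
            yield rule, bindings

def step(rules, memory):
    # One recognise-act cycle: match, stochastically select, apply.
    matches = list(matching_rules(rules, memory))
    if not matches:
        return memory
    rule, bindings = random.choice(matches)   # stochastic conflict resolution
    for action in rule["then"]:
        memory.add(tuple(bindings.get(t, t) if is_var(t) else t for t in action))
    return memory

# A rule with the variable ?x generalises over any matching fact.
rules = [{"if": [("person", "?x")], "then": [("greet", "?x")]}]
memory = {("person", "alice")}
print(step(rules, memory))   # {('person', 'alice'), ('greet', 'alice')}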

The first reference above describes how English can be translated to French using a mechanism that learns a soft alignment between each word being generated and the words of the source sentence. The second reference introduces Transformers as a model of self-attention that can be executed in parallel, and forms the basis for today’s large language models (e.g. ChatGPT), which statistically predict text continuations in response to a user-supplied prompt. The third reference extends these ideas to show how attention supports the process of matching rule conditions against working memory.
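As a rough illustration of the third idea (a simplification of my own, not the code from the Neural Production Systems paper), rule conditions can be treated as attention queries that are scored against working-memory slots acting as keys; here the embeddings are random stand-ins for what would normally be learned:

import torch
import torch.nn.functional as F

d_model = 32
num_rules, num_slots = 4, 6

rule_queries = torch.randn(num_rules, d_model)   # rule-condition embeddings (learned in practice)
slot_keys = torch.randn(num_slots, d_model)      # embeddings of working-memory slots

# Scaled dot-product attention: scores act as soft match strengths
# between each rule condition and each working-memory slot.
scores = rule_queries @ slot_keys.T / d_model ** 0.5   # (num_rules, num_slots)
match_weights = F.softmax(scores, dim=-1)              # soft alignment of rules to slots

# Stochastically select a rule in proportion to its strongest slot match.
rule_strength = scores.max(dim=-1).values
selected_rule = torch.distributions.Categorical(logits=rule_strength).sample()
print(match_weights.shape, selected_rule.item())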

I am hoping to apply aspects of all three papers to new work on using production rules for Type 2 cognition, i.e. sequential, deliberative cognitive steps as a basis for reasoning. This can be thought of as reimplementing chunks & rules in neural networks. It will exploit feedback connections for retained state in combination with the feed-forward connections found in existing language models. I am looking forward to implementing experimental versions of these ideas in PyTorch.
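A speculative PyTorch sketch of what a single deliberative step might look like, combining a feed-forward transform with a feedback path that retains state between steps; the module and parameter names are my own and purely illustrative:

import torch
import torch.nn as nn

class DeliberativeStep(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Feed-forward transform, as found in existing language models.
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
        # Feedback path: a recurrent cell carries retained state between steps.
        self.feedback = nn.GRUCell(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, observation, state):
        x = self.norm(observation + self.feed_forward(observation))
        state = self.feedback(x, state)   # update retained state via feedback
        return state

step_module = DeliberativeStep(d_model=64)
state = torch.zeros(1, 64)
for _ in range(5):                        # a short chain of deliberative steps
    observation = torch.randn(1, 64)
    state = step_module(observation, state)
print(state.shape)                        # torch.Size([1, 64])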

Any offers of help would of course be very welcome!

p.s. this is part of a roadmap for work including natural language processing and continual learning based upon integrating episodic and encyclopaedic memory.

Best regards,

Dave Raggett <dsr@w3.org>
