Language models and reasoning

A number of large language models that claim to incorporate reasoning have recently been announced:

Meta's Galactica[1] is a family of large language models trained on scientific texts; see the Galactica Explorer[2]. The website is full of hype, e.g. claiming support for reasoning, and the project has received uniformly bad reviews, e.g. "Is this really what AI has come to, automatically mixing reality with bullshit so finely we can no longer recognize the difference?" and “What bothers me so much about Facebook’s Galactica … is that it pretends to be a portal to knowledge … Actually it’s just a random bullshit generator.”; see the post by Alberto Romero[3].

That matches my expectations, as large language models and image generators are designed to stochastically generate plausible output that follows the statistics of the style selected by the prompt. The authors claim that Galactica outperforms other large language models at mathematical reasoning, with the exception of Minerva. Galactica is also positioned as a scientifically literate search engine, but is let down by its tendency to generate bogus text that appears authentic and highly confident.
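To make "stochastically generate plausible output" concrete, here is a minimal Python sketch of temperature-scaled sampling of the next token from a softmax distribution. The vocabulary and scores are invented for illustration; in a real model the scores come from the network and the vocabulary has tens of thousands of entries. The point is only that each token is drawn at random in proportion to learned statistics, with nothing checking whether the result is true.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick the next token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical toy vocabulary and scores for the word following "The sky is".
vocab = ["blue", "clear", "falling", "green"]
logits = [3.0, 2.0, 0.5, -1.0]

rng = random.Random(0)
print([sample_next_token(vocab, logits, temperature=0.8, rng=rng) for _ in range(5)])
```

Lowering the temperature concentrates probability on the highest-scoring token, but the choice remains a statistical one rather than a checked inference.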

Google’s Minerva[4] is built on top of a large language model (Google PaLM) that was further trained on technical datasets. It correctly answers around a third of undergraduate-level problems involving quantitative reasoning. However, it lacks a means to verify the correctness of its proposed solutions, as it is limited to intuitive reasoning.

It works best when the prompt consists of one or more questions with worked answers, followed by the question for Minerva to answer. Google refers to this as chain of thought prompting. This presumably provides semantic priming for the desired style of answer, analogous to keywords such as "anime, Ghibli style" for image generators like Stable Diffusion. Minerva demonstrates an ability to transform mathematical expressions step by step, as well as to carry out basic arithmetic operations.
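As a rough illustration of the prompt layout described above, the sketch below assembles worked question/answer pairs followed by the new question with its answer left open. The function name build_prompt, the "Question:"/"Answer:" labels and the example problems are hypothetical choices for illustration, not Minerva's actual prompt format.

```python
def build_prompt(worked_examples, question):
    """Concatenate worked examples, then the new question with an empty answer slot.

    worked_examples: list of (question, worked_answer) pairs.
    The "Question:"/"Answer:" labels are an illustrative convention only.
    """
    parts = []
    for q, a in worked_examples:
        parts.append(f"Question: {q}\nAnswer: {a}\n")
    # The final question is left open so the model continues in the same step-by-step style.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)

examples = [
    ("What is 12 * 15?",
     "12 * 15 = 12 * 10 + 12 * 5 = 120 + 60 = 180. The answer is 180."),
    ("Solve 2x + 3 = 11.",
     "Subtract 3 from both sides: 2x = 8. Divide by 2: x = 4. The answer is 4."),
]
print(build_prompt(examples, "What is 14 * 25?"))
```

The worked answers spell out intermediate steps, which is what primes the model to produce a similar sequence of steps for the final question.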

I think it is time to abandon the idiom of statistically generating text continuations to a prompt, and instead to focus on sequential deliberative reasoning that is open to introspection. One potential way forward is to enable sequential operations on the latent semantics obtained by applying large language models to text utterances. This relates to the sequence-to-sequence models used for language translation, in that such models could be used to map the latent semantics to a symbolic language for describing operations and their results.

The activation levels of the neurons in the upper layers of the large language model's artificial neural network correspond to working memory, which is initialised from a text prompt. A sequential rule engine then manipulates working memory via a second network model, before generating the text output that corresponds to the updated latent semantics. I haven’t implemented this as yet, and would like to collaborate with other people on this. The DistilBERT language model[5] is quite modest in size (around 66 million parameters, compared with 110 million for BERT base), and as such avoids the need for the huge computing platforms available to well-resourced companies.
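Since the architecture above is only outlined and not yet implemented, the following is a toy Python sketch of the control loop under loudly stated assumptions. The names encode, Rule, run and decode are hypothetical; the bag-of-words encoder merely stands in for the upper-layer activations of a model such as DistilBERT, each rule's action function stands in for the second network model, and decoding is reduced to picking the closest candidate sentence. The trace of fired rules is what makes each step open to introspection.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for latent semantics: counts over a tiny fixed vocabulary.
# A real system would read activations from a language model instead.
VOCAB = ["socrates", "man", "mortal", "is"]

def encode(text: str) -> List[float]:
    words = text.lower().replace(".", "").split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

@dataclass
class Rule:
    name: str
    pattern: List[float]                          # latent pattern the rule responds to
    action: Callable[[List[float]], List[float]]  # stand-in for the second network model
    threshold: float = 0.9

def run(memory, rules, max_steps=10):
    """Apply the first matching rule at each step; record a trace for introspection."""
    trace = []
    for _ in range(max_steps):
        fired = next((r for r in rules if cosine(memory, r.pattern) >= r.threshold), None)
        if fired is None:
            break
        memory = fired.action(memory)
        trace.append(fired.name)
    return memory, trace

def decode(memory, candidates):
    """Stand-in for text generation: return the candidate closest to working memory."""
    return max(candidates, key=lambda c: cosine(memory, encode(c)))

def man_to_mortal(m):
    """Hypothetical action: move activation from 'man' to 'mortal'."""
    m = list(m)
    i, j = VOCAB.index("man"), VOCAB.index("mortal")
    m[j] += m[i]
    m[i] = 0.0
    return m

rules = [Rule("man-implies-mortal", encode("Socrates is a man"), man_to_mortal)]
memory, trace = run(encode("Socrates is a man."), rules)
print(trace)  # ['man-implies-mortal']
print(decode(memory, ["Socrates is a man.", "Socrates is mortal."]))  # Socrates is mortal.
```

After the rule fires, working memory no longer matches its pattern closely enough, so the loop halts rather than re-applying the same rule indefinitely.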

Anyone interested? 

[1] https://galactica.org/static/paper.pdf
[2] https://galactica.org/explore/
[3] https://towardsdatascience.com/galactica-what-dangerous-ai-looks-like-f31366438ca6
[4] https://minerva-demo.github.io/#category=Algebra&index=1
[5] https://huggingface.co/distilbert-base-uncased

Dave Raggett <dsr@w3.org>

Received on Wednesday, 23 November 2022 10:00:12 UTC