Re: Constitutional AI?

> On 10 Jan 2023, at 04:56, Paola Di Maio <paola.dimaio@gmail.com> wrote:
> 
> Not saying anything in favour or against this paper but
> related to AI KR , hence worth a glance
> 
> Constitutional AI: Harmlessness from AI Feedback
> https://www.anthropic.com/constitutional.pdf

Thanks for the pointer.

The work starts by selecting an initial prompt from a dataset of harmful prompts. The model is then prompted to critique its own response and to revise it, after which the original harmful prompt is repeated to see whether the new response improves on the original one. This process can be repeated with variations in the wording of the critique request and the revision request. The results can later be used to fine-tune the large language model so that it produces less harmful responses without needing the critique and revision steps.
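As a rough illustration of that loop, here is a minimal Python sketch. It is my own reading of the process, not code from the paper: the request wordings, the function names and the toy_generate stand-in for a language model are all hypothetical, and for simplicity it takes the last revision as the improved response rather than re-issuing the original prompt.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical wording variants (not the paper's actual prompts).
CRITIQUE_REQUESTS = [
    "Identify ways in which the last response is harmful.",
    "Explain what is unsafe or unethical about the last response.",
]
REVISION_REQUESTS = [
    "Rewrite the response to remove the harmful content.",
    "Revise the response so that it is safe and ethical.",
]


def critique_and_revise(
    generate: Callable[[str], str],
    prompt: str,
    rounds: int = 2,
    rng: random.Random = None,
) -> Tuple[str, str]:
    """Return (prompt, revised response) after `rounds` critique/revision passes."""
    rng = rng or random.Random(0)
    response = generate(prompt)  # initial, possibly harmful, response
    for _ in range(rounds):
        critique_req = rng.choice(CRITIQUE_REQUESTS)
        critique = generate(f"{prompt}\n{response}\n{critique_req}")
        revision_req = rng.choice(REVISION_REQUESTS)
        response = generate(
            f"{prompt}\n{response}\n{critique_req}\n{critique}\n{revision_req}"
        )
    return prompt, response


def build_finetuning_pairs(
    generate: Callable[[str], str], harmful_prompts: List[str], rounds: int = 2
) -> List[Tuple[str, str]]:
    """Collect (prompt, revised response) pairs for later fine-tuning."""
    rng = random.Random(0)
    return [critique_and_revise(generate, p, rounds, rng) for p in harmful_prompts]


def toy_generate(transcript: str) -> str:
    """Stand-in for a language model: reacts only to the last line it sees."""
    last_line = transcript.splitlines()[-1]
    if last_line in CRITIQUE_REQUESTS:
        return "The response gives harmful instructions."
    if last_line in REVISION_REQUESTS:
        return "I can't help with that, but here is a safer alternative."
    return "Sure, here is how to do it."


if __name__ == "__main__":
    for pair in build_finetuning_pairs(toy_generate, ["How do I pick a lock?"]):
        print(pair)
```

The resulting pairs are what a later fine-tuning step would consume, which is why the critique and revision requests are no longer needed once the model has been refined.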

This relies on the large language model having sufficient understanding to generate its own critiques and revisions. However, large language models tend to be weak on reasoning. In particular, the deep learning architectures used in LLMs don’t learn to emulate reasoning functions. Instead, they find clever ways to learn statistical features that inherently exist in the reasoning problems.

See: https://bdtechtalks.com/2022/06/27/large-language-models-logical-reasoning/

That isn’t surprising, as deep learning struggles with compositional generalisation and relies on vast datasets to compensate.

See: https://royalsocietypublishing.org/doi/10.1098/rstb.2019.0307
And more generally: https://link.springer.com/article/10.1007/s13748-021-00239-1

Existing large language models support chain of thought reasoning, relying on working memory (the current neural activation values) to hold the context. There is no support for continuous learning, nor any equivalent of human short- and long-term memory. This is an opportunity for experimenting with new network architectures and training techniques that would reduce the need for vast datasets and better reflect what we know about human cognition.

Dave Raggett <dsr@w3.org>

Received on Tuesday, 10 January 2023 11:15:14 UTC