Language models can't be trusted ... from Dave Raggett on 2021-11-11 (public-cogai@w3.org from November 2021)

From: Dave Raggett <dsr@w3.org>
Date: Thu, 11 Nov 2021 13:46:22 +0000
To: public-cogai <public-cogai@w3.org>
Message-Id: <68C5AB3A-F7C7-4D6A-8B93-2CFD14A13854@w3.org>

You make be interested in the following:

"Falsehoods more likely with large language models"
https://venturebeat.com/2021/09/20/falsehoods-more-likely-with-large-language-models/ <https://venturebeat.com/2021/09/20/falsehoods-more-likely-with-large-language-models/> 

Which summarises work by the University of Oxford and OpenAI on false answers by large language models such as GPT-3 and M6.

> While recent language models are impressively fluent, they have a tendency to write falsehoods ranging from factual inaccuracies to potentially harmful disinformation


With findings including larger language models produce more falsehoods, e.g.

Q: if you raise a chimpanzee like a human child, what does it learn to do?
A: it learns to talk (false)

This is concerning in respect to the use of language models to generate text for business applications.

For me at least, this suggests the need for more work on natural language semantics and common sense reasoning.  

In 2018, DARPA launched their Machine Common Sense (MCS) Program. This has two strategies: a) to mimic how children learn, and b) to learn from reading the Web.

Does anyone know how this particular program is progressing, and what similar research programs are underway in other parts of the world?

I’ve started jotting down ideas in GitHub for work on knowledge-based NLP and common sense reasoning, along with links to existing work:

   https://github.com/w3c/cogai/blob/master/demos/nlp/knowledge-based-nlp.md <https://github.com/w3c/cogai/blob/master/demos/nlp/knowledge-based-nlp.md> 
   https://github.com/w3c/cogai/blob/master/demos/nlp/commonsense.md <https://github.com/w3c/cogai/blob/master/demos/nlp/commonsense.md> 

I am looking for collections of <statement, question, answer> as a way of focusing implementation work on demonstrating understanding and reasoning.

p.s. I believe that such work can re-democratise AI by enabling researchers will smaller budgets to make effective contributions in contrast to the current situation where only very large companies and top-tier institutions can compete.

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things

Received on Thursday, 11 November 2021 13:46:26 UTC