Re: AI catfishing [was Re: ChatGPT and ontologies]

There is a class of machine learning models in which one part of the model
challenges the other: the generative adversarial network, or GAN for short.
Letting one model detect the output of another amounts to an offline
GAN. In this race of LLM vs. LLM detector, the more up-to-date one
will be ahead for a moment, but once they are close enough, you will
need an ever larger amount of text to distinguish signal from noise.
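
To put a rough number on how the required amount of text grows, here is a
minimal sketch under toy assumptions that are mine, not a description of
any real detector: the detector watches a single hypothetical binary
feature per token, tokens are treated as independent, and the sample size
is estimated with the Chernoff-Stein rule of thumb n ~ ln(1/error) / KL.
The names kl_bernoulli and tokens_needed are made up for illustration.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def tokens_needed(p_human, p_llm, error=0.01):
    """Rough count of independent tokens needed to reach the given error.

    Toy assumption: the miss rate decays like exp(-n * KL), so
    n is about ln(1 / error) / KL (Chernoff-Stein rule of thumb).
    """
    return math.ceil(math.log(1 / error) / kl_bernoulli(p_human, p_llm))


if __name__ == "__main__":
    # Human text shows the feature half the time; the LLM is off by `gap`.
    for gap in (0.1, 0.01, 0.001):
        print(f"gap={gap}: about {tokens_needed(0.5, 0.5 + gap)} tokens")
```

In this toy model, shrinking the gap between the two distributions by a
factor of 10 raises the required text by roughly a factor of 100, which is
what I mean by the text requirement running away once generator and
detector are close.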

In the end, we must apply the same rules we use to decide whether person
A or person B wrote something. What I have in mind are chain of
trust and state of knowledge.

On 18.02.23 at 13:18, Hugh Glaser wrote:
> Thanks David,
> 
> Then I guess the answer to my question is “No, no-one here knows anyone who has tried using LLMs such as GPT-3 to find out if text is human- or machine-generated”
> 
> FWIW, I was thinking about at least a couple of ways of doing it.
> Firstly, systems could be directly trained; I think many people have been surprised at how functional the LLMs have been - maybe people would be surprised at how functional such detectors could be; I think this is like carrying forward spam detection-like processes.
> Secondly, the normal GPT-3s etc., could be used as-is, fed prompts of material, and asked if the author is human. This is the sort of thing I was thinking of; and improvements in generating tech would then be naturally tracked by the consequent improvement in detection.
> 
> Interestingly, I see the launch of these LLMs as something of a singularity.
> In a few years (months?) it will be interesting to try and find large training sets where you are confident that you know whether the authors are human or LLM.
> I can see an ouroboros of LLMs being trained on each other, and even themselves.
> Training sets that predate this, or are guaranteed one or the other, while being sufficiently large, will be valuable.
> 
> BTW.
> Spam detection is not the preserve of big, well-funded business - some of the best stuff is very much not from those sources.
> 
> Cheers
> Hugh
> 
>> On 17 Feb 2023, at 22:44, David Booth <david@dbooth.org> wrote:
>>
>> On 2/17/23 11:34, Hugh Glaser wrote:
>>> I disagree, David.
>>> The Spam-fighting arms race is an example of huge success on the
>>> part of the defenders.
>>
>> Very good point.  I guess I didn't adequately qualify my spam comparison.  Spam fighting has had a lot of success, however:
>>
>> - Spam is generally trying to get you to click on an easily identifiable link, or selling a very specific product.  That's inherently MUCH easier to detect than deciding whether a message was written by a human vs a bot (as Patrick Logan also pointed out).
>>
>> - Spam-fighting is MUCH better funded than your random spammer.  Think Google.  AI-generated influence messages -- including harmful disinformation -- will come from well funded organizations/adversaries.
>>
>> - When one spam message gets through the spam filters, it generally causes very little harm -- a minor annoyance.  But if one AI-generated spear phishing campaign succeeds, or if an AI-generated propaganda campaign succeeds, the consequences can be grave.
>>
>> So although spam fighting has had success, I don't see that success carrying over to distinguishing AI-generated content from human-generated content.  I think the continuing failure, of big social media companies (think Facebook and Twitter), to automatically distinguish human posts from bot posts, is already evidence of how hard it is to detect.  As AI improves I only expect the problem to get worse, because a well-funded adversary has two inherent advantages:
>>
>> - When it is so cheap to generate fake content, even if only a small fraction gets past the fake-detection filters, that can still be a large quantity, and still harmful; and
>>
>> - Defenders will always be one step behind, as the generators continually find new ways to slip past the detection filters.
>>
>> So I guess I'm more in the Cassandra camp than the Pollyanna camp.
>>
>> Best wishes,
>> David Booth
>>
> 
> 

Received on Sunday, 19 February 2023 09:57:00 UTC