Re: AI catfishing [was Re: ChatGPT and ontologies]

+1 Dan,
Looks like it may be heading in that direction - I just hope we get some
clean signing paths with JOSE/COSE that work well for both the API side and
the browser side without having to jump through major hoops.
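
For a rough sense of what such a signing path could look like, here is a
minimal sketch using the `jose` npm library (one JOSE implementation that
runs on both Node and in browsers via WebCrypto). The payload, key handling
and header choices are illustrative assumptions, not a recommended profile:

// Minimal sketch: sign a Schema.org JSON-LD document as a compact JWS.
// Assumes the `jose` npm library (https://github.com/panva/jose); payload
// and key handling are purely illustrative.
import { generateKeyPair, CompactSign, compactVerify } from "jose";

async function main() {
  // A real deployment would load keys from a keystore; generated here so
  // the example is self-contained.
  const { publicKey, privateKey } = await generateKeyPair("ES256");

  const doc = {
    "@context": "https://schema.org",
    "@type": "Article",
    author: { "@type": "Person", name: "Example Author" }, // hypothetical
    text: "Content whose provenance we want to assert.",
  };

  // Note: proper signed RDF would canonicalize the graph before signing;
  // here the serialized JSON is signed as-is.
  const payload = new TextEncoder().encode(JSON.stringify(doc));

  const jws = await new CompactSign(payload)
    .setProtectedHeader({ alg: "ES256", cty: "application/ld+json" })
    .sign(privateKey);

  // Verification recovers the exact bytes that were signed.
  const { payload: verified } = await compactVerify(jws, publicKey);
  console.log(JSON.parse(new TextDecoder().decode(verified)));
}

main().catch(console.error);

The same code runs unchanged in Node and in a browser, which is the appeal
of the JOSE route here.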

Mike Prorock
CTO, Founder
https://mesur.io/



On Sun, Feb 19, 2023 at 4:43 AM Dan Brickley <danbri@danbri.org> wrote:

> On Sun, 19 Feb 2023 at 11:33, Hugh Glaser <hugh@glasers.org> wrote:
>
>> Excellent.
>> Many thanks, Dan - just the sort of pointer I was hoping for.
>>
>> And yes, fair enough to record a reminder that this tech can be hugely
>> beneficial (once it is on the right bit of the Gartner curve, perhaps).
>> "terrible folly” may be overstating a bit :-) , I would have thought.
>
>
> Yes - Bergi’s point is well made and along the lines I am thinking of wrt
> Schema.org. It may be easier and more constructive to keep track of where
> things have come from (socially, not via tools) than where they didn’t come
> from.
>
> Maybe digitally signed RDF will finally get its day?
>
> Dan
>
>
>
>> A variety of tools to do various things can usually be useful.
>>
>> Best
>> Hugh
>>
>> > On 18 Feb 2023, at 16:16, Dan Brickley <danbri@danbri.org> wrote:
>> >
>> > It has been tried, implemented, debated, critiqued, etc.! Even OpenAI had
>> a tool.
>> >
>> >
>> https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text/
>> >
>> >
>> > It is a tougher problem with text generation than with images, since the
>> latter has much more scope for steganography and embedded metadata. Plus
>> you are unlikely to know whether you have made an “AI” detector or merely a
>> detector for one particular GPT-x + training set + fine-tuning regime +
>> prompted context. And since anything from a Ouija board to a
>> trillion-dollar corporation can be seen as an AI, the goal here isn’t
>> particularly clear.
>> >
>> > In my personal view this route is a terrible folly to follow.
>> >
>> > Tooling using LLMs or better will be used as an aid for people with
>> challenges such as having to work and apply for jobs in a 2nd, 3rd or 4th
>> language, cognitive issues or dyslexia, and for advanced summarisation for
>> blind users who are sick of slogging through giant verbose documents in
>> search of a simple claim or two. It would help nobody to stigmatize such
>> use.
>> >
>> > LLMs can also help with writing in ways that do not directly create
>> text. E.g. I had one write me a project plan for turning my silly movie
>> script idea into a movie. The use of LLMs encourages task decomposition in
>> a very “rubber duck programming” sort of way.
>> >
>> > Detecting text or ideas that passed through an LLM at some point (e.g.
>> next year) will be as hopeless as detecting text that has passed at some
>> point through Bluetooth, or speech-to-text, or spell checking, or cleartext
>> http:.
>> >
>> > Dan
>> >
>> > On Sat, 18 Feb 2023 at 12:23, Hugh Glaser <hugh@glasers.org> wrote:
>> > Thanks David,
>> >
>> > Then I guess the answer to my question is “No, no-one here knows anyone
>> who has tried using LLMs such as GPT-3 to find out if text is human- or
>> machine-generated”.
>> >
>> > FWIW, I was thinking about at least a couple of ways of doing it.
>> > Firstly, systems could be directly trained; I think many people have
>> been surprised at how functional the LLMs have been - maybe people would be
>> surprised at how functional such detectors could be; I think this is like
>> carrying forward spam detection-like processes.
>> > Secondly, the normal GPT-3s etc. could be used as-is, fed prompts of
>> material, and asked if the author is human. This is the sort of thing I was
>> thinking of; and improvements in generating tech would then be naturally
>> tracked by the consequent improvement in detection.
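
A rough sketch of that second approach, purely as illustration: it assumes
the `openai` npm client, an OPENAI_API_KEY in the environment, and a
placeholder model name - and, as discussed elsewhere in this thread, such
self-classification is unreliable and easy to evade:

// Minimal sketch: feed a model a passage and ask whether the author is a
// human or a machine. Assumes the `openai` npm package; the model name is
// a placeholder. Illustration only, not a dependable detector.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function guessAuthorship(text: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model name
    messages: [
      {
        role: "system",
        content:
          "You will be shown a passage. Answer only 'human' or 'machine' " +
          "for who you think wrote it, then give one sentence of reasoning.",
      },
      { role: "user", content: text },
    ],
  });
  return response.choices[0].message.content ?? "";
}

guessAuthorship("Some passage whose provenance is in question.")
  .then(console.log)
  .catch(console.error);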
>> >
>> > Interestingly, I see the launch of these LLMs as something of a
>> singularity.
>> > In a few years (months?) it will be interesting to try and find large
>> training sets where you are confident that you know whether the authors are
>> human or LLM.
>> > I can see an ouroboros of LLMs being trained on each other, and even
>> themselves.
>> > Training sets that predate this, or are guaranteed one or the other,
>> while being sufficiently large, will be valuable.
>> >
>> > BTW.
>> > Spam detection is not the preserve of big, well-funded business - some
>> of the best stuff is very much not from those sources.
>> >
>> > Cheers
>> > Hugh
>> >
>> > > On 17 Feb 2023, at 22:44, David Booth <david@dbooth.org> wrote:
>> > >
>> > > On 2/17/23 11:34, Hugh Glaser wrote:
>> > > > I disagree, David.
>> > > > The Spam-fighting arms race is an example of huge success on the
>> > > > part of the defenders.
>> > >
>> > > Very good point.  I guess I didn't adequately qualify my spam
>> comparison.  Spam fighting has had a lot of success, however:
>> > >
>> > > - Spam is generally trying to get you to click on an easily
>> identifiable link, or selling a very specific product.  That's inherently
>> MUCH easier to detect than deciding whether a message was written by a
>> human vs a bot (as Patrick Logan also pointed out).
>> > >
>> > > - Spam-fighting is MUCH better funded than your random spammer.
>> Think Google.  AI-generated influence messages -- including harmful
>> disinformation -- will come from well funded organizations/adversaries.
>> > >
>> > > - When one spam message gets through the spam filters, it generally
>> causes very little harm -- a minor annoyance.  But if one AI-generated
>> spear phishing campaign succeeds, or if an AI-generated propaganda campaign
>> succeeds, the consequences can be grave.
>> > >
>> > > So although spam fighting has had success, I don't see that success
>> carrying over to distinguishing AI-generated content from human-generated
>> content.  I think the continuing failure of big social media companies
>> (think Facebook and Twitter) to automatically distinguish human posts from
>> bot posts is already evidence of how hard it is to detect.  As AI improves
>> I only expect the problem to get worse, because a well-funded adversary has
>> two inherent advantages:
>> > >
>> > > - When it is so cheap to generate fake content, even if only a small
>> fraction gets past the fake-detection filters, that can still be a large
>> quantity, and still harmful; and
>> > >
>> > > - Defenders will always be one step behind, as the generators
>> continually find new ways to slip past the detection filters.
>> > >
>> > > So I guess I'm more in the Cassandra camp than the Pollyanna camp.
>> > >
>> > > Best wishes,
>> > > David Booth
>> > >
>> >
>> >
>>
>>

Received on Sunday, 19 February 2023 14:08:24 UTC