- From: Mike Prorock <mprorock@mesur.io>
- Date: Sun, 19 Feb 2023 07:08:00 -0700
- To: Dan Brickley <danbri@danbri.org>
- Cc: Hugh Glaser <hugh@glasers.org>, David Booth <david@dbooth.org>, SW-forum <semantic-web@w3.org>
- Message-ID: <CAGJKSNTzfRGMH_2brAmXQKguAuWxP689dMc61HBTxD9atocbuw@mail.gmail.com>
+1 Dan,

Looks like it may be heading that direction - I just hope we get some
clean signing paths with JOSE/COSE that work well for both the API side
and the browser side without any jumping through major hoops.

Mike Prorock
CTO, Founder
https://mesur.io/
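A minimal sketch of the kind of JOSE signing path Mike is hoping for, in
TypeScript with the panva "jose" npm package; the library choice, the
ES256 algorithm, and the toy payload are illustrative assumptions, not
anything specified in the thread. The same code runs in Node and in the
browser, which is the "both the API side and the browser side" property
asked for above:

    // Sign and verify a payload as a compact JWS using the "jose" package.
    // Assumptions: ES256 and an ephemeral key pair generated on the spot;
    // a real deployment would provision and distribute keys out of band.
    import { generateKeyPair, CompactSign, compactVerify } from 'jose';

    async function demo(): Promise<void> {
      const { publicKey, privateKey } = await generateKeyPair('ES256');

      // Any byte string can be signed; here, a small JSON document.
      const payload = new TextEncoder().encode(
        JSON.stringify({ claim: 'example' }),
      );
      const jws = await new CompactSign(payload)
        .setProtectedHeader({ alg: 'ES256' })
        .sign(privateKey);

      // The receiving side verifies the signature and recovers the payload.
      const { payload: verified } = await compactVerify(jws, publicKey);
      console.log(new TextDecoder().decode(verified));
    }

    demo().catch(console.error);

COSE would play the analogous role where the payload is CBOR rather
than JSON or arbitrary bytes.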
On Sun, Feb 19, 2023 at 4:43 AM Dan Brickley <danbri@danbri.org> wrote:

> On Sun, 19 Feb 2023 at 11:33, Hugh Glaser <hugh@glasers.org> wrote:
>
>> Excellent.
>> Many thanks, Dan - just the sort of pointer I was hoping for.
>>
>> And yes, fair enough to record a reminder that this tech can be hugely
>> beneficial (once it is on the right bit of the Gartner curve, perhaps).
>> “terrible folly” may be overstating a bit :-) , I would have thought.
>
> Yes - Bergi’s point is well made and along the lines I am thinking of
> wrt Schema.org. It may be easier and more constructive to keep track of
> where things have come from (socially, not tools), than where they
> didn’t come from.
>
> Maybe digitally signed RDF will finally get its day?
>
> Dan
>
>> A variety of tools to do various things can usually be useful.
>>
>> Best
>> Hugh
>>
>>> On 18 Feb 2023, at 16:16, Dan Brickley <danbri@danbri.org> wrote:
>>>
>>> It has been tried, implemented, debated, critiqued etc.! Even OpenAI
>>> had a tool:
>>>
>>> https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text/
>>>
>>> It is a tougher problem with text generation than with images, since
>>> the latter has much more scope for steganography and embedded
>>> metadata. Plus you are unlikely to know whether you have made an “AI”
>>> detector or merely a detector of one particular GPT-x + training set
>>> + finetuning regime + prompted context. And since anything from a
>>> Ouija board to a trillion-dollar corporation can be seen as an AI,
>>> the goal here isn’t particularly clear.
>>>
>>> In my personal view this route is a terrible folly to follow.
>>>
>>> Tooling using LLMs or better will be used as an aid for people with
>>> challenges like having to work and apply for jobs in a 2nd, 3rd or
>>> 4th language, or cognitive issues, dyslexia, or advanced
>>> summarisation for blind users who are sick of slogging through giant
>>> verbose documents in search of a simple claim or two. It would help
>>> nobody to stigmatize such use.
>>>
>>> LLMs can also help with writing in ways that do not directly create
>>> text. E.g. I had one write me a project plan for turning my silly
>>> movie script idea into a movie. The use of LLMs encourages task
>>> decomposition in a very “rubber duck programming” sort of a way.
>>>
>>> Detecting text or ideas that passed through an LLM at some point
>>> (e.g. next year) will be as hopeless as detecting text that has
>>> passed at some point through Bluetooth, or speech-to-text, or spell
>>> checking, or cleartext http:.
>>>
>>> Dan
>>>
>>> On Sat, 18 Feb 2023 at 12:23, Hugh Glaser <hugh@glasers.org> wrote:
>>> Thanks David,
>>>
>>> Then I guess the answer to my question is “No, no-one here knows
>>> anyone who has tried using LLMs such as GPT-3 to find out if text is
>>> human- or machine-generated”.
>>>
>>> FWIW, I was thinking about at least a couple of ways of doing it.
>>> Firstly, systems could be directly trained; I think many people have
>>> been surprised at how functional the LLMs have been - maybe people
>>> would be surprised at how functional such detectors could be; I
>>> think this is like carrying forward spam detection-like processes.
>>> Secondly, the normal GPT-3s etc. could be used as-is, fed prompts of
>>> material, and asked if the author is human. This is the sort of
>>> thing I was thinking of; and improvements in generating tech would
>>> then be naturally tracked by the consequent improvement in
>>> detection. [a sketch of this prompting approach follows the quoted
>>> thread below]
>>>
>>> Interestingly, I see the launch of these LLMs as something of a
>>> singularity.
>>> In a few years (months?) it will be interesting to try and find
>>> large training sets where you are confident that you know whether
>>> the authors are human or LLM.
>>> I can see an ouroboros of LLMs being trained on each other, and even
>>> themselves.
>>> Training sets that predate this, or are guaranteed one or the other,
>>> while being sufficiently large, will be valuable.
>>>
>>> BTW.
>>> Spam detection is not the preserve of big, well-funded business -
>>> some of the best stuff is very much not from those sources.
>>>
>>> Cheers
>>> Hugh
>>>
>>>> On 17 Feb 2023, at 22:44, David Booth <david@dbooth.org> wrote:
>>>>
>>>> On 2/17/23 11:34, Hugh Glaser wrote:
>>>>> I disagree, David.
>>>>> The Spam-fighting arms race is an example of huge success on the
>>>>> part of the defenders.
>>>>
>>>> Very good point. I guess I didn't adequately qualify my spam
>>>> comparison. Spam fighting has had a lot of success; however:
>>>>
>>>> - Spam is generally trying to get you to click on an easily
>>>> identifiable link, or selling a very specific product. That's
>>>> inherently MUCH easier to detect than deciding whether a message
>>>> was written by a human vs a bot (as Patrick Logan also pointed
>>>> out).
>>>>
>>>> - Spam-fighting is MUCH better funded than your random spammer.
>>>> Think Google. AI-generated influence messages -- including harmful
>>>> disinformation -- will come from well-funded
>>>> organizations/adversaries.
>>>>
>>>> - When one spam message gets through the spam filters, it generally
>>>> causes very little harm -- a minor annoyance. But if one
>>>> AI-generated spear phishing campaign succeeds, or if an
>>>> AI-generated propaganda campaign succeeds, the consequences can be
>>>> grave.
>>>>
>>>> So although spam fighting has had success, I don't see that success
>>>> carrying over to distinguishing AI-generated content from
>>>> human-generated content. I think the continuing failure of big
>>>> social media companies (think Facebook and Twitter) to
>>>> automatically distinguish human posts from bot posts is already
>>>> evidence of how hard it is to detect. As AI improves I only expect
>>>> the problem to get worse, because a well-funded adversary has two
>>>> inherent advantages:
>>>>
>>>> - When it is so cheap to generate fake content, even if only a
>>>> small fraction gets past the fake-detection filters, that can still
>>>> be a large quantity, and still harmful; and
>>>>
>>>> - Defenders will always be one step behind, as the generators
>>>> continually find new ways to slip past the detection filters.
>>>>
>>>> So I guess I'm more in the Cassandra camp than the Pollyanna camp.
>>>>
>>>> Best wishes,
>>>> David Booth
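A minimal sketch of the second approach Hugh floats above (see the
bracketed note in the quoted thread): feed a stock LLM a passage and
simply ask whether a human wrote it. TypeScript with the official
"openai" npm package; the model name, the prompt wording, and the
three-way answer format are illustrative assumptions, not anything
proposed in the thread:

    // Ask a chat model whether a passage reads as human- or
    // machine-written. The model name and prompt are placeholders.
    import OpenAI from 'openai';

    const client = new OpenAI(); // expects OPENAI_API_KEY in the environment

    async function judgeAuthorship(text: string): Promise<string> {
      const resp = await client.chat.completions.create({
        model: 'gpt-4o-mini', // any chat-capable model would do here
        messages: [
          {
            role: 'system',
            content:
              'Decide whether the following passage was written by a human ' +
              'or generated by a language model. Reply with exactly one ' +
              'word: "human", "machine", or "unsure".',
          },
          { role: 'user', content: text },
        ],
      });
      // The model's one-word verdict; treat a missing reply as "unsure".
      return resp.choices[0].message.content?.trim() ?? 'unsure';
    }

    judgeAuthorship('It was a dark and stormy night...').then(console.log);

Whether any model can answer this reliably is exactly what is in
dispute above; OpenAI's own classifier, linked in Dan's message,
attempted the same task.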
Received on Sunday, 19 February 2023 14:08:24 UTC