- From: Paola Di Maio <paoladimaio10@gmail.com>
- Date: Sun, 22 Dec 2019 08:20:19 +0800
- To: Max Weiss <max_weiss@college.harvard.edu>, W3C AIKR CG <public-aikr@w3.org>
- Message-ID: <CAMXe=Sptosn6cw9xnHfTG7D_N946F_a9=UJ19W4j8Mzxjtfpnw@mail.gmail.com>
Dear Max,

Thanks a lot for your reply, for joining the CG, and for sharing the workings behind the bot in your email, which I am sharing with the group below. I am personally interested in understanding as much as possible how neural networks do their wonders, and in particular how uncertainties in outputs/results can be reduced so that they can be used consistently. I don't know to what extent the outputs of NNs can be explained, but nothing can stop us from trying to understand them. I'll study it in more detail later.

What I perceived from the article is that the bot wrote human-like comments. That is the feature extraordinaire I'd like to hear more about :-) That's what I am interested in: a bot that generates logical and correct sentences to the point of passing for human? Tell us more!

If, however, the 'fake' sentences were generated by parsing and meshing existing text written by humans, that is a different feature (not generating text from scratch but generating text by merging existing text), which is also a fairly extraordinary feat. Humans learn how to speak by listening to others and reproducing the language before we can master our own. Maybe the paper could explain what the bot does exactly in more detail, but definitely, yes, I'd like to learn from you how your bot is producing such good natural-language comments either way, because that is in itself quite interesting work.

Could you show us what the NN looks like and how you put it together? We are working mostly asynchronously these days, so if you have anything to share, maybe you can put together a few slides and do a narration of sorts, or a write-up, but I do not rule out having some live calls from time to time if people want to present a topic.

Look forward to your contribution!

Best regards
PDM

On Sun, Dec 22, 2019 at 4:29 AM Max Weiss <max_weiss@college.harvard.edu> wrote:
> Hi Paola,
>
> Your CG seems to be discussing some interesting topics, so I just
> requested to join.
> I think your suspicions around my results are very wise,
> but allow me to explain a little more about how they were possible and why
> they are significant.
>
> The “bot” described throughout the paper primarily refers to the script
> used to actually submit the deepfake comments. This really just boiled down
> to a simple for loop that used Selenium and Proxymesh to make requests
> and drive Chrome to automate the submission process. I would be happy to
> share this code with you, but there is not anything particularly
> interesting or novel in its architecture.
>
> The actual code used to finetune GPT-2 with TensorFlow and generate the
> deepfake comments was written by Max Woolf (
> https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce#scrollTo=Fa6p6arifSL0&forceEdit=true&sandboxMode=true).
> This is also a pretty basic process and diverges very little from the work
> others interested in natural language generation have undertaken.
>
> As your experience informs, deepfaking text is not at the level of, for
> example, writing an entire article using GPT-2. But the key interesting
> finding from my work is that this is a false metric: the methods for text
> generation *are* already at the point where a meaningful attack on
> something like the federal public comment process is possible and easy. In
> other words, perhaps the best articulation of my findings is not "AI
> methods are powerful enough to fledge a meaningful attack on federal
> websites" but rather "federal websites and similar platforms are so
> vulnerable to manipulation that current AI methods are already advanced
> enough to fledge a meaningful attack."
>
> To elaborate, the barrier of deepfake competency is far lower than one
> would think, given a specific task. In this instance, I was able to finetune
> GPT-2 using thousands of comments very similar to those I wanted to
> generate, aside from some small edits requiring simple search-and-replace.
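The "simple for loop that used Selenium and Proxymesh" described above can be sketched roughly as follows. This is a minimal illustration, not Max's actual script: the form URL, form-field names, and proxy addresses are invented placeholders, and only the proxy-rotation helper is exercised here (the browser-driving part requires a live Chrome/chromedriver setup).

```python
# Hypothetical sketch of a Selenium + rotating-proxy submission loop.
# All URLs, field names, and proxy hosts are placeholders for illustration.
from itertools import cycle


def pair_comments_with_proxies(comments, proxies):
    """Assign each comment a proxy, cycling through the proxy pool."""
    pool = cycle(proxies)
    return [(comment, next(pool)) for comment in comments]


def submit_all(comments, proxies, form_url):
    """Drive Chrome to submit each comment through its assigned proxy.

    Not executed here; needs Selenium and a chromedriver install.
    The field name "comment" and the submit-button selector are guesses.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    for comment, proxy in pair_comments_with_proxies(comments, proxies):
        options = webdriver.ChromeOptions()
        options.add_argument(f"--proxy-server={proxy}")
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(form_url)
            driver.find_element(By.NAME, "comment").send_keys(comment)
            driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
        finally:
            driver.quit()


# The rotation helper on its own:
pairs = pair_comments_with_proxies(
    ["comment one", "comment two", "comment three"],
    ["proxy1.example:31280", "proxy2.example:31280"],
)
print(pairs)
```

The point of the sketch matches Max's description: there is nothing architecturally novel here, just a loop that hands each generated comment to a fresh proxied browser session.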
> As described in the paper, of the comments generated with OpenAI’s smallest
> model, about half were what I would qualify as both “highly relevant” to
> the comment process and “highly sensible.” Perhaps this is where you raise
> flags about the efficacy of GPT-2, but I think this misses my focus. In a
> few hours, I was able to filter out the lower-quality comments to achieve a
> set of 1,000 high-quality comments that ended up representing a majority of
> all the comments submitted during the period. In this way, I showed that it
> was very easy to use GPT-2 to quickly generate and filter a high volume
> of very good comments. Given a little more time and money, this process
> could have been automated with paid grammar/syntax software, testing the
> generated comments against the training set, and requiring a list of key
> words for relevance.
>
> I hope this explanation helps to contextualize my findings. As you
> suspected, I could not simply throw a bunch of data into GPT-2 and spit out
> 1,000 comments that all passed as human, and I hope that is not how my
> results are perceived. What I did show is that current AI methods are
> already at a place where it took only about a week for me to fledge a
> meaningful attack that totally undermined the efficacy of the federal
> public comment process.
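The automated filtering Max says could replace his manual pass, testing generated comments against the training set and requiring relevance keywords, might look something like the sketch below. The keywords, length threshold, and sample comments are invented for illustration; only the grammar/syntax scoring step (which he attributes to paid software) is omitted.

```python
# Hypothetical sketch of automated filtering for generated comments:
# keep only comments that are not verbatim copies of the training set,
# are long enough to be substantive, and mention a required keyword.
# Thresholds and keywords are illustrative, not from the paper.

def filter_comments(generated, training_set, keywords, min_words=20):
    """Return the subset of generated comments passing all checks."""
    training = set(training_set)  # reject exact regurgitation of training data
    kept = []
    for comment in generated:
        lowered = comment.lower()
        if comment in training:
            continue  # verbatim copy of a real comment
        if len(comment.split()) < min_words:
            continue  # too short to be substantive
        if not any(keyword in lowered for keyword in keywords):
            continue  # not relevant to the comment process
        kept.append(comment)
    return kept


kept = filter_comments(
    generated=[
        "I strongly oppose the proposed waiver because it weakens oversight.",
        "Nice.",
        "Existing comment copied from the docket about the waiver.",
    ],
    training_set=["Existing comment copied from the docket about the waiver."],
    keywords=["waiver"],
    min_words=5,
)
print(kept)
```

In this toy run only the first comment survives: the second is too short, and the third duplicates the training set. A real pipeline would add the grammar-checking step Max mentions, but the structure, generate in bulk, then filter cheaply, is the same.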
>
> Best,
> Max
>
>
> On Dec 20, 2019, at 10:07 PM, Paola Di Maio <paola.dimaio@gmail.com> wrote:
>
> Dear Max
> cc AI KR W3C
>
> Thanks for this interesting work:
>
> https://techscience.org/a/2019121801/?utm_campaign=the_cybersecurity_202&utm_medium=Email&utm_source=Newsletter&wpisrc=nl_cybersecurity202&wpmm=1#Authors
>
> I am researching deepfakes, and I am also researching fake bots
> and fake research claims.
>
> I'd like to look into the bot to see how the capability of developing and
> delivering such a clever deepfaking bot can be achieved, and possibly
> replicate your results if possible.
>
> Ultimately, I'd like to first learn more about this bot, which surely
> sounds very clever but in my experience sounds too good to be true.
>
> I take this opportunity to invite you to join our CG, and if you feel like it,
> we can set up a live call with you and other group members
> and have a chat about this work:
> https://www.w3.org/community/aikr/
>
> PDM
Received on Sunday, 22 December 2019 00:21:02 UTC