Re: Three interesting articles

You missed this big one - EU lawmakers pass draft of AI Act, includes copyright rules for generative AI - https://venturebeat.com/ai/eu-lawmakers-pass-draft-of-ai-act-includes-last-minute-change-on-generative-ai-models/amp/

Yesterday, the National Law Review (https://www.natlawreview.com/article/eu-s-proposed-artificial-intelligence-act) wrote, “The AI Act will have a global impact, as it will apply to organizations providing or using AI systems in the EU; and providers or users of AI systems located in a third country (including the UK and US), if the output produced by those AI systems is used in the EU.”



From: Laurent Le Meur <laurent@edrlab.org>
Date: Saturday, April 29, 2023 at 2:47 PM
To: public-tdmrep@w3.org <public-tdmrep@w3.org>
Subject: Three interesting articles

As usual, comments welcome.

A Photographer Tried to Get His Photos Removed from an AI Dataset. He Got an Invoice Instead.
https://www.vice.com/en/article/pkapb7/a-photographer-tried-to-get-his-photos-removed-from-an-ai-dataset-he-got-an-invoice-instead

--> an AI tool may or may not store scraped content after training; therefore requesting the "removal" of some content from a dataset may not be possible. Better to prevent the content from being scraped in the first place!
Note: It seems that this German photographer had his pictures sold on Shutterstock (a US company, certainly using US servers), and therefore his pictures are on client websites. Our protocol-based solution cannot be effective for such a use case.

A Primer and FAQ on Copyright Law and Generative AI for News Media
https://generative-ai-newsroom.com/a-primer-and-faq-on-copyright-law-and-generative-ai-for-news-media-f1349f514883

Extract:

"From the input perspective, the main issue relates to the activities needed to build an AI system. In particular, the training stage of the AI tools we are considering here requires text and data mining (TDM) of copyrighted works. In the EU, these activities are mostly regulated by two TDM exceptions in the 2019 Copyright in the Digital Single Market Directive, which cover TDM for scientific purposes (Article 3) and what is called commercial TDM (Article 4). For models like Midjourney, Stable Diffusion, Dalle-E, or Firefly, the relevant provision would be the commercial TDM exception."

--> another converging take on TDM being the first step of AI training.

"In the US, absent a specific TDM exception, the legal question is whether these activities qualify as fair use. In the aftermath of cases like Authors Guild v. HathiTrust and Authors Guild v. Google, it has been argued that the US doctrine of fair use allows for a significant range of TDM activities of in-copyright works. The result is that US copyright law is arguably one of the most permissive for TDM activities in the world, especially when compared to laws that rely on stricter exceptions and limitations, like the EU. This arguably makes the US an appealing jurisdiction for companies to develop generative AI tools."

--> which leaves open the issue of the "country where the TDM/AI usage takes place". An AI solution may be US-based, but we think it won't be able to crawl EU servers without taking EU law into account. This is something we'll soon discuss.

Examples of Text and Data Mining Research Using Copyrighted Materials
https://copyrightblog.kluweriplaw.com/2023/03/06/examples-of-text-and-data-mining-research-using-copyrighted-materials/

--> which shows that although AI tools rely on TDM for their training phase, TDM is much wider than AI alone.
This could lead us to refine our TDM Policy vocabulary, so that a publisher can express that they agree to mining in general, but not to AI training in particular.
This would be an elegant way for a publisher to prevent its content from being used to train generative AI, without blocking automatic categorisation, recommendation services, etc.

Best regards
Laurent

Received on Sunday, 30 April 2023 18:44:12 UTC