- From: Leonard Rosenthol <lrosenth@adobe.com>
- Date: Wed, 5 Apr 2023 13:04:26 +0000
- To: Laurent Le Meur <laurent@edrlab.org>, "public-tdmrep@w3.org" <public-tdmrep@w3.org>
- Message-ID: <DM8PR02MB81819604E85E6A5D681F5862CD909@DM8PR02MB8181.namprd02.prod.outlook.com>
I have no idea how I missed the meeting – so sorry!! It doesn't seem to have made my calendar ☹.

As I mentioned previously, the C2PA has just published its 1.3 update, with definitions for TDM and ML that are separate and distinct from each other: https://c2pa.org/specifications/specifications/1.3/specs/C2PA_Specification.html#_training_and_data_mining. As you will see there, the TDM definition is taken from the EU directive, while the ML definitions are based on a clearer understanding of how ML works, both in terms of training and inference. This is already in use today as part of Adobe's Firefly generative AI system (https://news.adobe.com/news/news-details/2023/Adobe-Unveils-Firefly-a-Family-of-new-Creative-Generative-AI/default.aspx), and in development elsewhere. It is the "Do not Train" feature mentioned there.

Leonard

From: Laurent Le Meur <laurent@edrlab.org>
Date: Wednesday, April 5, 2023 at 7:14 AM
To: public-tdmrep@w3.org <public-tdmrep@w3.org>
Subject: TDMRep : Interesting article on our subject

Dear members of the TDMRep CG,

The notes of our first 2023 call will be published soon. But before that, I wanted to send the link to this interesting article published on openfuture.eu.
https://openfuture.eu/blog/protecting-creatives-or-impeding-progress/

Its position is clear about "ML = TDM":

"The CDSM Directive defines text and data mining as 'any automated analytical technique aimed at analyzing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations.' This definition clearly covers current approaches to machine learning that rely heavily on correlations between observed characteristics of training data. The use of copyrighted works as part of the training data is exactly the type of use that was foreseen when the TDM exception was drafted."

Happy to get your thoughts about this article (*).

Other notes:

- Creative Commons is also working on the subject of "better sharing for generative AI": it organized a set of webinars on the subject in February (mainly with US participants, from what I see) and is publishing a set of blog posts on AI -> https://creativecommons.org/tag/ai/
- "Stability AI plans to let artists opt out of Stable Diffusion 3 image training": https://arstechnica.com/information-technology/2022/12/stability-ai-plans-to-let-artists-opt-out-of-stable-diffusion-3-image-training/
- Some companies are building AI opt-out tools, which may lead to a large fragmentation of efforts for authors: e.g. https://spawning.ai/ (thanks to Claudia for the link)

(*) I personally don't understand one of the last paragraphs, which seems to contradict the rest of the article: "For now, this opt-in approach to copyright is limited to TDM, but it is not inconceivable that this approach could be expanded if it proves to work in practice, especially in the ongoing discussion about ML training."

Best regards,
Laurent Le Meur
Received on Wednesday, 5 April 2023 13:04:34 UTC