Re: Signaling opt-out from TDM / AI scraping in EPUB files

Giulia writes:

>  But it is important to clarify this point to avoid any ambiguity.

Precisely my point. I am confident that the goals of this activity can be
met, while still recognizing the legally protected needs of the disabled
communities impacted. The key (as always) is awareness up-front, and to
that end, the WAI-APA is a fantastic resource that could likely assist here.

As editorial feedback on the draft spec, I'd love to see formal
(contextual) definitions of "mining" and "scraping" provided, where the
envisioned uses, as you have suggested, could be better laid out, while
also recognizing content conversion for PwD (Persons with Disabilities)
as being outside of those definitions... or something like that.

JF

On Thu, Aug 3, 2023 at 12:21 PM Giulia Marangoni <giulia.marangoni@aie.it>
wrote:

> Dear John,
>
>
>
> the specifications of the TDM Reservation protocol refer to *the text and
> data mining* definition of the DSM Directive
> https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32019L0790&qid=1691078221827
>
> art. 2 “‘text and data mining’ means any automated analytical technique
> aimed at analysing text and data in digital form in order to generate
> information which includes but is not limited to patterns, trends and
> correlations”
>
>
>
> (recital 8) new technologies enable the automated computational analysis
> of information in digital form, such as text, sounds, images or data,
> generally known as text and data mining. Text and data mining makes the
> processing of large amounts of information with a view to gaining new
> knowledge and discovering new trends possible. Text and data mining
> technologies are prevalent across the digital economy; however, there is
> widespread acknowledgment that text and data mining can, in particular,
> benefit the research community and, in so doing, support innovation […]
>
>
>
> I’m not an expert in accessibility, but I would rule out that the conversion
> of EPUB to Braille could fall under this definition. But it is important to
> clarify this point to avoid any ambiguity.
>
>
>
> The TDM protocol for rights reservation has been designed to comply with
> art. 4 of the DSM Directive. The use cases for rights reservation we had in
> mind when we started working on the protocol were applications like big data
> analysis and web scraping to feed AI algorithms.
>
>
>
> Best,
>
> Giulia
>
>
>
> *From:* John Foliot <john@foliot.ca>
> *Sent:* Thursday, 3 August 2023 17:51
> *To:* Ivan Herman <ivan@w3.org>; W3C WAI Accessible Platform Architectures
> <public-apa@w3.org>
> *Cc:* Laurent Le Meur <laurent@edrlab.org>; W3C Publishing Business Group
> <public-publishingbg@w3.org>; W3C Publishing Community Group <
> public-publishingcg@w3.org>; Giulia Marangoni <giulia.marangoni@aie.it>
> *Subject:* Re: Signaling opt-out from TDM / AI scraping in EPUB files
>
>
>
> CC'ing WAI-APA
>
> As these discussions proceed, it may be worth noting that "content
> protection" and assistive technology tools are often at odds with each
> other, as Laurent previously noted (fair use in the US, CDSM in the EU).
> Persons with disabilities have legal protections related to content
> conversion (ref.: Marrakesh Treaty
> <https://www.wipo.int/marrakesh_treaty/en/>), and any protocol MUST NOT
> interfere with those rights.
>
> For example, when I read the non-normative "*In this example, the
> rightsholder expresses that non-research Actors from any country can mine
> its content if they agree to pay a fee*.", I am left wondering what
> the definition of "mine / mining" here is (likewise "scraping") - as
> essentially that would be (or at least *COULD BE*) the process when
> converting text - like ePub content - to Braille output.
>
> I do not claim to be an expert in this realm, but DO offer a cautionary
> "heads up" here.
>
> FWIW
>
> JF
>
>
>
> On Thu, Aug 3, 2023 at 11:00 AM Ivan Herman <ivan@w3.org> wrote:
>
>
>
>
>
> On 3 Aug 2023, at 16:48, Laurent Le Meur <laurent@edrlab.org> wrote:
>
>
>
> Hi Ivan,
>
>
>
> There is already a breakout session proposed on "The Impact of Generative
> AI on the Web" (https://github.com/w3c/tpac2023-breakouts/issues/6), one
> of the 3 aspects being
>
>
>
> What are the limits of scraping web data to train Generative AI and what
> technical measures should be implemented to ensure privacy, prevent
> copyright infringement, and effectively manage content creator consent?
>
>
>
> Yes, I have realized that after I sent the mail. But it would then be good
> to act proactively and see if you can make a short presentation on TDM at
> that session
>
>
>
>
>
> I'll participate in this session if I'm still in Seville when it happens.
>
>
>
> but the integration of a TDM opt-out signal in EPUB is more specific and
> should imho be discussed in the Publishing community during the meeting.
>
>
>
> I did not mean it to be an exclusive choice. I fully agree that having a
> discussion with the Publ. community would be timely and great to have. I
> was just wondering whether it is possible to go beyond that.
>
>
>
> Note also (and I referred to this in my comment) that if that discussion
> comes up with some explicit plans/recommendations (not "R", just "r") then
> reporting about it in a larger forum would be good.
>
>
>
> Cheers
>
>
>
> Ivan
>
>
>
>
>
>
>
> Best regards
>
> Laurent
>
>
>
> On 3 Aug 2023, at 14:03, Ivan Herman <ivan@w3.org> wrote:
>
>
>
> Laurent,
>
>
>
> I presume the TDM protocol, as well as the proposed mechanism to "embed"
> it into data, is not only for publishing in the traditional sense, but for
> any type of data on the Web. I.e., this topic may be of interest for an even
> larger community. If that is indeed the case, I believe this would be a
> good topic for the Wednesday breakout sessions of TPAC[1] where the
> possible outcomes of the session with the Publishing BG/CG could also be
> presented.
>
>
>
> As you can see on [1], anyone can propose a new session by raising a
> Github Issue[2]. [3] gives a list of the sessions already proposed; it is
> worth looking at the current list, because you may want to avoid clashes
> with other proposals.
>
>
>
> WDYT?
>
>
>
> Ivan
>
>
>
> [1] https://www.w3.org/2023/09/TPAC/schedule.html#wednesday
>
> [2]
> https://github.com/w3c/tpac2023-breakouts/issues/new?assignees=&labels=session&projects=&template=session.yml
>
> [3] https://github.com/w3c/tpac2023-breakouts/issues
>
>
>
>
>
> On 3 Aug 2023, at 13:46, Laurent Le Meur <laurent@edrlab.org> wrote:
>
>
>
> Dear all,
>
>
>
> There is now pressure from publishers to protect "Web" content from
> scraping by TDM (Text and Data Mining) and AI (Artificial Intelligence)
> actors.
>
>
>
> The W3C TDM Reservation Protocol (TDMRep) has been created to enable
> publishers to opt out of TDM scraping. TDMRep acts at the level of HTTP
> headers, and can therefore signal a reservation of rights on any Web
> resource. But many publishers would also like to signal a TDM opt-out
> inside files, especially inside EPUB files, so that publications can be
> protected even if the website from which they are downloaded does not
> contain any opt-out signal.
>
>
>
> At the request of the TDM Rep CG, I'm therefore reaching out to you to
> discuss the best way to address this need.
>
> The upcoming *TPAC* seems to be the perfect time to discuss the matter.
>
> *Could we schedule some time during a session to address this request?*
>
>
>
> Note: the TDMRep defines two metadata properties, one named
> "tdm-reservation", a boolean value that indicates if TDM rights are
> reserved for this resource, and another named "tdm-policy", an optional
> link to details on how to get a license for using the resource for TDM or
> AI.
>
> I can prepare a first proposal relative to the inclusion of these
> properties inside an EPUB package.
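
To make the two properties described above concrete, here is a minimal
sketch (not taken from the TDMRep specification text) of how a client might
read them from HTTP response headers. The function and type names are
hypothetical, and the serialisation of the boolean as "1" (reserved) or "0"
(not reserved) is an assumption that should be checked against the spec.

```python
from typing import Mapping, NamedTuple, Optional


class TdmSignal(NamedTuple):
    """Hypothetical container for the two TDMRep properties."""
    reserved: bool              # True when TDM rights are reserved
    policy_url: Optional[str]   # optional link to licensing details


def read_tdm_signal(headers: Mapping[str, str]) -> TdmSignal:
    """Extract tdm-reservation / tdm-policy from a mapping of HTTP headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    # Assumption: the boolean is serialised as "1" (reserved) or "0".
    reserved = lowered.get("tdm-reservation") == "1"
    # In this sketch the policy link is only reported alongside a reservation.
    policy_url = lowered.get("tdm-policy") if reserved else None
    return TdmSignal(reserved, policy_url or None)


signal = read_tdm_signal({
    "TDM-Reservation": "1",
    "TDM-Policy": "https://example.com/policy.json",  # placeholder URL
})
print(signal.reserved, signal.policy_url)
```

A missing header is treated here as "no reservation expressed", which is a
design choice of this sketch. An equivalent signal inside an EPUB package
would need its own carrier, and that is exactly what remains to be discussed.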
>
>
>
> You'll find more information about TDMRep on
>
> - https://w3c.github.io/tdm-reservation-protocol/ = introduction to the
> spec, guidelines, notes ...
>
> - https://www.w3.org/2022/tdmrep/ = the specification
>
> - https://www.w3.org/community/tdmrep/ = the CG page, with meeting notes
>
>
>
> Best regards
>
> Laurent LE MEUR
>
> EDRLab
>
> co-chair of the TDM Reservation Protocol CG
>
>
>
>
>
> Begin forwarded message:
>
>
>
> *From: *W3C Community Development Team <team-community-process@w3.org>
>
> *Subject: Notes, July 26th, 2023 [via TDM Reservation Protocol Community
> Group]*
>
> *Date: *3 August 2023 at 13:19:39 UTC+2
>
> *To: *public-tdmrep@w3.org
>
> *Resent-From: *public-tdmrep@w3.org
>
> *Reply-To: *TDM Reservation Protocol Community Group <
> team-community-process@w3.org>
>
>
>
> Update on meetings and presentations of the TDM protocol
>
> On 5th June a webinar on the protocol and how to implement it was
> organized by FEP and EDRLab; more than 70 publishers attended, and positive
> feedback was received.
>
> On July 11th, in Brussels, the TDM protocol was presented by AIE at the
> “Seminar on best practices for opting-out of generative ML training”,
> organized by Open Future. AIE and FEP attended the event, which was an
> occasion to exchange with organizations representing other rightsholders in
> the content industry, the European Commission, AI experts, and other
> projects/initiatives offering solutions for machine-readable opt-out,
> namely the C2PA coalition and Spawning. The latter integrates different
> opt-out methods in order to provide a service to AI companies that, given a
> URL as input, can check if there is an opt-out associated with the resource
> that AI players intend to use.
>
> Collaboration with Spawning AI
>
> After some exchanges, Spawning AI has already partially integrated the
> opt-out solution developed by the TDM Rep CG in their service, and they are
> open to collaborating further with the CG.
>
> Discussion on possible developments of the protocol
>
> EDRLab presented an overview of the different opt-out initiatives that are
> in touch with our CG. Some of them are media-specific (like the ones by
> IPTC and C2PA) and provide solutions at the content metadata level; others,
> like Spawning AI (and the TDM Rep protocol), are applicable to any content
> type, at the URL level. Even though different solutions (content-specific
> and non-content-specific) are complementary and can coexist in line with
> the different standards and practices in the content industry, there are
> significant differences in the semantic approach adopted by IPTC and C2PA
> on one hand, and the TDM Rep on the other: in particular, the different
> solutions reflect different views on whether the TDM concept would cover
> all/some AI usages, and whether indexing by search engines could be part of
> the opt-out. Such discrepancies are partly due to the different legal
> frameworks (US vs. EU) where such initiatives were developed.
>
> Considering the rapid evolution of AI applications, and the ongoing
> discussion in the creative industries on rights reservation and licensing
> for AI, the CG agreed to continue to monitor the situation and exchange
> with the other initiatives in this field before taking any decision on the
> possible refinement of the protocol with new properties or values.
>
> In the short term, it was agreed that:
>
> - the CG will check if the semantics of the protocol can be further
> clarified at the level of the specifications, to prevent any ambiguity and
> facilitate interoperability among different solutions;
>
> - the CG will work on a FAQ for non-techies that will further clarify the
> meaning of the TDM opt-out in light of the EU legal framework and will
> provide practical insight to the adopters on how to implement it in the
> context of AI.
>
> Implementation in EPUB files
>
> Given the increasing interest by the publishing sector (including, among
> CG members, Mondadori, Penguin Random House, and the STM association) in
> the integration of the TDM protocol in EPUB files, it was agreed that the
> CG will liaise with the W3C Publishing Community Group and the Publishing
> Business Group, which follow EPUB-related developments, via EDRLab (which
> is a member of both groups).
>
> In particular, it was agreed that:
>
> - On behalf of the CG, EDRLab will send to the W3C Publishing Business Group
> a proposal to be discussed during their next meeting in September;
>
> - Should CG members have views or suggestions on the integration of TDM Rep
> in EPUB, they are requested to share them within the CG mailing list at
> their earliest convenience, so that they can be taken into account in the
> framework of the collaboration with the W3C Publishing Business and
> Community Groups.
>
> Other activities
>
> - A FAQ for non-tech users: the group agreed to work on a FAQ; for more
> details see above.
>
> - Keeping track of early adopters: group members are invited to share on
> the CG mailing list information about new adopters of the protocol. The
> list of early adopters will be publicized on the website of the CG, in
> order to give it visibility. Early adopters are also encouraged to
> publicize the adoption of the protocol on their own websites.
>
>
>
>
>
>
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +33 6 52 46 00 43
>
>
>
>
>
>
>
>
>
>
> --
>
> *John Foliot* |
> Senior Industry Specialist, Digital Accessibility |
> W3C Accessibility Standards Contributor |
>
> "I made this so long because I did not have time to make it shorter." -
> Pascal "links go places, buttons do things"
> ------------------------------
>
> *Network Confidentiality Notice*
>
> This message, and any attached file transmitted with it, contains
> information that may be confidential or privileged for the sole use of the
> intended recipient. If you are not the intended recipient of this e-mail or
> read it without entitlement be advised that keeping, copying, disseminating
> or distributing this message to persons other than the intended recipient
> is strictly forbidden. You are to notify the sender immediately and to
> delete this message and any file attached from your system.
> In accordance with EU Reg. 2016/679 (GDPR), the Data Controller guarantees
> the maximum level of confidentiality and full respect of all obligations
> provided for by the national and the EU legislation currently in force with
> regard to protection of personal data.
>


-- 
*John Foliot* |
Senior Industry Specialist, Digital Accessibility |
W3C Accessibility Standards Contributor |

"I made this so long because I did not have time to make it shorter." -
Pascal "links go places, buttons do things"

Received on Thursday, 3 August 2023 16:49:25 UTC