Re: Google opening the door to a discussion about AI opt-out from Leonard Rosenthol on 2023-10-30 (public-tdmrep@w3.org from October 2023)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Mon, 30 Oct 2023 02:40:07 +0000
To: Laurent Le Meur <laurent@edrlab.org>, "public-tdmrep@w3.org" <public-tdmrep@w3.org>
Message-ID: <DM8PR02MB8181142489D512C990877640CDA1A@DM8PR02MB8181.namprd02.prod.outlook.com>

I also attended the webinar and am glad to see Google getting into the act and doing so by taking input and wanting to work through a standards process… All good!

The current problem with the robots.txt direction is the same as the current TDM specification – which is that it assumes that the owner of the content also owns/manages/controls the web site on which it is hosted.  That may help the professional publisher who maintains their sites, but it’s not helpful for the average user putting their content up on social media, stock image services, etc.

Leonard

From: Laurent Le Meur <laurent@edrlab.org>
Date: Sunday, October 29, 2023 at 3:06 PM
To: public-tdmrep@w3.org <public-tdmrep@w3.org>
Subject: Google opening the door to a discussion about AI opt-out

On Thursday, 26/10, the Google "AI Web Publisher Control Development Team" held a first webinar (a presentation, not a discussion) "about developing machine-readable means to provide web publisher choice and control for emerging AI and research use cases."
I listened to the webinar, and I hope some of you were able to participate too.
This is the first time an AI actor has opened the door for discussion, and this is a big one.

The team seems open to standardizing a method with a standards body - they are considering working with the IETF.

During the call, they laid out the different issues to be solved: alignment of the different existing options for blocking crawlers, transparency of the ownership and purpose of crawlers, the granularity of access control, with the notion of a taxonomy of crawl purposes (e.g. "search engines", "generative AI applications"), and how to incentivize the adoption of shared standards.

In summary, these notions overlap with the questions we are currently asking ourselves, and it is time to discuss them in this group as well.

The Google team seems inclined to use an evolution of robots.txt for that. They seem ready to add lots of semantics to its current basic model. They didn't speak about robots tags, which should be added to the discussion.
Personally, I see no problem moving from our current implementation of this tdmrep.json file to the good old robots.txt IF the semantics of the latter evolve.

The Google team is now releasing a questionnaire. I received a password for accessing it. Please consider joining this effort via this blog post

A principled approach to evolving choice and control for web content<https://blog.google/technology/ai/ai-web-publisher-controls-sign-up/>

and form
AI Web Publisher Controls Mailing List Sign-Up<https://services.google.com/fb/forms/ai-web-publisher-controls-external/>

Received on Monday, 30 October 2023 02:40:17 UTC