- From: Soni "It/Its" L. <fakedme+ietf@gmail.com>
- Date: Sat, 23 Nov 2024 10:47:33 -0300
- To: ietf-http-wg@w3.org
apparently it was on this list, here's the thread:
https://lists.w3.org/Archives/Public/ietf-http-wg/2024JanMar/0181.html

On 2024-11-23 10:45, Soni "It/Its" L. wrote:
> we have asked about this before. don't think it was on this list tho?
> give us a sec...
>
> On 2024-11-23 09:56, Robert Rothenberg wrote:
>> If you look at the HTTP logs for a website that's been around for a
>> while, you'll notice a lot of weird user agent strings that include
>> the text "Facebot Twitterbot" or "facebookexternal" or even
>> "Googlebot" when they are clearly not. Many of these are from
>> iMessage and various social media/chat applications.
>>
>> I've contacted the developers for one of these and was told this was
>> necessary because some major websites do not include OpenGraph
>> metadata unless the user agent string includes text strings for some
>> well-known bots.
>>
>> However, a website that I maintain has been bombarded with a lot of
>> unidentified web robots that we believe are using our content for AI
>> training, and many of these bots will falsely claim to be Googlebot
>> or Bingbot etc. So we've implemented a scheme to verify these bots
>> and block the fakers. A side-effect is that we're blocking a lot of
>> these social media/chat bots.
>>
>> Ideally, web clients shouldn't have to fake their user agent strings
>> just to get metadata.
>>
>> I think a better solution is to have an HTTP header, something like
>>
>>     Accept-Metadata: opengraph, json+ld
>>
>> The server should respond with a normal HTML web page, but can
>> optionally include metadata, possibly with a response header to
>> indicate what metadata formats are included.
>>
>> Is there existing work on this?
>>
>

-- 
plural system (tend to say 'we'), it/she/they, it instead of you

Received on Saturday, 23 November 2024 13:47:41 UTC