- From: Robert Rothenberg <robrwo@gmail.com>
- Date: Sat, 23 Nov 2024 12:56:32 +0000
- To: HTTP Working Group <ietf-http-wg@w3.org>
If you look at the HTTP logs of a website that has been around for a while, you'll notice a lot of odd user agent strings containing text such as "Facebot Twitterbot", "facebookexternal", or even "Googlebot", sent by clients that are clearly not those bots. Many of these requests come from iMessage and various social media/chat applications. I contacted the developers of one of these and was told it was necessary because some major websites do not include OpenGraph metadata unless the user agent string contains the name of a well-known bot.

However, a website that I maintain has been bombarded by a large number of unidentified web robots that we believe are using our content for AI training, and many of these bots falsely claim to be Googlebot, Bingbot, etc. So we've implemented a scheme to verify these bots and block the fakers (a rough sketch follows below). A side effect is that we're also blocking a lot of these social media/chat bots.

Ideally, web clients shouldn't have to fake their user agent strings just to get metadata. I think a better solution would be an HTTP request header, something like:

    Accept-Metadata: opengraph, json+ld

The server would respond with a normal HTML web page, but could optionally include the requested metadata, possibly with a response header indicating which metadata formats are included (an example exchange also follows below).

Is there existing work on this?
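As an aside, here is a minimal sketch of the kind of verification I mean, assuming the reverse-then-forward DNS check that Google and Bing document for their crawlers; the function name and suffix list are illustrative, not our actual code:

    import socket

    # Suffixes that a genuine crawler's reverse-DNS hostname should end with,
    # per Google's and Bing's published crawler-verification instructions.
    VERIFIED_SUFFIXES = {
        "googlebot": (".googlebot.com", ".google.com"),
        "bingbot": (".search.msn.com",),
    }

    def is_genuine_bot(ip, user_agent):
        """Reverse-DNS the client IP, check the hostname suffix, then
        forward-resolve the hostname and confirm it maps back to the IP.
        IPv4 only, for brevity."""
        ua = user_agent.lower()
        claimed = next((bot for bot in VERIFIED_SUFFIXES if bot in ua), None)
        if claimed is None:
            return False  # does not even claim to be a verifiable crawler
        try:
            host, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
            if not host.lower().endswith(VERIFIED_SUFFIXES[claimed]):
                return False                                 # wrong domain: faker
            _, _, addrs = socket.gethostbyname_ex(host)      # forward lookup
            return ip in addrs                               # confirm round trip
        except OSError:
            return False                                     # lookup failed

Anything that claims to be one of these crawlers but fails the check gets blocked, which is exactly what catches the chat/social clients that borrow crawler names.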
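To make the proposal concrete, an exchange might look something like the following; the "Metadata-Available" response header is just a strawman name for the response-side indicator:

    GET /2024/11/some-article HTTP/1.1
    Host: www.example.com
    Accept: text/html
    Accept-Metadata: opengraph, json+ld

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8
    Metadata-Available: opengraph, json+ld

    <!doctype html>
    <html>
      <head>
        <meta property="og:title" content="Some article">
        <meta property="og:type" content="article">
        <script type="application/ld+json">
          {"@context": "https://schema.org", "@type": "Article",
           "headline": "Some article"}
        </script>
        ...

A client that only wants link-preview metadata could then ask for it directly instead of pretending to be a well-known crawler, and a server that doesn't recognize the header would simply ignore it.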
Received on Saturday, 23 November 2024 12:58:52 UTC