- From: Soni <fakedme+ietf@gmail.com>
- Date: Sat, 23 Nov 2024 10:45:06 -0300
- To: ietf-http-wg@w3.org
we have asked about this before. don't think it was on this list tho? give us a sec...

On 2024-11-23 09:56, Robert Rothenberg wrote:
> If you look at the HTTP logs for a website that's been around for a
> while, you'll notice a lot of weird user agent strings that include
> the text "Facebot Twitterbot" or "facebookexternal" or even
> "Googlebot" when they are clearly not. Many of these are from iMessage
> and various social media/chat applications.
>
> I've contacted the developers for one of these and was told this was
> necessary because some major websites do not include OpenGraph
> metadata unless the user agent string includes text strings for some
> well-known bots.
>
> However, a website that I maintain has been bombarded with a lot of
> unidentified web robots that we believe are using our content for AI
> training, and many of these bots will falsely claim to be Googlebot or
> Bingbot etc. So we've implemented a scheme to verify these bots and
> block the fakers. A side-effect is that we're blocking a lot of these
> social media/chat bots.
>
> Ideally, web clients shouldn't have to fake their user agent strings
> just to get metadata.
>
> I think a better solution is to have an HTTP header, something like
>
>     Accept-Metadata: opengraph, json+ld
>
> The server should respond with a normal HTML web page, but can
> optionally include metadata, possibly with a response header to
> indicate what metadata formats are included.
>
> Is there existing work on this?

-- 
plural system (tend to say 'we'), it/she/they, it instead of you
Received on Saturday, 23 November 2024 13:45:25 UTC