- From: Robert Rothenberg <robrwo@gmail.com>
- Date: Sat, 23 Nov 2024 12:56:32 +0000
- To: HTTP Working Group <ietf-http-wg@w3.org>
If you look at the HTTP logs of a website that has been around for a while, you'll notice a lot of odd user agent strings containing text such as "Facebot Twitterbot", "facebookexternal", or even "Googlebot", sent by clients that are clearly not those bots. Many of these requests come from iMessage and various social media/chat applications. I contacted the developers of one of these and was told it was necessary because some major websites do not include OpenGraph metadata unless the user agent string contains the name of a well-known bot.

However, a website that I maintain has been bombarded by a large number of unidentified web robots that we believe are using our content for AI training, and many of these bots falsely claim to be Googlebot, Bingbot, etc. So we've implemented a scheme to verify these bots and block the fakers (a rough sketch follows below). A side effect is that we're also blocking a lot of these social media/chat bots.

Ideally, web clients shouldn't have to fake their user agent strings just to get metadata. I think a better solution would be an HTTP request header, something like:

    Accept-Metadata: opengraph, json+ld

The server would respond with a normal HTML web page, but could optionally include the requested metadata, possibly with a response header indicating which metadata formats are included (an example exchange also follows below).

Is there existing work on this?
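As an aside, here is a minimal sketch of the kind of verification I mean, assuming the reverse-then-forward DNS check that Google and Bing document for their crawlers; the function name and suffix list are illustrative, not our actual code:

    import socket

    # Suffixes that a genuine crawler's reverse-DNS hostname should end with,
    # per Google's and Bing's published crawler-verification instructions.
    VERIFIED_SUFFIXES = {
        "googlebot": (".googlebot.com", ".google.com"),
        "bingbot": (".search.msn.com",),
    }

    def is_genuine_bot(ip, user_agent):
        """Reverse-DNS the client IP, check the hostname suffix, then
        forward-resolve the hostname and confirm it maps back to the IP.
        IPv4 only, for brevity."""
        ua = user_agent.lower()
        claimed = next((bot for bot in VERIFIED_SUFFIXES if bot in ua), None)
        if claimed is None:
            return False  # does not even claim to be a verifiable crawler
        try:
            host, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
            if not host.lower().endswith(VERIFIED_SUFFIXES[claimed]):
                return False                                 # wrong domain: faker
            _, _, addrs = socket.gethostbyname_ex(host)      # forward lookup
            return ip in addrs                               # confirm round trip
        except OSError:
            return False                                     # lookup failed

Anything that claims to be one of these crawlers but fails the check gets blocked, which is exactly what catches the chat/social clients that borrow crawler names.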
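To make the proposal concrete, an exchange might look something like the following; the "Metadata-Available" response header is just a strawman name for the response-side indicator:

    GET /2024/11/some-article HTTP/1.1
    Host: www.example.com
    Accept: text/html
    Accept-Metadata: opengraph, json+ld

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8
    Metadata-Available: opengraph, json+ld

    <!doctype html>
    <html>
      <head>
        <meta property="og:title" content="Some article">
        <meta property="og:type" content="article">
        <script type="application/ld+json">
          {"@context": "https://schema.org", "@type": "Article",
           "headline": "Some article"}
        </script>
        ...

A client that only wants link-preview metadata could then ask for it directly instead of pretending to be a well-known crawler, and a server that doesn't recognize the header would simply ignore it.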
Received on Saturday, 23 November 2024 12:58:52 UTC