Re: An HTTP header to request OpenGraph or schema.org metadata

we have asked about this before. don't think it was on this list tho? 
give us a sec...

On 2024-11-23 09:56, Robert Rothenberg wrote:
> If you look at the HTTP logs for a website that's been around for a 
> while, you'll notice a lot of weird user agent strings that include 
> the text "Facebot Twitterbot" or "facebookexternal" or even 
> "Googlebot" when they are clearly not. Many of these are from iMessage 
> and various social media/chat applications.
>
> I've contacted the developers for one of these and was told this was 
> necessary because some major websites do not include OpenGraph 
> metadata unless the user agent string includes text strings for some 
> well-known bots.
>
> However, a website that I maintain has been bombarded with a lot of 
> unidentified web robots that we believe are using our content for AI 
> training, and many of these bots will falsely claim to be Googlebot or 
> Bingbot etc.  So we've implemented a scheme to verify these bots and 
> block the fakers.  A side-effect is that we're blocking a lot of these 
> social media/chat bots.
>
> Ideally, web clients shouldn't have to fake their user agent strings 
> just to get metadata.
>
> I think a better solution is to have an HTTP header, something like
>
>   Accept-Metadata: opengraph, json+ld
>
> The server should respond with a normal HTML web page, but can 
> optionally include metadata, possibly with a response header to 
> indicate what metadata formats are included.
>
> Is there existing work on this?
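
A rough sketch of how a server might honour the proposed header, assuming Python on the server side and no particular web framework. Only the `Accept-Metadata` name and the `opengraph` / `json+ld` tokens come from the proposal above. The `Metadata-Formats` response header and the function names are hypothetical placeholders, not anything standardised. The page itself stays the normal HTML either way; only the embedded metadata changes.

```python
from __future__ import annotations

# Sketch only: "Accept-Metadata" is the header proposed above, not a
# registered HTTP header. "Metadata-Formats" is a hypothetical response
# header name chosen for illustration.

SUPPORTED_FORMATS = {"opengraph", "json+ld"}  # formats this server can embed


def parse_accept_metadata(value: str | None) -> list[str]:
    """Split a comma-separated header value into lowercase format tokens,
    keeping the client's order and dropping empty or duplicate entries."""
    if not value:
        return []
    tokens: list[str] = []
    for part in value.split(","):
        token = part.strip().lower()
        if token and token not in tokens:
            tokens.append(token)
    return tokens


def negotiate_metadata(request_headers: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return the metadata formats to embed in the HTML page and any extra
    response headers to send alongside it."""
    # HTTP field names are case-insensitive, so normalise the keys first.
    headers = {k.lower(): v for k, v in request_headers.items()}
    requested = parse_accept_metadata(headers.get("accept-metadata"))
    # Ignore formats the server does not know how to produce.
    chosen = [fmt for fmt in requested if fmt in SUPPORTED_FORMATS]
    extra = {"Metadata-Formats": ", ".join(chosen)} if chosen else {}
    # Vary tells caches that the response body depends on this request header.
    extra["Vary"] = "Accept-Metadata"
    return chosen, extra
```

For example, a request carrying `Accept-Metadata: OpenGraph, json+ld, microdata` would get OpenGraph and JSON-LD embedded, with the unknown `microdata` token silently ignored. The point is that the client no longer needs to pretend to be Facebot or Googlebot, so bot verification can stay strict.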

-- 
plural system (tend to say 'we'), it/she/they, it instead of you

Received on Saturday, 23 November 2024 13:45:25 UTC