Re: An HTTP header to request OpenGraph or schema.org metadata

apparently it was on this list after all; here's the thread:
https://lists.w3.org/Archives/Public/ietf-http-wg/2024JanMar/0181.html

On 2024-11-23 10:45, Soni "It/Its" L. wrote:
> we have asked about this before. don't think it was on this list tho? 
> give us a sec...
>
> On 2024-11-23 09:56, Robert Rothenberg wrote:
>> If you look at the HTTP logs for a website that's been around for a 
>> while, you'll notice a lot of weird user agent strings that include 
>> the text "Facebot Twitterbot" or "facebookexternal" or even 
>> "Googlebot", even though the clients are clearly not those bots. Many 
>> of these are from iMessage and various social media/chat applications.
>>
>> I've contacted the developers for one of these and was told this was 
>> necessary because some major websites do not include OpenGraph 
>> metadata unless the user agent string includes text strings for some 
>> well-known bots.
>>
>> However, a website that I maintain has been bombarded with a lot of 
>> unidentified web robots that we believe are using our content for AI 
>> training, and many of these bots will falsely claim to be Googlebot 
>> or Bingbot etc.  So we've implemented a scheme to verify these bots 
>> and block the fakers.  A side-effect is that we're blocking a lot of 
>> these social media/chat bots.
>>
>> Ideally, web clients shouldn't have to fake their user agent strings 
>> just to get metadata.
>>
>> I think a better solution is to have an HTTP header, something like
>>
>>   Accept-Metadata: opengraph, json+ld
>>
>> The server should respond with a normal HTML web page, but can 
>> optionally include metadata, possibly with a response header to 
>> indicate what metadata formats are included.
>>
>> Is there existing work on this?
>
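
For illustration only, here is a rough sketch of how a server might handle the header proposed in the quoted message. Nothing here is standardized: "Accept-Metadata" and its tokens ("opengraph", "json+ld") are copied from the proposal as written, while the "Content-Metadata" response header name and all function names are made up for this example. The Vary header is the existing HTTP mechanism for telling caches that a response depends on a request header.

```python
# Hypothetical sketch: "Accept-Metadata" is only the header proposed in the
# quoted message; it is not a registered or standardized HTTP header.
SUPPORTED_FORMATS = ("opengraph", "json+ld")  # token names copied from the proposal


def parse_accept_metadata(header_value):
    """Split a comma-separated header value into unique lowercase tokens."""
    if not header_value:
        return []
    tokens = []
    for part in header_value.split(","):
        token = part.strip().lower()
        if token and token not in tokens:
            tokens.append(token)
    return tokens


def select_metadata_formats(header_value, supported=SUPPORTED_FORMATS):
    """Return the requested formats the server can embed, in request order."""
    return [t for t in parse_accept_metadata(header_value) if t in supported]


def metadata_response_headers(formats):
    """Build response headers for an HTML page that embeds the given formats.

    "Content-Metadata" is an invented name standing in for the optional
    response header mentioned in the proposal.
    """
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # The page body depends on the request header, so caches must key on it.
        "Vary": "Accept-Metadata",
    }
    if formats:
        headers["Content-Metadata"] = ", ".join(formats)
    return headers
```

With this sketch, a request carrying "Accept-Metadata: opengraph, json+ld" selects both formats, unknown tokens are ignored, and a request without the header still gets the normal HTML page, with no metadata header in the response.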

-- 
plural system (tend to say 'we'), it/she/they, it instead of you

Received on Saturday, 23 November 2024 13:47:41 UTC