Re: An HTTP header to request OpenGraph or schema.org metadata from Robert Rothenberg on 2024-12-04 (ietf-http-wg@w3.org from October to December 2024)

From: Robert Rothenberg <robrwo@gmail.com>
Date: Wed, 4 Dec 2024 11:00:03 +0000
To: Austin Wright <aaa@bzfx.net>
Cc: ietf-http-wg@w3.org
Message-ID: <46734ae7-3d3d-412e-8c58-ce0b46040cc0@gmail.com>

On 02/12/2024 17:40, Austin Wright wrote:
> This is already possible to some degree. You can read the Accept header when choosing between media types to embed within a document. Consider the request header:
>
> Accept: text/html, application/json+ld
>
> You may read this as a request to embed JSON-LD within HTML (as opposed to, application/turtle, or nothing at all).
How are embedded OpenGraph, Microdata or RDFa handled?
> Now, it might be possible that this is too ambiguous for some user agents. For example, if a user agent wants a plain Turtle document the most, followed by opaque HTML, and last HTML with embedded Turtle, this may be difficult to convey. I think we should first establish concrete cases where the Accept header would be insufficient by itself, before considering a header like Accept-Metadata.
I think it makes more sense to have a separate Accept-Metadata header 
than to add types to the Accept header:
- The client is explicitly requesting that metadata be embedded in the 
document or response headers, not served as a separate document
- The types for different kinds of metadata need not be MIME content 
types, just recognised names like "opengraph"; they could even include 
the schema as well as the type, e.g. "json+ld/schema.org"
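
To make the second point concrete, here is a rough sketch of how a server 
might parse such a header. Nothing here is specified anywhere: the 
comma-separated "type/schema" syntax, the function name and the 
MetadataRequest type are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class MetadataRequest(NamedTuple):
    """One entry of a hypothetical Accept-Metadata header."""
    kind: str               # e.g. "opengraph" or "json+ld"
    schema: Optional[str]   # e.g. "schema.org", or None if not given


def parse_accept_metadata(value: str) -> list[MetadataRequest]:
    """Parse an assumed comma-separated list of "type" or "type/schema" names."""
    requests = []
    for item in value.split(","):
        item = item.strip().lower()
        if not item:
            continue  # tolerate empty list members such as "a,,b"
        kind, _, schema = item.partition("/")
        requests.append(MetadataRequest(kind, schema or None))
    return requests


if __name__ == "__main__":
    print(parse_accept_metadata('opengraph, json+ld/schema.org'))
```

Because the names are plain tokens rather than media types, the server 
does not have to reconcile them with whatever the Accept header says 
about the document itself.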
> Second, I agree that you shouldn’t need to fake user agent strings (at most, it should be a last resort to work around bugs in particular user-agents). However, I’m not sure how this would solve your problem, as they probably have little incentive to read an Accept header to begin with. Or they may be doing this out of some business desire to expose this metadata “only” to Facebook, and not just any client.
I suspect the organisations that care about user agent strings are doing 
this to save bandwidth rather than hide information. (Removing OpenGraph 
data from a page reduces the size by about 1 KiB, and for a very 
high-traffic website that may be getting millions of hits per minute, 
that's significant.)

If they only wanted to send the metadata to Google or Facebook, they 
would verify that the IP address is owned by those entities, making it 
useless to fake the UA string.

If a new request header is proposed and the major web robots (Google, 
Facebook etc) start using it, then the organisations that care about 
making metadata optional will start using it. It's probably much easier 
to check for a header than to look for various strings in the 
User-Agent header.
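
As a minimal sketch of that comparison, the two functions below decide 
whether to embed OpenGraph data, first by sniffing the User-Agent and 
then by checking the proposed header. The Accept-Metadata name and its 
"opengraph" token are assumptions from this thread, the crawler token 
list is only an illustrative sample, and the function names are made up.

```python
from typing import Mapping

# Illustrative sample only; a real list needs ongoing maintenance.
CRAWLER_TOKENS = ("facebookexternalhit", "googlebot")


def _header(headers: Mapping[str, str], name: str) -> str:
    """Look up a header case-insensitively, returning "" if absent."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return ""


def wants_opengraph_by_user_agent(headers: Mapping[str, str]) -> bool:
    """Current practice: search the User-Agent for known crawler tokens."""
    user_agent = _header(headers, "User-Agent").lower()
    return any(token in user_agent for token in CRAWLER_TOKENS)


def wants_opengraph_by_header(headers: Mapping[str, str]) -> bool:
    """Proposed practice: the client asks for the metadata explicitly."""
    value = _header(headers, "Accept-Metadata")  # hypothetical header
    names = {item.strip().lower() for item in value.split(",")}
    return "opengraph" in names
```

The header check needs no list of crawlers and works for any client 
that sends the header, not just the ones the site already knows about.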
>> It might make sense to make a HEAD or OPTIONS request with an Accept-Metadata header, and the response includes a header with the URL how to retrieve it. Either the same URL (for a GET request, client can determine whether to request first X bytes or entire file depending on metadata type), or a different URL (e.g. for JSON-LD data only or some other format).
> In addition to Content-Type negotiation, there’s also the option of sending a Link header with rel=alternate and type attributes.
That would be useful for standalone metadata formats like JSON-LD. For 
OpenGraph one can simply make a range request that only requests the 
first several KiB of the page.
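
A minimal sketch of the range-request approach, using only the Python 
standard library. The 8 KiB default and the function and class names are 
my own choices for illustration, and it assumes the og: tags appear 
early in the page and that the server honours Range.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def build_prefix_request(url: str, max_bytes: int = 8192) -> Request:
    """Build a GET request for only the first max_bytes of the page."""
    return Request(url, headers={"Range": f"bytes=0-{max_bytes - 1}"})


def extract_opengraph(partial_html: bytes) -> dict[str, str]:
    """Parse a possibly truncated HTML prefix; an unfinished tag at the
    end is left unparsed rather than raising an error."""
    parser = OpenGraphParser()
    parser.feed(partial_html.decode("utf-8", errors="replace"))
    return parser.properties


def fetch_opengraph(url: str, max_bytes: int = 8192) -> dict[str, str]:
    """Fetch the prefix and extract og: properties (needs network access)."""
    with urlopen(build_prefix_request(url, max_bytes)) as response:
        # A server may ignore Range and answer 200 with the whole page,
        # so cap the read on the client side as well.
        return extract_opengraph(response.read(max_bytes))
```

This only saves bandwidth when the server answers with 206 Partial 
Content; the client-side cap on the read covers servers that ignore the 
Range header.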

Received on Wednesday, 4 December 2024 11:00:56 UTC