- From: Robert Rothenberg <robrwo@gmail.com>
- Date: Wed, 4 Dec 2024 11:00:03 +0000
- To: Austin Wright <aaa@bzfx.net>
- Cc: ietf-http-wg@w3.org
- Message-ID: <46734ae7-3d3d-412e-8c58-ce0b46040cc0@gmail.com>
On 02/12/2024 17:40, Austin Wright wrote:

> This is already possible to some degree. You can read the Accept header when choosing between media types to embed within a document. Consider the request header:
>
> Accept: text/html, application/json+ld
>
> You may read this as a request to embed JSON-LD within HTML (as opposed to, application/turtle, or nothing at all).

How are embedded OpenGraph, Microdata or RDFa handled?

> Now, it might be possible that this is too ambiguous for some user agents. For example, if a user agent wants a plain Turtle document the most, followed by opaque HTML, and last HTML with embedded Turtle, this may be difficult to convey. I think we should first establish concrete cases where the Accept header would be insufficient by itself, before considering a header like Accept-Metadata.

I think it makes more sense to have a separate Accept-Metadata header than to add types to the Accept header:

- The client is explicitly requesting that metadata be embedded in the document or response headers, not served as a separate document.
- The types for different kinds of metadata need not be MIME content types. They can be recognised names like "opengraph", or can even include the schema as well as the type, e.g. "json+ld/schema.org".

> Second, I agree that you shouldn’t need to fake user agent strings (at most, it should be a last resort to work around bugs in particular user-agents). However, I’m not sure how this would solve your problem, as they probably have little incentive to read an Accept header to begin with. Or they may be doing this out of some business desire to expose this metadata “only” to Facebook, and not just any client.

I suspect the organisations that care about user agent strings are doing this to save bandwidth rather than to hide information. (Removing OpenGraph data from a page reduces its size by about 1 KiB, and for a very high-traffic website that may be getting millions of hits per minute, that's significant.)

If they only wanted to send the metadata to Google or Facebook, they would verify that the IP address is owned by those entities, making it useless to fake the UA string.

If a new request header is proposed and the major web robots (Google, Facebook etc.) start using it, then the organisations that care about making metadata optional will start using it too. It's probably much easier to check for a header than to look for various strings in the User-Agent header.

>> It might make sense to make a HEAD or OPTIONS request with an Accept-Metadata header, and the response includes a header with the URL how to retrieve it. Either the same URL (for a GET request, client can determine whether to request first X bytes or entire file depending on metadata type), or a different URL (e.g. for JSON-LD data only or some other format).

> In addition to Content-Type negotiation, there’s also the option of sending a Link header with rel=alternate and type attributes.

That would be useful for standalone metadata formats like JSON-LD. For OpenGraph one can simply make a range request that only asks for the first several KiB of the page.
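
To make the two ideas above concrete, here is a rough Python sketch. It is only an illustration: Accept-Metadata is a proposed header that is not defined anywhere, the comma-separated token syntax is my assumption, and the helper names (parse_accept_metadata, wants_metadata, first_bytes_range) are made up for this example. The Range value uses the standard "bytes=first-last" form.

```python
def parse_accept_metadata(value):
    """Split a hypothetical Accept-Metadata value into lowercase tokens.

    Assumes a simple comma-separated list such as
    "opengraph, json+ld/schema.org"; no such syntax has been specified.
    """
    if not value:
        return []
    return [tok.strip().lower() for tok in value.split(",") if tok.strip()]


def wants_metadata(headers, kind):
    """Return True if the (hypothetical) Accept-Metadata header lists `kind`.

    A server could use this instead of matching User-Agent substrings.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = parse_accept_metadata(lowered.get("accept-metadata"))
    return kind.lower() in tokens


def first_bytes_range(n_bytes):
    """Build a Range request header asking for the first n_bytes of a page."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    # Byte ranges are inclusive, so the last offset is n_bytes - 1.
    return {"Range": f"bytes=0-{n_bytes - 1}"}
```

With this, a server would embed OpenGraph tags only when wants_metadata(request_headers, "opengraph") is true, and a client that only cares about the tags in the page head could send first_bytes_range(4096) with its GET request. A server is free to ignore Range and return the whole page, so the client still has to cope with a full response.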
Received on Wednesday, 4 December 2024 11:00:56 UTC