- From: Hugo Scott <hugo@hugoscott.com>
- Date: Thu, 15 Sep 2022 11:30:59 +0200
- To: "schema.org Mailing List" <public-schemaorg@w3.org>
- Message-ID: <CAOJ1cMR3E7POEF-URL5H0Ev=_JyhXUGHS9TxQw=-HUWqp9RvDg@mail.gmail.com>
I've been reading through this interesting thread and, while I think I
understand what you are saying (especially concerning the bloat of the CSS
and JSON formats), there are a couple of issues that come to mind:

If you put the schema markup and social metadata into an external file to
avoid re-transferring the same data on every page load, then you are
generating another HTTP request, which can slightly slow the overall page
load time if it's not deferred properly. I also wonder how the bandwidth
cost of transferring 35 KB of text data compares with the cost of making a
whole new HTTP request.

Obviously this would not be an issue if the external file were cached by
the browser, but that would then rule out having the specific per-page
schema and social meta markup required for individual product pages,
service pages, recipe pages, blog articles, training courses, events, etc.

Or maybe I have misunderstood what you are suggesting?

Cheers,

Hugo Scott

On Thu, 15 Sept 2022 at 10:50, Roger Rogerson <tesconda2@hotmail.com> wrote:

> Hi folks.
>
> There are a lot of different things that can be done;
> the question is: will they be?
>
> We've had various issues for years (a decade or more), and HTTP has had
> evolutions to compensate rather than correct some of the issues.
>
> But my focus here is on Schema, and how certain entities have pushed it
> to be utilised in an inefficient way - causing a percentage increase of
> bloat, solely for their gain.
>
> Simply externalising it with a standardised extension solves the issue,
> instantly.
> (And I agree, a lot of the OpenGraph stuff etc. can have the same
> treatment.)
>
> All it requires is a set of file extensions to be recognised/accepted,
> and end-systems to request them.
>
> If the end-systems have concerns about wasted requests (not knowing
> what's available - ironic!), then a specialised response header can be
> included, with a list of standard extensions:
>
> x-resources: .meta, .og, .other
>
> If each type of content has its own dedicated extension, it means people
> need only request the one(s) they desire, whilst normal web users don't
> get any of that bloat.
>
> The hardship is going to be content platforms. They will be required to
> alter their systems to handle additional requests, and fetch specific
> content to emulate additional "page" requests (in this case the
> URI+.meta or URI+.schema etc.).
>
> But I think it's more than worth doing, as the sheer volume of non-human
> traffic is ridiculous.
>
> So how to make it happen?
> ------------------------------
> *From:* Guy Thorsby <gthorsby@gmail.com>
> *Sent:* 31 August 2022 17:03
> *To:* Joe Duarte <songofapollo@gmail.com>
> *Cc:* Roger Rogerson <tesconda2@hotmail.com>; schema.org Mailing List
> <public-schemaorg@w3.org>
> *Subject:* Re: Permit external JSON-LD files?
>
> This thread is great.
>
> Just want to drop AMP pages in here so they get their representation in
> this conversation. Not directly related, but within the crosshairs of
> the context.
>
> https://developers.google.com/amp
>
> On Wed, Aug 31, 2022, 10:36 Joe Duarte <songofapollo@gmail.com> wrote:
>
> It's a good idea, not just for Schema, but for *all* metadata not used
> by browsers or similar clients. This would include all the social media
> metadata like Facebook's OpenGraph, Twitter cards, Google site
> verification, the search result snippet things, etc.
>
> I mapped out an approach for this with a separate file extension for
> the non-browser metadata: .meta
>
> Bots would request the .meta file, in most cases in addition to the
> actual page (in some cases they might only need the .meta file - for
> example the social media links, where they just need a title, a
> description and an image URL). The .meta file URLs would exactly match
> the page URLs, except for the extension.
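>
> To make the pairing concrete, it might look like this (a sketch of the
> proposals in this thread only - the .meta extension and the x-resources
> discovery header are hypothetical, not an existing convention):
>
>     Page (for humans):   https://example.com/recipes/best-cookies
>     Metadata (for bots): https://example.com/recipes/best-cookies.meta
>
>     HTTP/1.1 200 OK             <- response for the page itself
>     Content-Type: text/html
>     x-resources: .meta, .og     <- advertises which companion files exist
>
> A bot that only needs a title, description and image would fetch just
> the .meta URL; a browser would never request it at all.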
>
> As you noted, the status quo is quite wasteful. It's not just the
> useless metadata - users are forced to download enormous amounts of CSS
> that is not used by the page, typically 90%+ unused, and in the form of
> separate files, which makes it even worse. And enormous amounts of JS,
> again in separate files, most of it unused, and much of the rest
> unnecessary for the functionality of, say, an article with comments and
> a few ads. There's never been a good explanation for this - the caching
> argument was always a mere assertion, falsified by digging in and
> testing. So there's a lot of room for improvement - the web is much
> slower than it could and should be, given the ridiculous power of
> modern computers and the fat pipes we now have. It's amazing how slow
> even multimillion-dollar websites are.
>
> The metadata bloat isn't the biggest culprit, but it's worth sorting
> out along with the other sources of bloat. I sketched out a framework
> with .web and .meta files, where .web replaced HTML and CSS. It would
> be equivalent to, but more compact than, a formally specified
> minification format for HTML and CSS (something we could really use),
> combined with: default tree-shaken CSS (only the CSS used by the page
> is in the source, which is trivially easy to achieve by simple selector
> matching); minified 1-2 byte selector, class, ID, etc. names (no more
> ponderous 30-byte class names - the browser doesn't need them and does
> nothing with them); and an efficient link format with standardized URLs
> (URLs just 2-3 bytes before the hostname, e.g. H: instead of https://,
> link markup just 3-4 bytes, and URLs never more than 25 bytes after the
> hostname).
>
> The metadata format could also be much more compact. There's no reason
> for machine-readable syntax to be human-readable and so bloated. We
> could easily flip between machine- and human-readable forms, so it has
> never made sense to go for both in one bloated format. Most tags could
> be just one or two bytes, and a standardized order can eliminate some
> tags or other bytes entirely. The format could also be optimized for
> compression by design (for specific compression formats like Brotli or
> Zstandard, though it might be possible to optimize for both at the same
> time). JSON is bloated by its quoted keys, long key names, and
> excessive punctuation - simple newline separation solves two of those,
> and a binary format could have richer separators and markers just by
> using the forgotten 1-byte control codes in UTF-8 Basic Latin / ASCII,
> in addition to 1-2 byte key names.
>
> Cheers,
>
> Joe
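>
> As a rough illustration of the compaction Joe describes above (the
> one-byte key names and the layout are invented for this example, not a
> proposed specification):
>
>     JSON-LD today (~110 bytes):
>     {"@context":"https://schema.org","@type":"Recipe",
>      "name":"Best Cookies","image":"https://example.com/c.jpg"}
>
>     Newline-separated, with 1-byte keys (~50 bytes):
>     t Recipe
>     n Best Cookies
>     i https://example.com/c.jpg
>
> The @context disappears because it is implied by the file format
> itself, and with a standardized field order even the one-byte keys
> could be dropped.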
>
> On Sat, Aug 27, 2022, 11:11 Roger Rogerson <tesconda2@hotmail.com> wrote:
>
> I appreciate that things like Microdata are inlined, and utilise the
> HTML markup to associate data with content.
>
> But JSON-LD Schema is embedded. In many cases, this additional code
> serves no "human" purpose, and is provided for "machines" (typically
> Google).
>
> A shining example is the following web page (remove spaces after
> periods):
> https://www. delish. com/cooking/g1956/best-cookies/
>
> That page has approximately 35 KB of Schema, and it is loaded for every
> single human visitor. In the case of popular pages, this means a large
> amount of unnecessary code is transferred (gigabytes or more per year).
>
> If the JSON-LD could be externalised into a referred-to file, this
> could reduce bandwidth consumption for users, help speed up some page
> load times / improve performance, and help towards "going green".
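>
> A minimal sketch of what that externalisation might look like in a
> page's head (the rel value and the .jsonld URL are hypothetical - this
> is exactly the sort of thing that would need standardising):
>
>     <!-- Today: the full ~35 KB JSON-LD blob, inline in every response -->
>     <script type="application/ld+json">
>       { "@context": "https://schema.org", ...35 KB of Recipe markup... }
>     </script>
>
>     <!-- Externalised: a one-line reference that browsers simply ignore -->
>     <link rel="describedby" type="application/ld+json"
>           href="/cooking/g1956/best-cookies.jsonld" />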
>
> I appreciate that, technically, this isn't about "Schema" directly, but
> about how browsers and parsers can recognise and handle an externalised
> version - but I'm hoping this is the right place to get it considered,
> and the right people to see it and push it to browser vendors.
>
> Thank you.
> Autocrat.

Received on Thursday, 15 September 2022 09:31:29 UTC