Re: Permit external JSON-LD files?

I can see a reply from Hugo,
but don't appear to have an email address to reply to directly.

Hugo,
I think you may have missed my point,
that being, the Schema is not for people.

People, in general, don't need to download the Schema.
In many cases (such as the Cookie Recipe example),
it is not public-facing - it is Bot-only content.

So there's no worry about additional connections, requests or caching etc.

The only reason many sites utilise such Schema is Google.
The result is that we are seeing tons of data transferred,
repeatedly and needlessly.

Shift the Bot code to a separate file.
Indicate (header/meta) existence.
Browsers will typically ignore it.
Bots can request it as required.
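Something like the following would do it (a rough sketch only - the rel value,
extension and path are purely illustrative, borrowing the Delish URL from further
down the thread):

<link rel="alternate" type="application/ld+json"
      href="/cooking/g1956/best-cookies.schema">

or, as a response header:

Link: </cooking/g1956/best-cookies.schema>; rel="alternate"; type="application/ld+json"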

(Similar could be done for OpenGraph etc.,
but this may be more complicated, due to things like Twitter Cards etc.)

________________________________
From: Roger Rogerson <tesconda2@hotmail.com>
Sent: 15 September 2022 08:48
To: Guy Thorsby <gthorsby@gmail.com>; Joe Duarte <songofapollo@gmail.com>
Cc: schema.org Mailing List <public-schemaorg@w3.org>
Subject: Re: Permit external JSON-LD files?

Hi folks.

There are a lot of different things that can be done,
the question is - will they be done?

We've had various issues for years/decade+, and HTTP has evolved to compensate for,
rather than correct, some of the issues.

But my focus here is on Schema,
and how certain entities have pushed it to be utilised
in an inefficient way - causing a measurable increase in page bloat,
solely for their gain.

Simply externalising it with a standardised extension solves the issue,
instantly.
(And I agree, a lot of the OpenGraph stuff etc. can have the same treatment.)

All it requires is a set of file-extensions to be recognised/accepted,
and end-systems to request them.

If the end-systems have concerns about wasted requests (not knowing what's available)
(ironic!), then a specialised response header can be included, with a list of the standard extensions on offer.

x-resources: .meta, .og, .other

If each type of content has its own dedicated extension,
it means people need only request the one(s) they desire,
whilst normal web users don't get any of that bloat.
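As a rough sketch of the exchange (the header name, extensions and URLs are all
illustrative):

GET /cooking/g1956/best-cookies/ HTTP/1.1
Host: www.delish.com

HTTP/1.1 200 OK
x-resources: .schema, .og

GET /cooking/g1956/best-cookies.schema HTTP/1.1
Host: www.delish.com

The browser never makes the second request; only a Bot that wants the Schema does.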

The hardship is going to be content platforms.
They will be required to alter their systems to handle additional requests,
and fetch specific content to emulate additional "page" requests (in this case the URI+.meta or URI+.schema etc.).


But I think it's more than worth doing,
as the sheer volume of non-human traffic is ridiculous.


So how to make it happen?
________________________________
From: Guy Thorsby <gthorsby@gmail.com>
Sent: 31 August 2022 17:03
To: Joe Duarte <songofapollo@gmail.com>
Cc: Roger Rogerson <tesconda2@hotmail.com>; schema.org Mailing List <public-schemaorg@w3.org>
Subject: Re: Permit external JSON-LD files?

This thread is great.

Just want to drop AMP pages in here so they get their representation in this conversation. Not directly related, but within the crosshairs of the context.

https://developers.google.com/amp




On Wed, Aug 31, 2022, 10:36 Joe Duarte <songofapollo@gmail.com> wrote:
It's a good idea, not just for Schema, but for all metadata not used by browsers or similar clients. This would include all the social media metadata like Facebook's OpenGraph, Twitter cards, Google site verification, the search result snippet things, etc.

I mapped out an approach for this with a separate file extension for the non-browser metadata: .meta

Bots would request the .meta file, in most cases in addition to the actual page (in some cases they might only need the .meta file - for example, social media link previews that just need a title, a description and an image URL). The .meta file URLs would exactly match the page URLs, except for the extension.
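For instance (hypothetical URLs):

https://example.com/cooking/best-cookies       <- the page itself
https://example.com/cooking/best-cookies.meta  <- title, description, image URL, Schema, etc.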

As you noted, the status quo is quite wasteful. It's not just the useless metadata – users are forced to download enormous amounts of CSS that is not used by the page, typically 90%+ unused, and in the form of separate files, which makes it even worse. And enormous amounts of JS, again in separate files, most of it unused, and much of the rest unnecessary for the functionality of, say, an article with comments and a few ads. There's never been a good explanation for this – the caching argument was always a mere assertion falsified by digging in and testing. So there's a lot of room for improvement – the web is much slower than it could be and should be given the ridiculous power of modern computers and the fat pipes we now have. It's amazing how slow even multimillion dollar websites are.

The metadata bloat isn't the biggest culprit, but it's worth sorting out along with the other sources of bloat. I sketched out a framework with .web and .meta files, where .web replaced HTML and CSS. It would be equivalent to, but more compact than, a formally specified minification format for HTML and CSS (something we could really use), combined with default tree-shaken CSS (only the CSS used by the page is in the source, which is trivially easy to achieve by simple selector matching), minified 1-2 byte selector, class, ID, etc. names (no more ponderous 30-byte class names – the browser doesn't need or do anything with them), and an efficient link format with standardized URLs (URLs just 2-3 bytes before the hostname, e.g. H: instead of https://, link markup just 3-4 bytes, and URLs never more than 25 bytes after the hostname).

The metadata format could also be much more compact. There's no reason for machine-readable syntax to be human readable and so bloated. We could easily flip between machine and human readable forms, so it's never made sense to go for both in one bloated format. Most tags could be just one or two bytes. Standardized order can eliminate some tags or other bytes. The format could also be optimized for compression by design (to specific compression formats like brotli or Zstandard, though it might be possible to optimize for both at the same time). JSON is bloated with the quoted keys, long key names, and excessive punctuation – simple newline separation solves two of those, and a binary format could have richer separators and markers just by using the forgotten 1-byte control codes in UTF-8 Basic Latin / ASCII, in addition to 1-2 byte key names.
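To make that concrete (the one-byte key codes here are invented purely for illustration), a fragment like

{"@type": "Recipe", "name": "Best Cookies", "image": "https://example.com/cookies.jpg"}

could collapse to

t Recipe
n Best Cookies
i https://example.com/cookies.jpg

with a published mapping from the short keys to the long names, and control codes taking the place of newlines in a binary variant.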

Cheers,

Joe

On Sat, Aug 27, 2022, 11:11 Roger Rogerson <tesconda2@hotmail.com> wrote:
I appreciate that things like Microdata are inlined,
and utilise the HTML markup to associate data with content.

But JSON-LD Schema is embedded.
In many cases, this additional code serves no "human" purpose,
and is provided for "machines" (typically Google).

A shining example is the following web page:
https://www.delish.com/cooking/g1956/best-cookies/

That page has approximately 35 KB of Schema.
That is loaded for every single human visitor.

In the case of popular pages, this means a large amount of unnecessary
code is transferred (gigabytes or more per year).

If the JSON-LD could be externalised into a separately referenced file,
then this could reduce bandwidth consumption for users,
help speed up some page load times/improve performance,
and help towards "going green".
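In other words, instead of today's inline block:

<script type="application/ld+json">
{ ... ~35 KB of Schema ... }
</script>

something like (hypothetical - no parser or browser recognises this today,
which is exactly the change being asked for):

<script type="application/ld+json" src="/cooking/g1956/best-cookies.jsonld"></script>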


I appreciate that technically,
this isn't about "Schema" directly,
but about how Browsers and Parsers can recognise and handle
an externalised version - but I'm hoping this is the right place
to get it considered and the right people to see it/push it to browser vendors.


Thank you.
Autocrat.

Received on Saturday, 1 October 2022 09:29:59 UTC