
Re: Permit external JSON-LD files?

From: Guy Thorsby <gthorsby@gmail.com>
Date: Wed, 31 Aug 2022 11:03:45 -0600
Message-ID: <CANyF0ZMT2OczVd_cmBV9WyfKQTpebdEohDjtM_EEuJFBCJc-1g@mail.gmail.com>
To: Joe Duarte <songofapollo@gmail.com>
Cc: Roger Rogerson <tesconda2@hotmail.com>, "schema.org Mailing List" <public-schemaorg@w3.org>
This thread is great.

Just want to drop AMP pages in here so they get their representation in this
conversation. Not directly related, but squarely within the context:

https://developers.google.com/amp

On Wed, Aug 31, 2022, 10:36 Joe Duarte <songofapollo@gmail.com> wrote:

> It's a good idea, not just for Schema, but for *all* metadata not used by
> browsers or similar clients. This would include all the social media
> metadata like Facebook's OpenGraph, Twitter cards, Google site
> verification, the search result snippet things, etc.
>
> I mapped out an approach for this with a separate file extension for the
> non-browser metadata: .meta
>
> Bots would request the .meta file, in most cases in addition to the actual
> page (in some cases they might only need the .meta file, maybe the social
> media links where they just need title and description and an image URL).
> The .meta file URLs would exactly match the page URLs, except for the
> extension.
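The URL mapping Joe proposes is simple enough to sketch. A minimal illustration (the `.meta` extension and the mapping rules here are his proposal, not an existing convention):

```python
from urllib.parse import urlsplit, urlunsplit

def meta_url(page_url: str) -> str:
    """Map a page URL to its hypothetical .meta counterpart by
    swapping the trailing file extension, or appending one."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    stem, dot, ext = path.rpartition(".")
    if dot and "/" not in ext:
        # Path ends in an extension like .html: replace it
        path = stem + ".meta"
    else:
        # Extensionless or directory-style path: append .meta
        path = path.rstrip("/") + ".meta" if path != "/" else "/index.meta"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

print(meta_url("https://example.com/articles/cookies.html"))
# https://example.com/articles/cookies.meta
```

A bot that only needs title/description/image could then fetch the `.meta` URL alone, skipping the page entirely.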
>
> As you noted, the status quo is quite wasteful. It's not just the useless
> metadata – users are forced to download enormous amounts of CSS that is not
> used by the page, typically 90%+ unused, and in the form of separate files,
> which makes it even worse. And enormous amounts of JS, again in separate
> files, most of it unused, and much of the rest unnecessary for the
> functionality of, say, an article with comments and a few ads. There's
> never been a good explanation for this – the caching argument was always a
> mere assertion falsified by digging in and testing. So there's a lot of
> room for improvement – the web is much slower than it could be and should
> be given the ridiculous power of modern computers and the fat pipes we now
> have. It's amazing how slow even multimillion dollar websites are.
>
> The metadata bloat isn't the biggest culprit, but it's worth sorting out
> along with the other sources of bloat. I sketched out a framework with .web
> and .meta files, where .web replaced HTML and CSS. It would be equivalent
> to, but more compact than, a formally specified minification format for
> HTML and CSS (something we could really use), combined with default
> tree-shaken CSS (only the CSS used by the page is in the source, which is
> trivially easy to achieve by simple selector matching), minified 1-2 byte
> selector, class, ID, etc. names (no more ponderous 30-byte class names –
> the browser doesn't need or do anything with them), and an efficient link
> format with standardized URLs (URLs just 2-3 bytes before the hostname,
> e.g. H: instead of https://, link markup just 3-4 bytes, and URLs never
> more than 25 bytes after the hostname).
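The tree-shaking step described above can be sketched crudely. This toy version handles only flat rules with class and bare-tag selectors; real CSS selector matching is far more involved:

```python
import re

def shake_css(css: str, html: str) -> str:
    """Keep only CSS rules whose class selectors all appear in the
    HTML's class attributes. Toy sketch: flat rules only, no nesting,
    no attribute/pseudo selectors."""
    class_attrs = re.findall(r'class="([^"]*)"', html)
    classes_in_html = {c for attr in class_attrs for c in attr.split()}
    kept = []
    for selector, body in re.findall(r'([^{}]+)\{([^{}]*)\}', css):
        needed = re.findall(r'\.([\w-]+)', selector)
        if all(c in classes_in_html for c in needed):
            kept.append(f"{selector.strip()}{{{body.strip()}}}")
    return "".join(kept)

html = '<p class="note big">hi</p>'
css = ".note{color:red}.unused{color:blue}p{margin:0}"
print(shake_css(css, html))  # .note{color:red}p{margin:0}
```

Production tools (PurgeCSS and similar) do essentially this with a real selector engine; the point is that the matching logic itself is not exotic.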
>
> The metadata format could also be much more compact. There's no reason for
> machine-readable syntax to be human readable and so bloated. We could
> easily flip between machine and human readable forms, so it's never made
> sense to go for both in one bloated format. Most tags could be just one or
> two bytes. Standardized order can eliminate some tags or other bytes. The
> format could also be optimized for compression by design (to specific
> compression formats like brotli or Zstandard, though it might be possible
> to optimize for both at the same time). JSON is bloated with the quoted
> keys, long key names, and excessive punctuation – simple newline separation
> solves two of those, and a binary format could have richer separators and
> markers just by using the forgotten 1-byte control codes in UTF-8 Basic
> Latin / ASCII, in addition to 1-2 byte key names.
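A rough sketch of what Joe describes, using the ASCII unit and record separators as delimiters and 1-byte key codes. The key table is illustrative, not a standard:

```python
import json

# Hypothetical 1-byte key codes (illustrative only)
KEYS = {"title": b"t", "description": b"d", "image": b"i"}
US = b"\x1f"  # ASCII unit separator: splits key from value
RS = b"\x1e"  # ASCII record separator: splits fields

def encode(meta: dict) -> bytes:
    return RS.join(KEYS[k] + US + v.encode() for k, v in meta.items())

def decode(blob: bytes) -> dict:
    names = {v: k for k, v in KEYS.items()}
    out = {}
    for field in blob.split(RS):
        key, _, value = field.partition(US)
        out[names[key]] = value.decode()
    return out

meta = {"title": "Best Cookies", "description": "75 recipes",
        "image": "H:example.com/c.jpg"}
compact = encode(meta)
print(len(compact), "bytes, vs", len(json.dumps(meta)), "bytes as JSON")
assert decode(compact) == meta
```

Each field costs 2 bytes of overhead (key code plus separator) against JSON's quoted key names, colons, commas, and braces, and the round-trip back to a human-readable dict is trivial, which is the "flip between machine and human readable forms" point above.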
>
> Cheers,
>
> Joe
>
> On Sat, Aug 27, 2022, 11:11 Roger Rogerson <tesconda2@hotmail.com> wrote:
>
>> I appreciate that things like MicroData are inlined,
>> and utilise the HTML Markup to associate data with content.
>>
>> But JSON-LD Schema is embedded.
>> In many cases, this additional code serves no "human" purpose,
>> and is provided for "machines" (typically Google).
>>
>> A shining example is the following web page:
>> https://www.delish.com/cooking/g1956/best-cookies/
>>
>> That page has approximately 35 KB of Schema markup,
>> loaded for every single human visitor.
>>
>> For popular pages, that means a large amount of unnecessary
>> code transferred (gigabytes or more per year).
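The arithmetic is easy to check. Assuming, purely for illustration, a million page views a month:

```python
schema_bytes = 35 * 1024      # ~35 KB of inline JSON-LD per page view
views_per_month = 1_000_000   # assumed traffic for a popular recipe page
yearly_gb = schema_bytes * views_per_month * 12 / 1e9
print(f"~{yearly_gb:.0f} GB/year of schema markup alone")  # ~430 GB/year
```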
>>
>> If the JSON-LD could be externalised into a referred-to file,
>> it could reduce bandwidth consumption for users,
>> help speed up page loads / improve performance,
>> and help towards "going green".
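One possible shape for the externalised reference is a link element in place of the inline script block. The rel/type combination below is hypothetical (no parser honours it today); a consumer-side sketch:

```python
from html.parser import HTMLParser

class LDLinkFinder(HTMLParser):
    """Collect hrefs of hypothetical external JSON-LD references:
    <link rel="alternate" type="application/ld+json" href="...">"""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") == "application/ld+json"):
            self.hrefs.append(a.get("href"))

page = ('<head><link rel="alternate" type="application/ld+json"'
        ' href="/best-cookies.jsonld"></head>')
finder = LDLinkFinder()
finder.feed(page)
print(finder.hrefs)  # ['/best-cookies.jsonld']
```

Humans pay only for the one-line link; bots that want the structured data fetch the referenced file, and it can be cached independently of the page.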
>>
>>
>> I appreciate that, technically, this isn't about "Schema" directly,
>> but about how browsers and parsers can recognise and handle
>> an externalised version - but I'm hoping this is the right place
>> to get it considered, and the right people to see it / push it to
>> browser vendors.
>>
>>
>> Thank you.
>> Autocrat.
>>
>
Received on Wednesday, 31 August 2022 17:08:33 UTC
