- From: Hugo Scott <hugo@hugoscott.com>
- Date: Thu, 15 Sep 2022 11:30:59 +0200
- To: "schema.org Mailing List" <public-schemaorg@w3.org>
- Message-ID: <CAOJ1cMR3E7POEF-URL5H0Ev=_JyhXUGHS9TxQw=-HUWqp9RvDg@mail.gmail.com>
I've been reading through this interesting thread and, while I think I
understand what you are saying (especially concerning the bloat of the CSS
and JSON formats), there are a couple of issues that come to mind:

If you put the schema markup and social metadata into an external file to
avoid re-transferring the same data on every page load, then you are
generating another HTTP request, which can slightly slow the overall page
load time if it's not deferred properly. I also wonder how the bandwidth
cost of transferring 35 KB of text data compares with the cost of making a
whole new HTTP request.

Obviously this would not be an issue if the external file were cached by
the browser, but that would then rule out having the specific per-page
schema and social meta markup required for individual product pages,
service pages, recipe pages, blog articles, training courses, events, etc.

Or maybe I have misunderstood what you are suggesting?

Cheers,

Hugo Scott

On Thu, 15 Sept 2022 at 10:50, Roger Rogerson <tesconda2@hotmail.com> wrote:

> Hi folks.
>
> There are a lot of different things that can be done;
> the question is: will they be?
>
> We've had various issues for years (a decade or more), and HTTP has had
> evolutions to compensate rather than correct some of the issues.
>
> But my focus here is on Schema, and how certain entities have pushed it
> to be utilised in an inefficient way - causing a percentage increase of
> bloat, solely for their gain.
>
> Simply externalising it with a standardised extension solves the issue,
> instantly.
> (And I agree, a lot of the OpenGraph stuff etc. can have the same
> treatment.)
>
> All it requires is a set of file extensions to be recognised/accepted,
> and end-systems to request them.
>
> If the end-systems have concerns about wasted requests (not knowing
> what's available - ironic!), then a specialised response header can be
> included, with a list of standard extensions:
>
> x-resources: .meta, .og, .other
>
> If each type of content has its own dedicated extension, it means people
> need only request the one(s) they desire, whilst normal web users don't
> get any of that bloat.
>
> The hardship is going to be content platforms. They will be required to
> alter their systems to handle additional requests, and fetch specific
> content to emulate additional "page" requests (in this case the
> URI+.meta or URI+.schema etc.).
>
> But I think it's more than worth doing, as the sheer volume of non-human
> traffic is ridiculous.
>
> So how to make it happen?
> ------------------------------
> *From:* Guy Thorsby <gthorsby@gmail.com>
> *Sent:* 31 August 2022 17:03
> *To:* Joe Duarte <songofapollo@gmail.com>
> *Cc:* Roger Rogerson <tesconda2@hotmail.com>; schema.org Mailing List
> <public-schemaorg@w3.org>
> *Subject:* Re: Permit external JSON-LD files?
>
> This thread is great.
>
> Just want to drop AMP pages in here so they get their representation in
> this conversation. Not directly related, but within the crosshairs of
> the context.
>
> https://developers.google.com/amp
>
> On Wed, Aug 31, 2022, 10:36 Joe Duarte <songofapollo@gmail.com> wrote:
>
> It's a good idea, not just for Schema, but for *all* metadata not used
> by browsers or similar clients. This would include all the social media
> metadata like Facebook's OpenGraph, Twitter cards, Google site
> verification, the search result snippet things, etc.
>
> I mapped out an approach for this with a separate file extension for
> the non-browser metadata: .meta
>
> Bots would request the .meta file, in most cases in addition to the
> actual page (in some cases they might only need the .meta file - for
> example the social media links, where they just need a title, a
> description and an image URL). The .meta file URLs would exactly match
> the page URLs, except for the extension.
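>
> To make the pairing concrete, it might look like this (a sketch of the
> proposals in this thread only - the .meta extension and the x-resources
> discovery header are hypothetical, not an existing convention):
>
>     Page (for humans):   https://example.com/recipes/best-cookies
>     Metadata (for bots): https://example.com/recipes/best-cookies.meta
>
>     HTTP/1.1 200 OK             <- response for the page itself
>     Content-Type: text/html
>     x-resources: .meta, .og     <- advertises which companion files exist
>
> A bot that only needs a title, description and image would fetch just
> the .meta URL; a browser would never request it at all.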
>
> As you noted, the status quo is quite wasteful. It's not just the
> useless metadata - users are forced to download enormous amounts of CSS
> that is not used by the page, typically 90%+ unused, and in the form of
> separate files, which makes it even worse. And enormous amounts of JS,
> again in separate files, most of it unused, and much of the rest
> unnecessary for the functionality of, say, an article with comments and
> a few ads. There's never been a good explanation for this - the caching
> argument was always a mere assertion, falsified by digging in and
> testing. So there's a lot of room for improvement - the web is much
> slower than it could and should be, given the ridiculous power of
> modern computers and the fat pipes we now have. It's amazing how slow
> even multimillion-dollar websites are.
>
> The metadata bloat isn't the biggest culprit, but it's worth sorting
> out along with the other sources of bloat. I sketched out a framework
> with .web and .meta files, where .web replaced HTML and CSS. It would
> be equivalent to, but more compact than, a formally specified
> minification format for HTML and CSS (something we could really use),
> combined with: default tree-shaken CSS (only the CSS used by the page
> is in the source, which is trivially easy to achieve by simple selector
> matching); minified 1-2 byte selector, class, ID, etc. names (no more
> ponderous 30-byte class names - the browser doesn't need them and does
> nothing with them); and an efficient link format with standardized URLs
> (URLs just 2-3 bytes before the hostname, e.g. H: instead of https://,
> link markup just 3-4 bytes, and URLs never more than 25 bytes after the
> hostname).
>
> The metadata format could also be much more compact. There's no reason
> for machine-readable syntax to be human-readable and so bloated. We
> could easily flip between machine- and human-readable forms, so it has
> never made sense to go for both in one bloated format. Most tags could
> be just one or two bytes, and a standardized order can eliminate some
> tags or other bytes entirely. The format could also be optimized for
> compression by design (for specific compression formats like Brotli or
> Zstandard, though it might be possible to optimize for both at the same
> time). JSON is bloated by its quoted keys, long key names, and
> excessive punctuation - simple newline separation solves two of those,
> and a binary format could have richer separators and markers just by
> using the forgotten 1-byte control codes in UTF-8 Basic Latin / ASCII,
> in addition to 1-2 byte key names.
>
> Cheers,
>
> Joe
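>
> As a rough illustration of the compaction Joe describes above (the
> one-byte key names and the layout are invented for this example, not a
> proposed specification):
>
>     JSON-LD today (~110 bytes):
>     {"@context":"https://schema.org","@type":"Recipe",
>      "name":"Best Cookies","image":"https://example.com/c.jpg"}
>
>     Newline-separated, with 1-byte keys (~50 bytes):
>     t Recipe
>     n Best Cookies
>     i https://example.com/c.jpg
>
> The @context disappears because it is implied by the file format
> itself, and with a standardized field order even the one-byte keys
> could be dropped.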
>
> On Sat, Aug 27, 2022, 11:11 Roger Rogerson <tesconda2@hotmail.com> wrote:
>
> I appreciate that things like Microdata are inlined, and utilise the
> HTML markup to associate data with content.
>
> But JSON-LD Schema is embedded. In many cases, this additional code
> serves no "human" purpose, and is provided for "machines" (typically
> Google).
>
> A shining example is the following web page (remove spaces after
> periods):
> https://www. delish. com/cooking/g1956/best-cookies/
>
> That page has approximately 35 KB of Schema, and it is loaded for every
> single human visitor. In the case of popular pages, this means a large
> amount of unnecessary code is transferred (gigabytes or more per year).
>
> If the JSON-LD could be externalised into a referred-to file, this
> could reduce bandwidth consumption for users, help speed up some page
> load times / improve performance, and help towards "going green".
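>
> A minimal sketch of what that externalisation might look like in a
> page's head (the rel value and the .jsonld URL are hypothetical - this
> is exactly the sort of thing that would need standardising):
>
>     <!-- Today: the full ~35 KB JSON-LD blob, inline in every response -->
>     <script type="application/ld+json">
>       { "@context": "https://schema.org", ...35 KB of Recipe markup... }
>     </script>
>
>     <!-- Externalised: a one-line reference that browsers simply ignore -->
>     <link rel="describedby" type="application/ld+json"
>           href="/cooking/g1956/best-cookies.jsonld" />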
>
> I appreciate that, technically, this isn't about "Schema" directly, but
> about how browsers and parsers can recognise and handle an externalised
> version - but I'm hoping this is the right place to get it considered,
> and the right people to see it and push it to browser vendors.
>
> Thank you.
> Autocrat.

Received on Thursday, 15 September 2022 09:31:29 UTC