Re: [csswg-drafts] [css-???] Standardise a binary representation for CSS (#3334) from Robin Berjon via GitHub on 2019-01-15 (public-css-archive@w3.org from January 2019)

From: Robin Berjon via GitHub <sysbot+gh@w3.org>
Date: Tue, 15 Jan 2019 04:27:53 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-454263352-1547526472-sysbot+gh@w3.org>

Whoa, talk about a blast from the past.

From a _very_ quick look at EXIficient for CSS it does look like it is not as efficient as it could be, but not for the reason listed here. It looks like they made the decision to prepopulate a compression buffer with a number of property names rather than to make them an actual enumeration. This is _probably_ less efficient, though it's more readily extensible. I believe EXI is robust enough to be more compact here while retaining extensibility but I've been out of the loop for almost twenty years. That EXI is Infoset based should not be the issue here; a PSVI view of CSS is probably not that far removed from a CSS AST. (I'm doing a lot of guessing using neurons I thought were dead — watch me move my arms around!)

Taking a step back: before you jump into any discussion of binarification, you want to know what it is that you're optimising for. Optimising for network speed and optimising for processing speed, for instance, do not involve the same techniques, and some of those techniques are in tension with one another. It's old, but I would recommend taking a good look (with goggles that forgive some of the worst injuries of time, a bit like those you use to watch an ABBA video) at the XBC framework:

First there are properties that define a format: https://www.w3.org/TR/xbc-properties/. I think a lot of these are more or less reusable as-is (and many you wouldn't care about for CSS) though it's worth taking a bit of time to think about properties that might make sense for CSS that aren't there.

Then those properties are mapped onto a set of use cases: https://www.w3.org/TR/xbc-use-cases/. I think this would definitely be a _lot_ simpler for CSS. This is just use cases as per usual, but having the language of properties available really makes thinking about the performance trade-offs easier (because perf questions trigger all kinds of instinctual knowledge in geeks that is almost always wrong).

We had a theory of how to measure all properties (https://www.w3.org/TR/xbc-measurement/). It's fun, but you can probably skip most if not all.

Finally you bring it all together: https://www.w3.org/TR/xbc-characterization/.

Now for the potential shortcuts:

* Have you considered a super simple TLV (tag-length-value) format? I would guess (but it really is a guess) that CSS's human-friendly syntax might contribute to making it slower. TLV formats tend to be stupidly fast to parse. You _could_ look at prefixing it with a dictionary so as to internalise the strings but that might not gain you much. (In size, running gzip on top of TLVed CSS would almost certainly take care of that better than you will.) Note that TLV can still allow implementations to compete on compression: if they can be smart about how they group and reorder properties for instance, there could be differential gains from a gzip pass on top.
* WASM. I'm seriously out of the loop on this but nothing's faster than highly-optimised code you already have. Are there bits there that could be reused here?
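
To make the TLV suggestion a bit more concrete, here is a rough Python sketch of what such an encoding could look like. Everything specific in it is an assumption made up for illustration: the tag numbers, the three-entry property enumeration, the 1-byte tag plus 4-byte length header, and the function names. It is not an existing or proposed format, and it keeps selectors and values as plain UTF-8 text.

```python
import struct

# Hypothetical tag numbers, invented for this sketch.
TAG_RULE = 1         # value: nested TLV records
TAG_SELECTOR = 2     # value: UTF-8 selector text
TAG_DECLARATION = 3  # value: 1-byte property id + UTF-8 value text

# Hypothetical property enumeration; a real format would need a registry.
PROPERTY_IDS = {"color": 1, "margin": 2, "display": 3}
PROPERTY_NAMES = {number: name for name, number in PROPERTY_IDS.items()}

# Record header: 1-byte tag, then 4-byte big-endian payload length.
HEADER = struct.Struct(">BI")


def tlv(tag, payload):
    """Wrap a payload in a tag-length-value record."""
    return HEADER.pack(tag, len(payload)) + payload


def encode_rule(selector, declarations):
    """Encode one style rule as a TAG_RULE record holding nested records."""
    body = tlv(TAG_SELECTOR, selector.encode("utf-8"))
    for name, value in declarations:
        payload = bytes([PROPERTY_IDS[name]]) + value.encode("utf-8")
        body += tlv(TAG_DECLARATION, payload)
    return tlv(TAG_RULE, body)


def read_tlvs(data):
    """Yield (tag, payload) pairs; the length field lets us jump ahead."""
    offset = 0
    while offset < len(data):
        tag, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        yield tag, data[offset:offset + length]
        offset += length


def decode_rule(data):
    """Decode a buffer that contains exactly one TAG_RULE record."""
    records = list(read_tlvs(data))
    if len(records) != 1 or records[0][0] != TAG_RULE:
        raise ValueError("expected exactly one rule record")
    selector, declarations = None, []
    for tag, payload in read_tlvs(records[0][1]):
        if tag == TAG_SELECTOR:
            selector = payload.decode("utf-8")
        elif tag == TAG_DECLARATION:
            name = PROPERTY_NAMES[payload[0]]
            declarations.append((name, payload[1:].decode("utf-8")))
        # Records with unknown tags are skipped rather than rejected.
    return selector, declarations


encoded = encode_rule("p.note", [("color", "red"), ("margin", "0 auto")])
decoded = decode_rule(encoded)
```

The decoder never scans for delimiters or handles escapes: it reads a header, slices, and moves on, which is where the parsing speed of TLV formats tends to come from. Skipping unknown tags is one way to keep the format extensible, and the output is still ordinary bytes that a gzip pass can be run over.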

Please take all of the above with a grain of salt. What I do these days is analyse how legal decisions impact marketing technology and pontificate about virtue ethics — I'm a bit rusty on the compression front.

But hey, if you're going to use EXI you might as well switch the whole thing to XSL:FO and — I'M JOKING!

--
GitHub Notification of comment by darobin
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/3334#issuecomment-454263352 using your GitHub account

Received on Tuesday, 15 January 2019 04:27:54 UTC