- From: Garret Rieger <grieger@google.com>
- Date: Thu, 25 Jul 2019 13:14:31 -0700
- To: Behdad Esfahbod <behdad@fb.com>
- Cc: "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
- Message-ID: <CAM=OCWacc1sjP=Jr+RR8kBFyJqggoxTBaK8+eHi3zCB4eXtXLA@mail.gmail.com>
1. Agreed — for an actual implementation we'd just let the HTTP transport layer apply compression around the entire payload.

2. I was thinking the same thing; I'll give that a try.

3. My idea for ranges is to use a union of ranges and the sparse bit set: first encode a set of ranges (say, any runs of codepoints longer than some threshold), then encode everything left over using a sparse bit set.

On Thu, Jul 25, 2019 at 11:13 AM Behdad Esfahbod <behdad@fb.com> wrote:

> Hi Garret,
>
> Thanks for the document! Here are my thoughts:
>
> 1. I suggest avoiding generic compression at this level. It would be nice if the entire request/response were compressed automatically, but I suggest we design without it. Browsers either already have Brotli compression code or they don't. I don't think we should require it for the codepoint set since, as you discovered, it is not a huge win anyway given the nature of the data and the fact that we can design the encoding to be efficient.
>
> 2. Since random access is not required, one can use a multibyte encoding, which should make the delta list pack much better. I suggest just using the UTF-8 encoding.
>
> 3. ICU keeps such lists as an alternating "in-out" list. I.e., if the list is "5,8,9,14", it will encode it as "5,6,8,10,14,15". One can think of this as a list of ranges: (5,6), (8,10), (14,15). You can try doing it this way and then take the deltas. This will address the range use case. You can also try to come up with a hybrid encoding that encodes ranges efficiently without significantly increasing the cost for sparse sets.
>
> I think doing the above should get you a very simple-to-encode, simple-to-decode, and fairly efficient encoding.
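[Editor's sketch — not part of the original thread.] The in-out list and delta ideas in points 2 and 3 above can be illustrated with a short Python sketch. All names here are hypothetical, and a generic little-endian base-128 varint stands in for the UTF-8 multibyte scheme Behdad suggests:

```python
def to_in_out(codepoints):
    """Convert a codepoint set to an alternating in-out boundary list,
    as ICU does. E.g. {5, 8, 9, 14} -> [5, 6, 8, 10, 14, 15]."""
    boundaries = []
    prev = None
    for cp in sorted(codepoints):
        if prev is None or cp != prev + 1:
            if prev is not None:
                boundaries.append(prev + 1)  # close the previous range
            boundaries.append(cp)            # open a new range
        prev = cp
    if prev is not None:
        boundaries.append(prev + 1)
    return boundaries

def from_in_out(boundaries):
    """Inverse: expand boundary pairs back into the codepoint set."""
    out = set()
    for start, end in zip(boundaries[0::2], boundaries[1::2]):
        out.update(range(start, end))
    return out

def encode_varint(n):
    """Pack a non-negative integer as a little-endian base-128 varint
    (a stand-in for the UTF-8 multibyte encoding suggested above)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_set(codepoints):
    """Delta-encode the in-out list, then varint-pack each delta.
    Deltas stay small for dense sets, so most pack into one byte."""
    boundaries = to_in_out(codepoints)
    data = bytearray()
    prev = 0
    for b in boundaries:
        data += encode_varint(b - prev)
        prev = b
    return bytes(data)
```

For the example set {5, 8, 9, 14}, the boundary list is [5, 6, 8, 10, 14, 15], the deltas are [5, 1, 2, 2, 4, 1], and each delta fits in a single byte — six bytes total for the whole set.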
> Cheers,
> b
>
> ------------------------------
> *From:* Garret Rieger <grieger@google.com>
> *Sent:* Wednesday, July 24, 2019 2:14 PM
> *To:* w3c-webfonts-wg (public-webfonts-wg@w3.org) <public-webfonts-wg@w3.org>
> *Subject:* Exploring how to encode code point sets
>
> Recently I've been thinking about the specific design of the protocol for the subset-and-patch method, since we'll need that for the analysis. One of the most important pieces is how to efficiently encode the code point sets that are transferred from the client to the server on each request. If an inefficient encoding is used, it could add a material amount of overhead to the requests.
>
> So I came up with a list of potential methods for encoding the sets and tested them on simulated code point sets. An overview of the analysis and the results can be found here
> <https://docs.google.com/document/d/19K5MCElyjdUZknoxHepcC3s7tc-i4I8yK2M1Eo2IXFw/edit?usp=sharing>.
>
> Does anyone have other ideas or techniques for efficiently encoding sets of codepoints?
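[Editor's sketch — not part of the original thread.] The hybrid encoding Garret proposes in his reply above — explicit ranges for long runs, a sparse bit set for the leftovers — might look roughly like this. The threshold, block size, and all names are illustrative assumptions, and the block-based bitmap is only one crude flavor of "sparse bit set":

```python
RUN_THRESHOLD = 16  # hypothetical: runs at least this long become explicit ranges

def split_runs(codepoints):
    """Partition a codepoint set into (ranges, leftovers): consecutive runs
    of length >= RUN_THRESHOLD become (first, last) range pairs, and
    everything else is left for the sparse bit set."""
    ranges, leftovers = [], []
    cps = sorted(codepoints)
    i = 0
    while i < len(cps):
        j = i
        while j + 1 < len(cps) and cps[j + 1] == cps[j] + 1:
            j += 1  # extend the current run
        if j - i + 1 >= RUN_THRESHOLD:
            ranges.append((cps[i], cps[j]))
        else:
            leftovers.extend(cps[i:j + 1])
        i = j + 1
    return ranges, leftovers

def bit_set_blocks(codepoints, block_size=8):
    """Encode sparse codepoints as (block_index, bitmap) pairs, storing
    only non-empty blocks — the sparse-bit-set half of the hybrid."""
    blocks = {}
    for cp in codepoints:
        idx, bit = divmod(cp, block_size)
        blocks[idx] = blocks.get(idx, 0) | (1 << bit)
    return sorted(blocks.items())
```

For example, a set containing the run U+0100–U+011F plus the stray codepoints 5, 40, and 42 splits into one range (0x100, 0x11F) and three leftovers, which then occupy just two non-empty bitmap blocks.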
Received on Thursday, 25 July 2019 20:15:14 UTC