Re: Compression dictionary draft ID - draft-meenan-httpbis-compression-dictionary-00

I have a few concerns with making the URL available (for URL-delivered
dictionaries).  It's possible we could add another optional header as a
request hint but there's a risk of middle-boxes holding it wrong.

- You would still need to verify that the hash of the dictionary matches
the requested hash before using it. Re-calculating the hash on every access
could be pretty expensive (verifying the hash once, before writing the
entry, feels like it would be hit far less often).
- The same URL could represent multiple dictionaries with different hashes
(think a well-known URL like fb.js that does in-place upgrades).

Indexing the entry by URL and hash could potentially solve the issues but
could it not just as easily be indexed by the hash without the URL since
the hash is the unique index?
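
To make the hash-only indexing concrete, here is a rough sketch of what I
have in mind. It is not from the draft: the DictionaryStore class and its
method names are made up for illustration, and it assumes the hash is a
hex-encoded SHA-256 of the dictionary payload. The hash is computed and
checked once at write time, so lookups never need to re-hash.

```python
import hashlib
from typing import Dict, Optional


class DictionaryStore:
    """Hypothetical in-memory dictionary store keyed only by content hash."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def put(self, payload: bytes, expected_hash: Optional[str] = None) -> str:
        # Hash once, at write time, instead of on every access.
        digest = hashlib.sha256(payload).hexdigest()
        if expected_hash is not None and digest != expected_hash.lower():
            raise ValueError("dictionary payload does not match expected hash")
        self._entries[digest] = payload
        return digest

    def get(self, requested_hash: str) -> Optional[bytes]:
        # The key is the hash itself, so a hit is already known to match.
        return self._entries.get(requested_hash.lower())


store = DictionaryStore()
# Two versions served from the same URL end up as two separate entries.
old_key = store.put(b"contents of fb.js, version 1")
new_key = store.put(b"contents of fb.js, version 2")
```

With this shape the URL never enters the key, so an in-place upgrade of a
well-known URL simply adds a second entry next to the first.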

On Tue, Aug 8, 2023 at 4:27 PM Vlad Krasnov <vlad@cloudflare.com> wrote:

> There are many possible architectures that can work, of course, but the
> simplest and most generic one is to hope the dictionary is present in
> your local cache and fetch it (with Cache-Control: only-if-cached;
> If-Match: etag).
>
> I would definitely avoid using a KV for a generic solution, maybe in the
> far future for high-value use cases.
>
> I can't simply store an arbitrary number of dictionaries in KV; this can
> very well explode, and some intelligent eviction is needed.
>
> We also don't want to store multiple versions of the same resource
> compressed with different algorithms, even for statically compressed
> assets. It consumes valuable space that could be used to cache other assets
> instead.
>
> In any case, having the URL and ETag (and obviously ETags are not always
> present either) allows us to have *something* in production almost out of
> the box; having just the cache requires building new stuff, which is a much
> higher bar for entry.
>
> Best,
> Vlad
>
> On Aug 8, 2023, at 11:46 AM, Patrick Meenan <patmeenan@gmail.com> wrote:
>
> That would necessarily couple the use of a dictionary with the negotiation
> of the storage. The way it is currently set up, "use-as-dictionary" is one
> way to populate the dictionary but the dictionaries could also be
> pre-loaded into a client or otherwise negotiated.
>
> As far as state, if you're passively observing traffic to compress and not
> actively managing the dictionaries, doesn't something like this work?
>
> 1. See response with "use-as-dictionary" response header
> 2. Store dictionary in key-value store, using the hash of the payload as
> the key
> 3. See request with "sec-available-dictionary" request header
>
> For supporting static, offline resource compression:
> - If a version of the resource is in cache compressed with the requested
> dictionary, serve it
> - If not, kick off an async task to compress the cached full response with
> the dictionary (if one is available in KV)
>
> For dynamic responses:
> - If the dictionary is in KV, use it to compress the response
>
> The main "state" that needs to be managed is the storage of the dictionary
> keys, indexed by the appropriate hash.
>
> If the CDN is generating the dictionaries (or adding the headers) there's
> not much more state than that either.
>
> I'm sure I'm probably hand-waving a lot of things that need to be done at
> scale when handling thousands of dictionaries and thousands of simultaneous
> requests for the same dictionary but I'd hate to add complexity to the
> negotiation itself if it isn't absolutely necessary.
>
> On Tue, Aug 8, 2023 at 9:49 AM Vlad Krasnov <vlad@cloudflare.com> wrote:
>
>> I think it would make the proposal easier to support from a CDN POV if
>> the Sec-Available-Dictionary included the dictionary URL and ETag;
>> otherwise it requires too much state keeping.
>>
>> Given a URL and an ETag it is pretty easy to have a best-effort
>> dictionary compression.
>>
>> Best,
>> Vlad
>>
>
>

Received on Tuesday, 8 August 2023 21:37:48 UTC