Re: Compression dictionary draft ID - draft-meenan-httpbis-compression-dictionary-00

I updated the draft with fixes for sbr -> br-d, szstd -> zstd-d and to
specify that the Sec-Available-Dictionary header is Base16-encoded.

There are more compact and structured representations but the ergonomics of
using a string that can be stored on a filesystem are pretty significant
for this use case.  For example, in the case of delivering static resources
(e.g. main.js), at build time the tooling for the site can compress the
resource with the dictionaries it chooses to support and just append the
hex hash string to the end of the file name (main.js.ABCDEF12334...).  At
serving time, the server doesn't need to do any processing other than
appending the Sec-Available-Dictionary string to the file name and checking
if the file is available. They'd want to sanity-check that it's a valid hex
string to prevent abuse but there's minimal processing involved. If there's
enough benefit from using a different encoding I'm ok with switching but
Base16 eases the deployment somewhat.
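
To make the serving-time step concrete, here is a rough sketch of what I
have in mind. It is illustrative only: the function name, the static_root
layout and the 64-character length (which assumes a SHA-256 hash) are my
assumptions for the example, not something the draft mandates.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the dictionary hash is SHA-256, i.e. 64 hex characters.
HEX_HASH = re.compile(r"[0-9a-fA-F]{64}")


def find_dictionary_variant(
    static_root: Path, resource: str, available_dictionary: str
) -> Optional[Path]:
    """Return the pre-compressed file for `resource`, or None.

    `resource` is assumed to be an already-validated file name such as
    "main.js"; `available_dictionary` is the raw header value.
    """
    # Sanity check: only a plain hex string is accepted, so the header
    # value can never introduce "/" or ".." into the file name.
    if not HEX_HASH.fullmatch(available_dictionary):
        return None
    # Build-time tooling is assumed to have written "<name>.<hex hash>".
    candidate = static_root / f"{resource}.{available_dictionary}"
    return candidate if candidate.is_file() else None
```

If the lookup returns None the server would just fall back to the normal
response. Note that on a case-sensitive filesystem the casing of the header
value has to match the casing used at build time.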

As far as applying it to requests, I think the negotiation is going to be
radically different and would probably justify its own RFC. The
negotiation needs to make sure that the receiver has a given dictionary
available before doing the encoding. That could be done with something like
a 100 continue flow but it will look very different and negotiating the
payload of the dictionary would use a different mechanism.

For URI vs Segment, even though the URLs must be same-origin, there's some
benefit to allowing arbitrary URL formatting to allow for the clean use of
relative URLs. They are all going to be expanded to absolute URLs anyway
and it wouldn't break anything if someone chose to include the protocol and
host parts (though there's no benefit to sending them).
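
As a quick illustration of the expansion, something like the following
would do. This is only a sketch: it assumes the match value is resolved
against the URL the dictionary was fetched from, and the helper names and
example.com URLs are made up for the example.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def resolve_match(dictionary_url: str, match: str) -> Optional[str]:
    """Expand `match` to an absolute URL, or None if it leaves the origin."""
    # Relative, path-absolute and fully absolute forms all go through the
    # same resolution step.
    absolute = urljoin(dictionary_url, match)
    # Including the protocol and host is harmless, but only when they
    # still name the dictionary's own origin.
    return absolute if origin(absolute) == origin(dictionary_url) else None


base = "https://example.com/dict/v1.dat"
print(resolve_match(base, "/app/*.js"))                  # https://example.com/app/*.js
print(resolve_match(base, "../js/*"))                    # https://example.com/js/*
print(resolve_match(base, "https://example.com/app/*"))  # https://example.com/app/*
print(resolve_match(base, "https://other.example/x"))    # None
```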

On Fri, Jun 30, 2023 at 4:26 PM Lucas Pardue <lucaspardue.24.7@gmail.com>
wrote:

> Hi Pat,
>
> It's Friday and I am feeling a bit full of beans, so a few first things
> that popped into my head are
>
> 1): why restrict this to responses? I could imagine some upload use cases
> that might benefit. The added complexity might not be worth the effort now
> but wondered if you (or others) had considered it already
> 2) I agree with Ilari that using Byte Sequence for the
> Sec-Available-Dictionary seems cleaner. I might even go as far as to
> suggest you copy one of the new Digest fields (e.g. [1]) and use a Dictionary
> to convey both the algorithm name and the computed value, but limit the
> size of the dictionary to 1.
> 3) On the topic of Digest fields, there seems a possibility for the server
> to add more safety checks by sending the digest of resource before your new
> encoding. I don't think it's strictly needed but it is technically
> possible; see this I-D [2]
>
> Cheers,
> Lucas
>
> [1] -
> https://httpwg.org/http-extensions/draft-ietf-httpbis-digest-headers.html#section-2
> [2] -
> https://www.ietf.org/archive/id/draft-pardue-http-identity-digest-01.html
>
> On Fri, Jun 30, 2023 at 8:13 PM Patrick Meenan <patmeenan@gmail.com>
> wrote:
>
>> Thanks. I'll update the draft shortly to fix the edits. We originally had
>> used the "s" prefix and changed to "-d" but looks like I missed a spot.
>>
>> I'll also add the detail that the hash is hex encoded ASCII with the
>> proper casing (makes it much easier to deploy if encoded assets are stored
>> with the hash appended to the file name).
>>
>> The match is supposed to be for the full url including query params so ?
>> isn't much better and has the filesystem baggage of usually being a single
>> character.
>>
>> As far as the match algorithm goes, it's a linear pass through the url
>> and match string plus some special handling at the beginning and end. I
>> could see a lot of wildcards causing it to take a bit longer but no
>> explosion of an exponential kind. There could be a fast exit added if the
>> length of the match string exceeds the length of the URL.
>>
>> On Fri, Jun 30, 2023 at 2:53 PM Ilari Liusvaara <ilariliusvaara@welho.com>
>> wrote:
>>
>>> On Fri, Jun 30, 2023 at 01:30:07PM -0400, Patrick Meenan wrote:
>>> > Yoav and I have put together a first draft of a proposal for Compressed
>>> > Dictionary Transport. It's currently an individual draft but we'd like
>>> to
>>> > see if the HTTP working group would be willing to adopt it so we can
>>> all
>>> > iterate on the spec and get to something that is hopefully
>>> > consensus-shippable.
>>> >
>>> > This is otherwise known as the latest attempt at "shared brotli"
>>> > compression but in a more generic form that supports brotli and zstd
>>> and
>>> > hopefully resolves the security and privacy concerns of previous
>>> attempts.
>>> >
>>> > The draft is here:
>>> >
>>> https://datatracker.ietf.org/doc/draft-meenan-httpbis-compression-dictionary/
>>> >
>>> > The explainer (with examples and some browser-specific HTML bits) is
>>> here:
>>> > https://github.com/WICG/compression-dictionary-transport
>>> >
>>> > Some of the field names have changed since the explainer and I expect
>>> > bikeshedding will refine them further.
>>> >
>>> > Chrome will be running a field trial of the compression in the next few
>>> > months to gather developer feedback and see how it works for deploying.
>>> > The spec is hopefully written in such a way that it is not specific to
>>> the
>>> > browser use case but does have some additional carve-outs for some of
>>> the
>>> > browser-specific privacy concerns.
>>>
>>> Some quick comments:
>>>
>>> - Allowing absolute URLs in match is a footgun, since dictionaries are
>>>   restricted to same-origin. I don't think any of the usual URI
>>>   productions are suitable here.
>>>
>>>   I think most suitable would be 'segment *( "/" segment )', where
>>>   segment is the production from RFC 3986.
>>>
>>> - If match patterns are intended to be paths, one could use ? as the
>>>   wildcard, avoiding double meanings, since HTTP paths can not contain
>>>   ?.
>>>
>>> - It is not clear how Sec-Available-Dictionary is encoded. Is it
>>>   hex encoding? base64 encoding? base64url encoding? Something
>>>   else?
>>>
>>>   One could use sf-binary with the binary hash value.
>>>
>>> - Can that match algorithm blow up in runtime?
>>>
>>> - One place calls the encodings "br-d" and "zstd-d" and IANA
>>>   considerations seem to have "sbr" and "szstd".
>>>
>>>
>>>
>>>
>>> -Ilari
>>>
>>>

Received on Friday, 30 June 2023 22:06:54 UTC