Re: Security concern about open range integers (was: Question about: 4.1.1 Integer representation) from Fred Akalin on 2013-10-21 (ietf-http-wg@w3.org from October to December 2013)

From: Fred Akalin <akalin@google.com>
Date: Sun, 20 Oct 2013 23:24:56 -0700
To: Roberto Peon <grmocg@gmail.com>
Cc: Frédéric Kayser <f.kayser@free.fr>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CANUYc_S_k1fTShJRckKha6HMZ-B2ySFFJ0pJQE8D1_FnEwpiWA@mail.gmail.com>

I think it's worth mentioning explicit upper bounds in the spec. Something
like any decoded varint must fit in 32 bits. Since settings are 32-bit
values, that implies that max context size is also 32 bits, which gives a
hard upper bound on the number of table entries (27 bits). We can probably
give better upper bounds for max header name length and value length, too,
right?

On Sun, Oct 20, 2013 at 5:23 PM, Roberto Peon <grmocg@gmail.com> wrote:

> If any value is too large, the connection should be torn down.
> The definition of 'too large' depends utterly on details that we cannot
> predict.
> For many SoC or embedded implementations, this may be 4k.
> For better provisioned clients, this could be gigabytes or terabytes.
>
> This problem already exists in SPDY, and is dealt with in the venerable
> tradition of deciding if the request is sane, and if not, dealing with it :)
>
> As for speed-- for the data frames we don't use this kind of encoding--
> the overhead of a fixed encoding is small there.
> For a compressor, this isn't as true-- the overhead of a fixed encoding is
> much higher.
> In any case, the vast majority of the time, there will be zero speed
> difference since the first byte is all that is necessary to encode the
> index/length.
> -=R
>
>
> On Sun, Oct 20, 2013 at 5:18 PM, Frédéric Kayser <f.kayser@free.fr> wrote:
>
>> Hello,
>> let me express it a bit differently "the well known and well supported
>> format" is an universal integer encoding (closely related to a 7, 7, ∞
>> start-step-stop code) this means it can encode any integer of unknown size
>> including ridiculously large ones.
>> Know how is a server supposed to react if it receives a malicious header
>> that states that the next field is FF FF FF FF FF FF FF FF 7F bytes long
>> (thats 2^56 + 255 if I'm not wrong)? Naive implementations will probably
>> have a shock trying to decode this in a fixed size 32-bits int (add a
>> couple of FF and it overruns a 64-bits int) or allocating memory for the
>> field in question, others may discard it, but at which point take the cut
>> between legitimate large values and malicious ones? My concern is that
>> every 8+ or whatever+ is a potential memory bomb if its not capped at some
>> point.
>>
>> My initial concern was rather a speed concern (read micro-optimization to
>> speed-up decoding) concerning "the well known and well supported format"
>> since a capped version of such a code could be processed faster, encoding
>> is always relatively fast decoding is a bit more tricky since the bits
>> recorded first are groups of 7 least significant bits whereas a
>> real start-step-stop code would keep the natural bit order and would not
>> require bit shifts (just byte swaps eventually).
>>
>> Regards
>> --
>> Frédéric Kayser
>>
>> Roberto Peon wrote:
>>
>> The rest of the int is in a well known and well supported format.
>>
>> -=R
>> On Oct 18, 2013 1:03 AM, "Frédéric Kayser" <f.kayser@free.fr> wrote:
>>
>>> And wouldn't it be way easier and faster to record/reconstruct such
>>> integers by moving all the flags (most significant bits) of each byte in
>>> front of the code?
>>>
>>> I < 128
>>> 0xxxxxxx
>>> I < 16.384
>>> 10xxxxxx xxxxxxxx  (compared to 1yyyyyyy 0yyyyyyy groups of 7 least
>>> significant bits first)
>>> I < 2.097.152
>>> 110xxxxx xxxxxxxx xxxxxxxx  (compared
>>> to 1yyyyyyy 1yyyyyyy 0yyyyyyy groups of 7 least significant bits first)
>>> I < 268.435.456
>>> 1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx  (compared
>>> to 1yyyyyyy 1yyyyyyy 1yyyyyyy 0yyyyyyy groups of 7 least significant bits
>>> first)
>>> …
>>>
>>> Where xxx…xxx is simply the binary representation of I
>>>
>>> The first part looks like unary coding and the overall form of this
>>> encoding is a 7, 7, ∞ start-step-code or a base 128 Elias γ′ code, do you
>>> really want to keep the universal nature of such a code or cap it?
>>>
>>> Regards
>>> --
>>> Frédéric Kayser
>>>
>>
>>
>

Received on Monday, 21 October 2013 06:25:50 UTC