Re: New Version Notification for draft-nottingham-structured-headers-00.txt from Andy Green on 2017-11-04 (ietf-http-wg@w3.org from October to December 2017)

From: Andy Green <andy@warmcat.com>
Date: Sat, 4 Nov 2017 15:59:00 +0800
To: Willy Tarreau <w@1wt.eu>
Cc: Matthew Kerwin <matthew@kerwin.net.au>, Mark Nottingham <mnot@mnot.net>, Kazuho Oku <kazuhooku@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <ac85ccc0-8941-cf2a-93aa-bd558fca732f@warmcat.com>
On 11/04/2017 03:17 PM, Willy Tarreau wrote:
> On Fri, Nov 03, 2017 at 03:53:57PM +0800, Andy Green wrote:
>> There seems to be a basic choice to have some strict integer limit, or to
>> avoid it by having a "stretchy" way to have arbitrary-sized ints.  If people
>> are OK with a strict limit, 64-bit is very widely supported at C level even
>> down on "ESP8266" Willy mentioned.  I think a 32-bit limit is a non-starter.
> 
> The ESP in my alarm clock disagrees with you here :-)

Well, the chip doesn't care :-) it's Willy that disagrees with me.  I 
don't claim to know the 64-bitness of BASIC or whatever you are testing 
with, and you don't say what it is.  So I don't think it is relevant that 
some code running on ESP platforms only has 32-bit types; that can be 
true on any platform.

It remains a fact that the stock gcc toolchain Espressif recommends for 
both ESP8266 and ESP32, which is what you would write HTTP code with, 
supports proper 64-bit long long, making it and all the operators on it 
trivial to use.

>    > print(32768*32768)
>    1073741824
>    > print(32768*65536)
>    -2147483648
>    > print(65536*65536)
>    0
>    > print(131072*65536)
>    0
>    > print(2147483648)
>    2147483647
>    > print(3333333333)
>    2147483647

Here are the same trials done in C using gcc on ESP32 (because that is 
what I have to hand; but it's basically the same gcc toolchain + 32-bit 
Tensilica core as ESP8266).

         /* each operand is cast to 64-bit long long before multiplying */
         lwsl_notice("32768 * 32768 = %lld\n", (long long)32768 * (long long)32768);
         lwsl_notice("32768 * 65536 = %lld\n", (long long)32768 * (long long)65536);
         lwsl_notice("65536 * 65536 = %lld\n", (long long)65536 * (long long)65536);
         /* note: the operands here are 32768 * 32768, not 131072 * 65536,
          * which is why the 4th result below is 1073741824 */
         lwsl_notice("131072 * 65536 = %lld\n", (long long)32768 * (long long)32768);
         lwsl_notice("2147483648 = %lld\n", (long long)2147483648);
         lwsl_notice("3333333333 = %lld\n", (long long)3333333333);

4: 32768 * 32768 = 1073741824
4: 32768 * 65536 = 2147483648
4: 65536 * 65536 = 4294967296
4: 131072 * 65536 = 1073741824
4: 2147483648 = 2147483648
4: 3333333333 = 3333333333

> That's totally scary. The first 3 can be expected due to integer overflow
> (2^30 is OK, 2^31 is negative, 2^32=0). The 4th one (2^33) is unexpected.
> The last two show that integer parsing uses saturation and reports
> wrong values. That's precisely the type of issue I'd rather be sure to
> address for sensitive protocol elements. But as long as we propose a
> portable way to detect that a number is correctly parsed, I'm fine,
> because as I mentioned earlier in this thread, such devices are not
> going to be used to retrieve a 1 TB file.
> 
> But in C, code using atoi() to parse integers is very common, and when the
> developers are told that atoi() is too short and unreliable and that they
> must use strtol(), they end up using it naively, causing hard-to-detect
> problems when they're not aware of the impacts:
> 
>    #include <stdio.h>
>    #include <stdlib.h>
> 
>    int main(int argc, char **argv)
>    {
>          printf("atoi:   0x%16lx (%ld)\n", (long)atoi(argv[1]), (long)atoi(argv[1]));
>          printf("strtol: 0x%16lx (%ld)\n", (long)strtol(argv[1], NULL, 0), (long)strtol(argv[1], NULL, 0));
>          return 0;
>    }
> 
>    $ ./a.out 2147483647
>    atoi:   0x        7fffffff (2147483647)
>    strtol: 0x        7fffffff (2147483647)
> 
>    $  ./a.out 2147483648
>    atoi:   0xffffffff80000000 (-2147483648)
>    strtol: 0x        80000000 (2147483648)
> 
>    $ ./a.out 4294967296
>    atoi:   0x               0 (0)
>    strtol: 0x       100000000 (4294967296)
>   
>    $ ./a.out 00003333
>    atoi:   0x             d05 (3333)
>    strtol: 0x             6db (1755)

Ehhhhhhhh that's **long** you are using there.  I am talking about long 
long.  You underestimate C programmers if you think they don't know the 
difference.

> That's why I think that we must really take care of 32-bit. Signed 32-bits
> (hence unsigned 31 bits) are the only really portable integers. Most code
> works pretty well above this, but naive implementations easily get caught
> like this.

This is why the C guys invented int64_t and friends (which are just 
typedefs for long long or whatever).  That is **thoroughly** portable: 
not just ESP8266 + ESP32 gcc, but even Windows has <stdint.h> with them.

So I dunno (because you don't say) what you tested on, but the generic 
C toolchain for ESP8266 + ESP32 has 64-bit long long and uint64_t / 
int64_t, and support for all the C operators on those types, which is 
what matters for these platforms.

> In fact I would probably be fine with unbounded integers if we :
>    - explain how to safely parse an integer (and detect its correctness and
>      validity for the type being used)
>    - mention a warning about the risk of widespread code not capable of
>      dealing with quantities larger than 2147483647.
> 
> (...)
>> No, floats are not ints and cannot reliably be converted into them. They are
>> a lossy approximation format.  For example you cannot put key BIGINTs into
>> floats.
> 
> I agree, I really don't like floats to represent integers for discrete
> values. Byte offsets and content-lengths are discrete values and must
> absolutely be exact. Parsing "Content-length: 12345.5555" will lead to
> funny things depending on the implementations. Some truncating to 12345
> and others rounding up to 12346. It's even worse when such numbers have
> to be converted to other quantities (eg: bits), because 12345.5555*8=
> 98764.4440 which differs from 12345*8=98760 possibly resulting in negative
> values in subtracts (eg: remaining space in a buffer).

Yeah we completely agree on that, floats won't do for generic ints.

I don't have an opinion on whether the structured headers draft should 
deal with BIGINT; it just seemed to be missing from the discussion.  If 
it did, it would cover every smaller limit case, but it would force 
people to deal with its length-prefixed, MSB-first data format.  For 
anything other than MPINT / BIGINT, a 64-bit limit will cover anything 
related to dataset size for the foreseeable future.

-Andy

> Willy
>
Received on Saturday, 4 November 2017 08:00:32 UTC