- From: Andy Green <andy@warmcat.com>
- Date: Sat, 4 Nov 2017 15:59:00 +0800
- To: Willy Tarreau <w@1wt.eu>
- Cc: Matthew Kerwin <matthew@kerwin.net.au>, Mark Nottingham <mnot@mnot.net>, Kazuho Oku <kazuhooku@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
On 11/04/2017 03:17 PM, Willy Tarreau wrote:
> On Fri, Nov 03, 2017 at 03:53:57PM +0800, Andy Green wrote:
>> There seems to be a basic choice to have some strict integer limit, or to
>> avoid it by having a "stretchy" way to have arbitrary-sized ints. If people
>> are OK with a strict limit, 64-bit is very widely supported at C level even
>> down on "ESP8266" Willy mentioned. I think a 32-bit limit is a non-starter.
>
> The ESP in my alarm clock disagrees with you here :-)

Well, the chip doesn't care :-) it's Willy that disagrees with me.

I don't claim to know the 64-bitness of BASIC or whatever you are
testing with, and you don't describe it. So I don't think it is relevant
that some code running on ESP platforms only has 32-bit types. That can
be true on any platform.

It remains a fact that the recommended stock gcc toolchain from
Espressif for both ESP8266 and ESP32, which is what you would write HTTP
stuff with, supports proper 64-bit long long, so deploying it and using
all the operators on it is trivial.

>
> > print(32768*32768)
> 1073741824
> > print(32768*65536)
> -2147483648
> > print(65536*65536)
> 0
> > print(131072*65536)
> 0
> > print(2147483648)
> 2147483647
> > print(3333333333)
> 2147483647

Here are the same trials done in C using gcc on ESP32 (because that is
what I have to hand, but it's basically the same gcc toolchain + 32-bit
Tensilica core as ESP8266).
lwsl_notice("32768 * 32768 = %lld\n", (long long)32768 * (long long)32768);
lwsl_notice("32768 * 65536 = %lld\n", (long long)32768 * (long long)65536);
lwsl_notice("65536 * 65536 = %lld\n", (long long)65536 * (long long)65536);
lwsl_notice("131072 * 65536 = %lld\n", (long long)32768 * (long long)32768);
lwsl_notice("2147483648 = %lld\n", (long long)2147483648);
lwsl_notice("3333333333 = %lld\n", (long long)3333333333);

4: 32768 * 32768 = 1073741824
4: 32768 * 65536 = 2147483648
4: 65536 * 65536 = 4294967296
4: 131072 * 65536 = 1073741824
4: 2147483648 = 2147483648
4: 3333333333 = 3333333333

> That's totally scary. The first 3 can be expected due to integer overflow
> (2^30 is OK, 2^31 is negative, 2^32=0). The 4th one (2^33) is unexpected.
> The last two ones shows that integer parsing uses saturation and reports
> wrong values. That's precisely the type of issue I'd rather be sure to
> address for sensitive protocol elements. But as long as we propose a
> portable way to detect that a number is correctly parsed, I'm fine,
> because as I mentionned earlier in this thread, such devices are not
> going to be used to retrieve a 1 TB file.
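
The quoted paragraph asks for a portable way to detect that a number was parsed correctly. A minimal sketch of one possible approach follows, using only the standard C99 strtoull(), errno and the end pointer. The helper name parse_u64_strict() is hypothetical and not from this thread, and the strictness rules (no sign, no whitespace, no trailing characters) are an assumption about what a protocol field would want.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: strictly parse a non-negative decimal integer.
 * Returns 0 and stores the value in *out on success; returns -1 if the
 * input is empty, starts with a sign or whitespace, has trailing
 * characters, or does not fit in 64 bits. */
static int parse_u64_strict(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    /* strtoull() skips leading whitespace and accepts '+' / '-', so
     * insist that the very first character is a digit. */
    if (!s || !isdigit((unsigned char)*s))
        return -1;

    errno = 0;
    v = strtoull(s, &end, 10);  /* base 10: "00003333" is 3333, not octal */

    if (errno == ERANGE)        /* value saturated at ULLONG_MAX */
        return -1;
    if (*end != '\0')           /* trailing junk, e.g. "12345.5555" */
        return -1;
#if ULLONG_MAX > UINT64_MAX
    if (v > UINT64_MAX)         /* only possible if unsigned long long > 64 bits */
        return -1;
#endif

    *out = (uint64_t)v;
    return 0;
}
```

With this, "3333333333" parses to 3333333333 instead of saturating, "00003333" parses to 3333, and "12345.5555" or an out-of-range value is rejected rather than silently truncated.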
>
> But in C, code using atoi() to parse integers is very common, and when the
> developers are told that atoi() is too short and unreliable and that they
> must use strtol(), they end up using it naively causing such hard to detect
> problems when they're not aware of the impacts :
>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)
> {
>     printf("atoi: 0x%16lx (%ld)\n", (long)atoi(argv[1]), (long)atoi(argv[1]));
>     printf("strtol: 0x%16lx (%ld)\n", (long)strtol(argv[1], NULL, 0), (long)strtol(argv[1], NULL, 0));
>     return 0;
> }
>
> $ ./a.out 2147483647
> atoi: 0x        7fffffff (2147483647)
> strtol: 0x        7fffffff (2147483647)
>
> $ ./a.out 2147483648
> atoi: 0xffffffff80000000 (-2147483648)
> strtol: 0x        80000000 (2147483648)
>
> $ ./a.out 4294967296
> atoi: 0x               0 (0)
> strtol: 0x       100000000 (4294967296)
>
> $ ./a.out 00003333
> atoi: 0x             d05 (3333)
> strtol: 0x             6db (1755)

Ehhhhhhhh that's **long** you are using there. I am talking about long
long. You underestimate C programmers if you think they don't know the
difference.

> That's why I think that we must really take care of 32-bit. Signed 32-bits
> (hence unsigned 31 bits) are the only really portable integers. Most code
> works pretty well above this, but naive implementations easily get caught
> like this.

This is why the C guys invented int64_t and friends (which are just
typedefs to long long or whatever). That is **thoroughly** portable: not
just ESP8266 + ESP32 gcc, even Windows has <stdint.h> with them.

So I dunno (because you don't say) what you tested on, but the generic C
toolchain for ESP8266 + ESP32 has 64-bit long long and uint64_t /
int64_t, and support for all the C operators on those types, which is
what matters for these platforms.
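
As a rough illustration of the int64_t point, here is a sketch of the multiplication trials written with the fixed-width types and the PRId64 format macro from <inttypes.h>, so the format string does not depend on whether int64_t is long or long long on a given platform. The helper names mul64() and format_product() are hypothetical. Note that the fourth line of the listing earlier in this message passes 32768 * 32768 under the "131072 * 65536" label, which is why its logged result is 2^30; with the operands the label names, the 64-bit product is 8589934592 (2^33).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: multiply two 32-bit values in 64-bit arithmetic.
 * The operands are widened before the multiply; casting the result of
 * a 32-bit multiply afterwards would be too late. */
static int64_t mul64(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Hypothetical helper: write "a * b = product" into buf. PRId32 and
 * PRId64 expand to the right printf length modifiers for this platform. */
static int format_product(char *buf, size_t len, int32_t a, int32_t b)
{
    return snprintf(buf, len, "%" PRId32 " * %" PRId32 " = %" PRId64,
                    a, b, mul64(a, b));
}
```

For example, format_product(buf, sizeof buf, 131072, 65536) produces "131072 * 65536 = 8589934592" on any toolchain that provides int64_t.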
> In fact I would probably be fine with unbounded integers if we :
>   - explain how to safely parse an integer (and detect its correctness and
>     validity for the type being used)
>   - mention a warning about the risk of widespread code not capable of
>     dealing with quantities larger than 2147483647.
>
> (...)
>> No, floats are not ints and cannot reliably be converted into them. They are
>> a lossy approximation format. For example you cannot put key BIGINTs into
>> floats.
>
> I agree, I really don't like floats to represent integers for discrete
> values. Byte offsets and content-lengths are discrete values and must
> absolutely be exact. Parsing "Content-length: 12345.5555" will lead to
> funny things depending on the implementations. Some truncating to 12345
> and others rounding up to 12346. It's even worse when such numbers have
> to be converted to other quantities (eg: bits), because 12345.5555*8=
> 98764.4440 which differs from 12345*8=98760 possibly resulting in negative
> values in subtracts (eg: remaining space in a buffer).

Yeah we completely agree on that, floats won't do for generic ints.

I don't have an opinion on whether the thing being discussed should deal
with BIGINT, it just seemed to be missing from the discussion. If it
did, it would cover every other smaller limit case, but it would force
people to deal with their length + MSB data format.

For anything other than MPINT / BIGINT, a 64-bit limit will cover
anything related to dataset size for the foreseeable future.

-Andy

> Willy
>
Received on Saturday, 4 November 2017 08:00:32 UTC