- From: Willy Tarreau <w@1wt.eu>
- Date: Sat, 4 Nov 2017 08:17:00 +0100
- To: Andy Green <andy@warmcat.com>
- Cc: Matthew Kerwin <matthew@kerwin.net.au>, Mark Nottingham <mnot@mnot.net>, Kazuho Oku <kazuhooku@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
On Fri, Nov 03, 2017 at 03:53:57PM +0800, Andy Green wrote:
> There seems to be a basic choice to have some strict integer limit, or to
> avoid it by having a "stretchy" way to have arbitrary-sized ints. If people
> are OK with a strict limit, 64-bit is very widely supported at C level even
> down on "ESP8266" Willy mentioned. I think a 32-bit limit is a non-starter.

The ESP in my alarm clock disagrees with you here :-)

    > print(32768*32768)
    1073741824
    > print(32768*65536)
    -2147483648
    > print(65536*65536)
    0
    > print(131072*65536)
    0
    > print(2147483648)
    2147483647
    > print(3333333333)
    2147483647

That's totally scary. The first 3 can be expected due to integer overflow (2^30 is OK, 2^31 is negative, 2^32=0). The 4th one (2^33) is unexpected. The last two show that integer parsing uses saturation and reports wrong values. That's precisely the type of issue I'd rather be sure to address for sensitive protocol elements. But as long as we propose a portable way to detect that a number is correctly parsed, I'm fine, because as I mentioned earlier in this thread, such devices are not going to be used to retrieve a 1 TB file.
But in C, code using atoi() to parse integers is very common, and when developers are told that atoi() is too short (it only returns an int) and unreliable, and that they must use strtol(), they end up using it naively, causing hard-to-detect problems when they're not aware of the impacts:

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <number>\n", argv[0]);
            return 1;
        }
        /* atoi() cannot report errors; out-of-range input is undefined behavior */
        printf("atoi: 0x%16lx (%ld)\n",
               (unsigned long)atoi(argv[1]), (long)atoi(argv[1]));
        /* base 0: a leading "0" means octal and "0x" means hexadecimal */
        printf("strtol: 0x%16lx (%ld)\n",
               (unsigned long)strtol(argv[1], NULL, 0), strtol(argv[1], NULL, 0));
        return 0;
    }

    $ ./a.out 2147483647
    atoi: 0x        7fffffff (2147483647)
    strtol: 0x        7fffffff (2147483647)
    $ ./a.out 2147483648
    atoi: 0xffffffff80000000 (-2147483648)
    strtol: 0x        80000000 (2147483648)
    $ ./a.out 4294967296
    atoi: 0x               0 (0)
    strtol: 0x       100000000 (4294967296)
    $ ./a.out 00003333
    atoi: 0x             d05 (3333)
    strtol: 0x             6db (1755)

That's why I think that we must really take care of 32-bit. Signed 32-bit integers (hence 31 bits for non-negative values) are the only really portable integers. Most code works pretty well above this, but naive implementations easily get caught like this. In fact I would probably be fine with unbounded integers if we:

- explain how to safely parse an integer (and detect its correctness and validity for the type being used);
- mention a warning about the risk of widespread code not capable of dealing with quantities larger than 2147483647.

(...)

> No, floats are not ints and cannot reliably be converted into them. They are
> a lossy approximation format. For example you cannot put key BIGINTs into
> floats.

I agree, I really don't like floats to represent integers for discrete values. Byte offsets and content-lengths are discrete values and must absolutely be exact. Parsing "Content-length: 12345.5555" will lead to funny things depending on the implementation: some truncate to 12345 and others round up to 12346.
It's even worse when such numbers have to be converted to other quantities (e.g. bits), because 12345.5555*8 = 98764.4440, which differs from 12345*8 = 98760, possibly resulting in negative values in subtractions (e.g. remaining space in a buffer).

Willy
Received on Saturday, 4 November 2017 07:18:23 UTC