- From: Willy Tarreau <w@1wt.eu>
- Date: Sat, 4 Nov 2017 10:29:06 +0100
- To: Andy Green <andy@warmcat.com>
- Cc: Matthew Kerwin <matthew@kerwin.net.au>, Mark Nottingham <mnot@mnot.net>, Kazuho Oku <kazuhooku@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
On Sat, Nov 04, 2017 at 03:59:00PM +0800, Andy Green wrote:
> > The ESP in my alarm clock disagrees with you here :-)
>
> Well, the chip doesn't care :-) it's Willy that disagrees with me.

Sure :-)

> I don't claim to know the 64-bitness of BASIC or whatever you are
> testing with and you don't describe it.

Sorry, it's nodemcu, the Lua interpreter.

> So I think it is not relevant there is some code that runs on ESP
> platforms that just has 32-bit types. That can be true on any platform.

Except that it's very popular, and just one example among others. I'm not
trying to find statistics on how many are OK and how many are broken, I'm
just saying that there are limitations in many places and that we have to
deal with them.

I mentioned in another e-mail (I'm just realizing it was off-list now)
that a cleanly and carefully written HTTP server used as a reference for
nodemcu on this platform uses "tonumber()" to parse the content-length,
and unfortunately, as you can see below, tonumber() has its limits as
well:

  > print(tonumber("3333333333"))
  2147483647
  > print(tonumber("0x123"))
  291
  > print(tonumber("000123"))
  123

Just an indication that developers are not always aware of the limits
they have to deal with, nor of the validity domain of the functions
they're using. The purpose of Structured Headers is to have safer and
more portable parsers, so we have to take this into account. Ideally I'd
like to see a set of HTTP number parsers safe for use progressively
appear as a replacement for tonumber(), atoi() and consorts, that web
applications should use over the long term.

> It remains a fact that both ESP8266 and ESP32 recommended stock gcc
> toolchain from Espressif, that you would write http stuff with, supports
> proper 64-bit long long, making deploying it and all the operators using
> it trivial.

Probably, but not everyone uses C on such a platform when you have
easy-to-use alternatives like Lua, micropython and probably others.
That's part of today's web landscape, unfortunately.

> > > print(32768*32768)
> > 1073741824
> > > print(32768*65536)
> > -2147483648
> > > print(65536*65536)
> > 0
> > > print(131072*65536)
> > 0
> > > print(2147483648)
> > 2147483647
> > > print(3333333333)
> > 2147483647
>
> Here are the same trials done in C using gcc on ESP32 (because that is
> what I have to hand; but it's basically the same gcc toolchain + 32-bit
> Tensilica core as ESP32).
>
>   lwsl_notice("32768 * 32768 = %lld\n", (long long)32768 * (long long)32768);
>   lwsl_notice("32768 * 65536 = %lld\n", (long long)32768 * (long long)65536);
>   lwsl_notice("65536 * 65536 = %lld\n", (long long)65536 * (long long)65536);
>   lwsl_notice("131072 * 65536 = %lld\n", (long long)32768 * (long long)32768);
>   lwsl_notice("2147483648 = %lld\n", (long long)2147483648);
>   lwsl_notice("3333333333 = %lld\n", (long long)3333333333);
>
>   4: 32768 * 32768 = 1073741824
>   4: 32768 * 65536 = 2147483648
>   4: 65536 * 65536 = 4294967296
>   4: 131072 * 65536 = 1073741824
>   4: 2147483648 = 2147483648
>   4: 3333333333 = 3333333333

That's perfect and I'm not surprised.
> > But in C, code using atoi() to parse integers is very common, and when
> > the developers are told that atoi() is too short and unreliable and
> > that they must use strtol(), they end up using it naively, causing hard
> > to detect problems when they're not aware of the impacts:
> >
> >   #include <stdio.h>
> >   #include <stdlib.h>
> >
> >   int main(int argc, char **argv)
> >   {
> >       printf("atoi:   0x%16lx (%ld)\n", (long)atoi(argv[1]), (long)atoi(argv[1]));
> >       printf("strtol: 0x%16lx (%ld)\n", (long)strtol(argv[1], NULL, 0), (long)strtol(argv[1], NULL, 0));
> >       return 0;
> >   }
> >
> >   $ ./a.out 2147483647
> >   atoi:   0x        7fffffff (2147483647)
> >   strtol: 0x        7fffffff (2147483647)
> >
> >   $ ./a.out 2147483648
> >   atoi:   0xffffffff80000000 (-2147483648)
> >   strtol: 0x        80000000 (2147483648)
> >
> >   $ ./a.out 4294967296
> >   atoi:   0x               0 (0)
> >   strtol: 0x       100000000 (4294967296)
> >
> >   $ ./a.out 00003333
> >   atoi:   0x             d05 (3333)
> >   strtol: 0x             6db (1755)
>
> Ehhhhhhhh that's **long** you are using there. I am talking about long
> long. You underestimate C programmers if you think they don't know the
> difference.

Sorry, I forgot to mention that this was done on my 64-bit PC where long
and long long are 64-bit. Look carefully and you'll see that strtol() is
safe against overflow, but that with the base set to zero, as commonly
found, it parses hex and octal as explained in the manual. I'm not making
up this example, I've seen such things used many times. In fact people
start with atoi() until they're hit by a parsing issue, then switch to
strtol() and don't care about specifying the base, or even worse,
purposely support this because the same parser is used for content-length
and for configuration.

> > That's why I think that we must really take care of 32-bit. Signed
> > 32-bit (hence unsigned 31-bit) are the only really portable integers.
> > Most code works pretty well above this, but naive implementations
> > easily get caught like this.
> This is why the C guys invented int64_t and friends (which are just
> typedefs into long long or whatever). That is **thoroughly** portable,
> not just ESP8266 + ESP32 gcc but even windows has <stdint.h> with them.

I agree. The web is just not only C (unfortunately, as that's by far my
preferred language). It even used to be shell scripts for CGI in an era
when most shells were limited to 32-bit evaluation.

> I don't have an opinion on whether the thing being discussed should deal
> with BIGINT, it just seemed to be missing from the discussion. If it
> did, it would cover every other smaller limit case, but it would force
> people to deal with their length + MSB data format. For anything other
> than MPINT / BIGINT, a 64-bit limit will cover anything related to
> dataset size for the foreseeable future.

I really think we should *encourage* 64-bit processing, *suggest* that
anything larger is possible provided it is handled with extreme care,
*remind* that maximum interoperability is achieved below 2^31, and
*enforce* strict parsing and detection of overflows in any case. If we
design with this in mind, we should be able to make the best design
choices for most use cases and ensure that incompatibilities are safely
covered.

Willy
Received on Saturday, 4 November 2017 09:30:20 UTC