Re: New Version Notification for draft-nottingham-structured-headers-00.txt from Willy Tarreau on 2017-11-04 (ietf-http-wg@w3.org from October to December 2017)

From: Willy Tarreau <w@1wt.eu>
Date: Sat, 4 Nov 2017 08:17:00 +0100
To: Andy Green <andy@warmcat.com>
Cc: Matthew Kerwin <matthew@kerwin.net.au>, Mark Nottingham <mnot@mnot.net>, Kazuho Oku <kazuhooku@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20171104071700.GC18956@1wt.eu>

On Fri, Nov 03, 2017 at 03:53:57PM +0800, Andy Green wrote:
> There seems to be a basic choice to have some strict integer limit, or to
> avoid it by having a "stretchy" way to have arbitrary-sized ints.  If people
> are OK with a strict limit, 64-bit is very widely supported at C level even
> down on "ESP8266" Willy mentioned.  I think a 32-bit limit is a non-starter.

The ESP in my alarm clock disagrees with you here :-)

  > print(32768*32768)
  1073741824
  > print(32768*65536)
  -2147483648
  > print(65536*65536)
  0
  > print(131072*65536)
  0
  > print(2147483648)
  2147483647
  > print(3333333333)
  2147483647

That's totally scary. The first 3 can be expected due to integer overflow
(2^30 is OK, 2^31 is negative, 2^32=0). The 4th one (2^33) is unexpected.
The last two show that integer parsing uses saturation and reports
wrong values. That's precisely the type of issue I'd rather be sure to
address for sensitive protocol elements. But as long as we propose a
portable way to detect that a number is correctly parsed, I'm fine,
because as I mentioned earlier in this thread, such devices are not
going to be used to retrieve a 1 TB file.

But in C, code using atoi() to parse integers is very common, and when
developers are told that atoi() is too short and unreliable and that they
must use strtol(), they end up using it naively, which causes hard-to-detect
problems like these when they're not aware of the impacts:

  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
        printf("atoi:   0x%16lx (%ld)\n", (long)atoi(argv[1]), (long)atoi(argv[1]));
        printf("strtol: 0x%16lx (%ld)\n", (long)strtol(argv[1], NULL, 0), (long)strtol(argv[1], NULL, 0));
        return 0;
  }

  $ ./a.out 2147483647
  atoi:   0x        7fffffff (2147483647)
  strtol: 0x        7fffffff (2147483647)

  $  ./a.out 2147483648
  atoi:   0xffffffff80000000 (-2147483648)
  strtol: 0x        80000000 (2147483648)

  $ ./a.out 4294967296
  atoi:   0x               0 (0)
  strtol: 0x       100000000 (4294967296)

  $ ./a.out 00003333
  atoi:   0x             d05 (3333)
  strtol: 0x             6db (1755)

That's why I think that we must really take care of 32-bit. Signed 32-bit
integers (hence 31 bits unsigned) are the only really portable integers.
Most code works pretty well above this, but naive implementations easily
get caught like this.
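
For the last case in the output above, the 1755 is explained by the base
argument: with base 0, strtol() treats a leading "0" as an octal prefix, and
3333 in octal is 1755 in decimal. A minimal sketch of the difference follows;
the two wrapper names are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>

/* Hypothetical wrappers, named only for this illustration. */

/* Base 0: strtol() auto-detects the base, so a leading "0" means octal
 * and a leading "0x" means hexadecimal. */
long parse_base0(const char *s)
{
    return strtol(s, NULL, 0);
}

/* Base 10: leading zeroes are just zeroes, and "0x..." stops at the 'x'. */
long parse_base10(const char *s)
{
    return strtol(s, NULL, 10);
}
```

Passing an explicit base of 10 avoids the octal surprise, but neither
wrapper detects overflow or trailing garbage.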

In fact I would probably be fine with unbounded integers if we:
  - explain how to safely parse an integer (and detect its correctness and
    validity for the type being used)
  - mention a warning about the risk of widespread code not capable of
    dealing with quantities larger than 2147483647.
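
A minimal sketch of what such a safe parse could look like in C, assuming
the caller wants a non-negative decimal value that fits in 31 bits. The
function name parse_u31 and its return convention are hypothetical, not
taken from the draft; it relies only on the standard strtoll() and errno.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not from any spec: parse a non-negative decimal
 * integer that must fit in 31 bits (0..2147483647).
 * Returns 0 on success and stores the value in *out, or -1 on any error. */
int parse_u31(const char *s, long *out)
{
    char *end;
    long long v;

    /* Require a leading digit: this rejects empty strings, signs and the
     * leading whitespace that strtoll() would otherwise skip silently. */
    if (s == NULL || *s < '0' || *s > '9')
        return -1;

    errno = 0;
    v = strtoll(s, &end, 10);   /* base 10: "00003333" is not octal */

    if (errno == ERANGE)        /* did not even fit in long long */
        return -1;
    if (*end != '\0')           /* trailing garbage such as "12abc" */
        return -1;
    if (v > 2147483647LL)       /* too large for the portable range */
        return -1;

    *out = (long)v;
    return 0;
}
```

With this sketch, "2147483647" and "00003333" are accepted, while
"2147483648", "4294967296" and "3333333333" are reported as errors
instead of being wrapped or saturated.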

(...)
> No, floats are not ints and cannot reliably be converted into them. They are
> a lossy approximation format.  For example you cannot put key BIGINTs into
> floats.

I agree, I really don't like floats to represent integers for discrete
values. Byte offsets and content-lengths are discrete values and must
absolutely be exact. Parsing "Content-length: 12345.5555" will lead to
funny things depending on the implementation, with some truncating to
12345 and others rounding up to 12346. It's even worse when such numbers
have to be converted to other quantities (e.g. bits), because
12345.5555*8 = 98764.4440, which differs from 12345*8 = 98760, possibly
resulting in negative values in subtractions (e.g. remaining space in a
buffer).
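
The arithmetic above can be sketched in C as follows. The helper names are
hypothetical, and the scenario (a buffer sized from the truncated byte count
while the consumer works from the fractional value) is an assumption chosen
only to show how the two results diverge.

```c
/* Hypothetical helpers illustrating two plausible ways an implementation
 * might turn a fractional length into an integer. */
long len_truncate(double d)
{
    return (long)d;             /* 12345.5555 becomes 12345 */
}

long len_round(double d)
{
    return (long)(d + 0.5);     /* 12345.5555 becomes 12346 */
}

/* Remaining bits in a buffer sized from the truncated byte count, when
 * the consumer computes its need from the fractional value instead. */
long remaining_bits(double len)
{
    long capacity_bits = len_truncate(len) * 8;   /* 12345 * 8 = 98760 */
    long needed_bits   = (long)(len * 8.0);       /* 98764 for 12345.5555 */

    return capacity_bits - needed_bits;           /* negative here */
}
```

For 12345.5555 the two conversions disagree by one byte, and the remaining
space comes out as -4 bits.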

Willy

Received on Saturday, 4 November 2017 07:18:23 UTC