Re: Precision of numbers using JSON Header Field Values

On Sat, Jul 16, 2016 at 10:01:05AM +0000, Poul-Henning Kamp wrote:
> --------
> In message <20160716051253.GD21568@1wt.eu>, Willy Tarreau writes:
> 
> >I must say that right now I'm worried about the lack of distinction between
> >an integer and a float. Some fields like a content-length or a byte-range
> >definitely require a strict integer.
> 
> [1]
> 
> Well, that is true, but *in principle* there is nothing wrong with
> transmitting it as "XXXXXX.000000" or even "1.2345e4", as long as
> the receiver checks that the result is an integer.

Then we'll have to suggest how to do it safely. Something like the
following could work, depending on the language and the expected ranges,
though it does not catch the loss of precision when the mantissa lacks
enough bits to represent the integer exactly:

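     /* range test first: casting a negative, NaN or >= 2^64 value to uint64_t is undefined */
     if (!(value >= 0 && value < 18446744073709551616.0) ||
         (double)value != (double)(uint64_t)value)
          fail();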

E.g., a few tests:

  input=1.99999994 float=2.000000 int=1 equal?=0
  input=1111111111999999 float=1111111111999999.000000 int=1111111111999999 equal?=1
  input=11111111119999999 float=11111111120000000.000000 int=11111111120000000 equal?=1
  input=111111111199999991 float=111111111199999984.000000 int=111111111199999984 equal?=1

> And it's not like that is an issue JSON creates, we already perform
> that very check today, but it is bundled up in a much stronger and
> general check:  The field can only contain digits.
> 
> The trouble is the "we expect people to use generic JSON
> parsers" assumption being tacitly made here.

Absolutely. The difference here is that users totally trust the libraries
they rely on without knowing their limits, and generally have no easy
access to the original contents to verify their format.
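For comparison, a sketch of the digits-only rule applied to the raw text, next to what a parser storing doubles keeps. Both helper names are hypothetical, and strtod merely stands in for a generic JSON parser's number conversion: "1000", "1000.000" and "1e3" all become the same double, so only code that still sees the original bytes can tell them apart.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: the "only digits" rule applied to the raw field
 * text, as done today for Content-Length. Rejects empty strings, signs,
 * dots, exponents and anything else that is not 0-9. */
static int is_plain_digits(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (*s < '0' || *s > '9')
            return 0;
    return 1;
}

/* What a parser that stores numbers as doubles keeps: only the value.
 * (strtod accepts more than JSON does, e.g. hex and "inf"; it is used
 * here only to show that the original spelling is gone after parsing.) */
static double parsed_value(const char *s)
{
    assert(s != NULL);
    return strtod(s, NULL);
}
```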

> Some fraction of generic JSON parsers store numbers as numbers in
> their internal data structure[2].
> 
> If you use one of those, a hostile sender can force you to do
> floating point math on any and all numeric fields.

That's my fear.
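To illustrate the risk, a sketch of two hypothetical peers reading the same length field, one parsing the digits as an integer and one going through a double. For 2^53+1, the smallest positive integer a double cannot represent, they end up with different values; that kind of disagreement between components (e.g. two hops framing the same message differently) is what can turn into a security issue.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of two peers reading the same length text.
 * Peer A parses the digits as an integer; peer B goes through a double,
 * as a generic JSON library storing numbers as numbers would. */
static uint64_t read_as_integer(const char *text)
{
    assert(text != NULL);
    return strtoull(text, NULL, 10);
}

static uint64_t read_via_double(const char *text)
{
    assert(text != NULL);
    /* Only defined for values in [0, 2^64); this sketch assumes the
     * caller has already range-checked the text. */
    return (uint64_t)strtod(text, NULL);
}
```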

> We can of course write "If you use a generic JSON parser, make sure
> it doesn't store numbers as numbers" but that makes the word "generic"
> surplusage, doesn't it ?

We know we can very easily cause users to shoot themselves in the foot
through lack of knowledge about corner cases, so instead of saying "don't
use this or that", we need to help them verify that the underlying
framework didn't lie to them. That makes their choice of components
easier and safer (especially in large teams where implementors do not
choose the lower-layer components, which have been imposed by others).

> We need to think about this problem at the meta level:
> 
> 1) How are the content of these "structured headers" described in RFCs?
> 
> 2) How does that description get turned into code which validates
>    an input candidate ?

or 3) do we want to limit authorized values to the subset known to
safely pass through various implementations? That's quite similar to
the analysis work we had to do for RFC 7230, after all. For example,
stating that no integer value may be larger than (2^53)-1 (about
9*10^15) could be fine. It means that a content-length or byte-range
of 8 PiB (2^53 bytes) or more would have to be rejected.

> I believe answering that with "1: English Prose, 2: Programmers do
> that" will cause far too many security vulnerabilities and interop
> problems[3]
> 
> We _have_ to find a better answer than that.
> 
> Once we have, I suspect the on-wire format will follow easily from that.

I think that's exactly what Julian has been doing over the last few years,
and what led him to conclude that JSON was not a bad idea in the end.
It may look like a bad solution, but it is the least bad of all the
alternatives.

But if we start defining small variations on JSON or imposing restrictions
on how JSON parsers have to work, maybe it will take less time to define
exactly what we want to see there and how it should behave in some
corner cases (e.g., how to handle duplicate keys). My guess is that it
will look very similar to JSON, possibly with very small variations, and
that modifying a few JSON implementations to respect this format would
not be that hard.
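One corner case such a profile would need to pin down is duplicate member names. A minimal sketch, assuming the rule chosen is "reject the object" (first-wins or last-wins would be equally possible choices) and that the parser has already extracted the names; has_duplicate_keys is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule for a restricted JSON profile: an object whose member
 * names repeat is rejected outright, instead of relying on "first one
 * wins" or "last one wins", on which generic parsers do not agree. The
 * names are assumed to have been extracted and unescaped already. */
static int has_duplicate_keys(const char *const *keys, size_t n)
{
    assert(keys != NULL || n == 0);
    /* Quadratic scan: header objects are expected to have few members. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(keys[i], keys[j]) == 0)
                return 1;
    return 0;
}
```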

Regards,
Willy

Received on Sunday, 17 July 2016 05:41:26 UTC