Re: New Version Notification for draft-nottingham-structured-headers-00.txt from Matthew Kerwin on 2017-11-04 (ietf-http-wg@w3.org from October to December 2017)

From: Matthew Kerwin <matthew@kerwin.net.au>
Date: Sat, 4 Nov 2017 19:21:08 +1000
To: Andy Green <andy@warmcat.com>
Cc: Willy Tarreau <w@1wt.eu>, Mark Nottingham <mnot@mnot.net>, Kazuho Oku <kazuhooku@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CACweHNBsgT6oX+Vxk6dX4Yn6TAG7K+jDvuxP2N6PfbvY8t5aUw@mail.gmail.com>

On 4 November 2017 at 17:59, Andy Green <andy@warmcat.com> wrote:

>
> On 11/04/2017 03:17 PM, Willy Tarreau wrote:
>
>>
>>> No, floats are not ints and cannot reliably be converted into them. They
>>> are a lossy approximation format.  For example you cannot put key BIGINTs
>>> into floats.
>>>
>>
>> I agree, I really don't like floats to represent integers for discrete
>> values. Byte offsets and content-lengths are discrete values and must
>> absolutely be exact. Parsing "Content-length: 12345.5555" will lead to
>> funny things depending on the implementations. Some truncating to 12345
>> and others rounding up to 12346. It's even worse when such numbers have
>> to be converted to other quantities (eg: bits), because 12345.5555*8=
>> 98764.4440 which differs from 12345*8=98760 possibly resulting in negative
>> values in subtracts (eg: remaining space in a buffer).
>>
>
> Yeah we completely agree on that, floats won't do for generic ints.

So, how about if I write this without using the word 'float' or 'int',
which carry specific connotations, and instead use 'number' to mean "a
rational number, on which arithmetic operations can be performed."  The set
of rational numbers can quite easily be separated into those with a
denominator of 1 (integers), and those with anything else (which I've been
calling "fractional numbers").  The integers and fractional numbers have
different conventions for being represented in ASCII sequences, and in this
particular sub-thread we're only discussing the representation (and thus
parsing) of integers, and how that is influenced by different environments'
abilities to operate on the values resulting from said parsing.  Yeah?

There is a pretty old (~20 years), pretty stable (at least in respect to
its handling of numbers), pretty popular programming language out there
that states in its spec [1]:

    "...all the positive and negative integers whose magnitude is no
    greater than 2^53 are representable in the Number type..."

and:

    "Some [...] operators deal only with integers in specific ranges such
    as -2^31 through 2^31-1, inclusive, or in the range 0 through 2^16-1,
    inclusive. These operators accept any value of the Number type but first
    convert each such value to an integer value in the expected range."

Reading the spec, notable examples of such operations are those related to
indexing, such as into arrays, and bitwise arithmetic operations (AND, OR,
NOT, SHIFT, etc.); in both cases values are limited to 32 bits (-2^31 to
2^31-1 for the bitwise operators, 0 to 2^32-2 for array indices).

So in this language there is a specified hard upper limit on integer values
that can be handled natively (53 bits), and a softer but well-understood
practical limit on numbers that can be used in many common operations (32
bits).
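
To make those two limits concrete, here is a minimal sketch.  It is written
in Python rather than the language in question, on the assumption that
Python's float is the same IEEE 754 double format as the Number type.  The
helper names (survives_double, to_int32) are made up for illustration, and
to_int32 only imitates the spec's ToInt32 conversion for integer inputs.

```python
# Python's float is an IEEE 754 double, used here as a stand-in for the
# ECMAScript Number type (an assumption of this sketch).

def survives_double(n: int) -> bool:
    """Return True if the integer n round-trips through a double unchanged."""
    return int(float(n)) == n


def to_int32(n: int) -> int:
    """Imitate ECMAScript's ToInt32 for an integer input.

    The value is wrapped modulo 2^32 and then mapped into the signed
    range -2^31 .. 2^31-1, which is what the bitwise operators work on.
    """
    n &= 0xFFFFFFFF
    return n - 2 ** 32 if n >= 2 ** 31 else n


if __name__ == "__main__":
    # The hard limit: 2^53 is exact, 2^53 + 1 is not.
    print(survives_double(2 ** 53))
    print(survives_double(2 ** 53 + 1))

    # The softer limit: a value one past 2^31-1 wraps to a negative number.
    print(to_int32(2 ** 31 - 1))
    print(to_int32(2 ** 31))
```

The point of the second helper is that a value can be perfectly
representable as a Number (anything up to 2^53) and still change when it
passes through a 32-bit operation.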

Back to the question of parsing: if we define a limit for the number of
acceptable digits, it's a pretty strong judgement call to propose a limit that
produces values that some languages can support natively, on the grounds
that they can support it natively, while ignoring such a popular language
that can't.

15 decimal digits (<50 bits) gets us the support of that language, and
covers a lot of use cases *.  9 decimal digits (<30 bits) would get us
close to universal support, but may be less useful.  Whichever concrete
maximum length is chosen, I think it's good to pick one that works for that
popular language (as well as all the others), but I don't think it's right
to call the parsing rule something as generic as "number".  I wouldn't mind
if it was 15 digits with a name of "common integer" or something, with the
view that we might, in future, find a need to define a 38-digit "large
integer" rule, or something like that.
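
As a rough sketch of what such a rule could look like in a parser (Python
again, purely for illustration): the name parse_common_integer, the
MAX_DIGITS constant, and the exact grammar (an optional "-" followed by 1
to 15 ASCII digits) are all assumptions of mine, not anything taken from
the draft.

```python
import re

# Hypothetical limit, taken from the 15-digit "common integer" idea.
MAX_DIGITS = 15

# Assumed grammar: optional minus sign, then 1..MAX_DIGITS ASCII digits.
_COMMON_INTEGER = re.compile(r"-?[0-9]{1,%d}" % MAX_DIGITS)


def parse_common_integer(text: str) -> int:
    """Parse text as a 'common integer', rejecting anything else.

    Fractions, empty strings, and values with too many digits raise
    ValueError instead of being truncated or rounded.
    """
    if _COMMON_INTEGER.fullmatch(text) is None:
        raise ValueError("not a common integer: %r" % text)
    return int(text)


# Sanity checks on the digit limits discussed above:
# the largest 15-digit value is below 2^50 (and so well below 2^53),
# and the largest 9-digit value is below 2^30.
assert 10 ** 15 - 1 < 2 ** 50 < 2 ** 53
assert 10 ** 9 - 1 < 2 ** 30

if __name__ == "__main__":
    print(parse_common_integer("12345"))
    try:
        parse_common_integer("12345.5555")
    except ValueError as err:
        print("rejected:", err)
```

Rejecting over-long or fractional input outright, rather than trying to
round it, avoids the "some truncate, some round up" divergence mentioned
earlier in the thread.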

* people keep talking about size of resources, but Content-Length and Range
are already defined and it'd probably be a while before they were respecced
to use this common header format.

Cheers

[1]:
http://www.ecma-international.org/ecma-262/8.0/index.html#sec-ecmascript-language-types-number-type

-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Received on Saturday, 4 November 2017 09:21:33 UTC