- From: Matthew Kerwin <matthew@kerwin.net.au>
- Date: Sat, 4 Nov 2017 19:21:08 +1000
- To: Andy Green <andy@warmcat.com>
- Cc: Willy Tarreau <w@1wt.eu>, Mark Nottingham <mnot@mnot.net>, Kazuho Oku <kazuhooku@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CACweHNBsgT6oX+Vxk6dX4Yn6TAG7K+jDvuxP2N6PfbvY8t5aUw@mail.gmail.com>
On 4 November 2017 at 17:59, Andy Green <andy@warmcat.com> wrote:
>
> On 11/04/2017 03:17 PM, Willy Tarreau wrote:
>
>>> No, floats are not ints and cannot reliably be converted into them.
>>> They are a lossy approximation format. For example you cannot put
>>> key BIGINTs into floats.
>>
>> I agree, I really don't like floats to represent integers for discrete
>> values. Byte offsets and content-lengths are discrete values and must
>> absolutely be exact. Parsing "Content-length: 12345.5555" will lead to
>> funny things depending on the implementations. Some truncating to 12345
>> and others rounding up to 12346. It's even worse when such numbers have
>> to be converted to other quantities (eg: bits), because 12345.5555*8=
>> 98764.4440 which differs from 12345*8=98760 possibly resulting in negative
>> values in subtracts (eg: remaining space in a buffer).
>
> Yeah we completely agree on that, floats won't do for generic ints.

So, how about if I write this without using the word 'float' or 'int', which carry specific connotations, and instead use 'number' to mean "a rational number, on which arithmetic operations can be performed." The set of rational numbers can quite easily be separated into those with a denominator of 1 (integers), and those with anything else (which I've been calling "fractional numbers"). The integers and fractional numbers have different conventions for being represented in ASCII sequences, and in this particular sub-thread we're only discussing the representation (and thus parsing) of integers, and how that is influenced by different environments' abilities to operate on the values resulting from said parsing. Yeah?

There is a pretty old (~20 years), pretty stable (at least with respect to its handling of numbers), pretty popular programming language out there that states in its spec [1]:

"...all the positive and negative integers whose magnitude is no greater than 2^53 are representable in the Number type..."

and:

"Some [...] operators deal only with integers in specific ranges such as -2^31 through 2^31-1, inclusive, or in the range 0 through 2^16-1, inclusive. These operators accept any value of the Number type but first convert each such value to an integer value in the expected range."

Reading the spec, notable examples of such operations are those related to indexing, such as into arrays, and bitwise arithmetic operations (AND, OR, NOT, SHIFT, etc.); in both cases values are limited to the range -2^31 to 2^31-1.

So in this language there is a specified hard upper limit on integer values that can be handled natively (53 bits), and a softer but well-understood practical limit on numbers that can be used in many common operations (32 bits).

Back to the question of parsing: if we define a limit for the number of acceptable digits, it's a pretty strong judgement call to propose a limit that produces values that some languages can support natively, on the grounds that they can support it natively, while ignoring such a popular language that can't. 15 decimal digits (<50 bits) gets us the support of that language, and covers a lot of use cases *. 9 decimal digits (<30 bits) would get us close to universal support, but maybe be less useful.

Whichever concrete maximum length is chosen, I think it's good to pick one that works for that popular language (as well as all the others), but I don't think it's right to call the parsing rule something as generic as "number". I wouldn't mind if it was 15 digits with a name of "common integer" or something, with the view that we might, in future, find a need to define a 38-digit "large integer" rule, or something like that.

* people keep talking about size of resources, but Content-Length and Range are already defined and it'd probably be a while before they were respecced to use this common header format.

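To make the digit-limit arithmetic above concrete, here is a minimal Python sketch. It is only an illustration: the names `max_bits` and `parse_common_integer`, the unsigned-only grammar, and the default of 15 digits are my assumptions, not anything a spec defines. Python's `float` is an IEEE 754 double, so it stands in here for the Number type quoted above.

```python
# Hypothetical sketch: none of these names come from any spec.

# Limit quoted from the ECMAScript Number type: integers with magnitude
# up to 2^53 are exactly representable in a double.
MAX_EXACT = 2**53


def max_bits(digits: int) -> int:
    """Bits needed for the largest value that has `digits` decimal digits."""
    return (10**digits - 1).bit_length()


def parse_common_integer(text: str, max_digits: int = 15) -> int:
    """Parse an unsigned ASCII decimal integer of at most `max_digits` digits.

    Assumption: no sign, no whitespace, no fraction. Anything else is
    rejected rather than truncated or rounded.
    """
    if not text or len(text) > max_digits:
        raise ValueError(f"expected 1 to {max_digits} digits: {text!r}")
    if any(ch not in "0123456789" for ch in text):
        raise ValueError(f"not a plain decimal integer: {text!r}")
    return int(text)


# Above 2^53 a double can no longer tell adjacent integers apart.
assert float(MAX_EXACT) == float(MAX_EXACT + 1)

# Every 15-digit value sits below 2^53; a 16-digit value may not.
assert 10**15 - 1 < MAX_EXACT < 10**16 - 1

print(max_bits(15))  # 50
print(max_bits(9))   # 30
print(parse_common_integer("12345"))  # 12345
```

With this rule, "12345.5555" is rejected outright instead of being truncated by one implementation and rounded by another, and any accepted value survives a round trip through a double unchanged.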
Cheers

[1]: http://www.ecma-international.org/ecma-262/8.0/index.html#sec-ecmascript-language-types-number-type

-- 
Matthew Kerwin
http://matthew.kerwin.net.au/
Received on Saturday, 4 November 2017 09:21:33 UTC