- From: Kazuho Oku <kazuhooku@gmail.com>
- Date: Wed, 1 Nov 2017 10:52:53 +0900
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Cc: Willy Tarreau <w@1wt.eu>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Hi, PHK!

Thank you for your response.

2017-10-31 17:26 GMT+09:00 Poul-Henning Kamp <phk@phk.freebsd.dk>:
> --------
> In message <CANatvzzOXL6vFjm_4KBxLvwosZ6vJYW_ic34_KCwFXtFXFTLsQ@mail.gmail.com>
> , Kazuho Oku writes:
>
>>So why not mandate support for 64-bit integers?
>>
>>[...]
>>
>>Let's not repeat the failure made by JSON.
>
> If we were designing a general-purpose data-carrier format, I would
> be 100% with you there, but we are not.
>
> The goal here is to design a maximally robust data-carrier format,
> and that means conservative choices and putting the inconvenience
> on the end which packages the data.

In my view, the current limit (15 digits at most) is overly conservative. Let me explain in response to your text below.

> The number format is intended for sending quantities on which
> arithmetic makes sense, and the point of the restriction is
> to reserve to the implementor the ability to use the most
> efficient hardware native data type, without loss of precision.
>
> 15 digits is 49¾ bits, and while I'm not prepared to state
> that "is enough for everybody" I think we can safely say that
> it covers all uses of arithmetic seen in HTTP until now.

IMO, we should consider the future instead of optimizing for what we see now.

When Content-Length was defined in HTTP/1.0 back in May 1996, the largest files we transferred were CD-ROM images (650MB ~ 1GB ~ 2^30 bytes). Twenty years later, we are seeing SD cards of 512GB (~1TB ~ 2^40 bytes). Assuming that the growth continues, we will be seeing storage that holds 2^50 bytes (~1PB) of data within another 20 years.

How long is the expected lifetime of Structured Headers? Assuming that it will be used for 20 years (HTTP has been in use for 20+ years, TCP for 40+ years), there is a fair chance that the 49¾-bit limit is too small.
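The arithmetic here can be checked in a few lines; note that the two-decade growth figures are the rough estimates from this mail, not measurements:

```python
import math

# 15 decimal digits: the largest representable value is 10**15 - 1.
bits = 15 * math.log2(10)            # ~49.83, i.e. the "49 3/4 bits" above
print(f"15 digits ~= {bits:.2f} bits")

# Rough trend sketched in the mail: ~2**10 (~1000x) growth per ~20 years.
cdrom_1996 = 2**30                   # ~1 GB CD-ROM image
sdcard_2017 = 2**40                  # ~1 TB of flash storage
projected_2037 = 2**50               # ~1 PB, projected

# 2**50 already exceeds what 15 decimal digits can express.
print(projected_2037 > 10**15 - 1)   # True
```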
Note that even if we switch to transferring headers in binary-encoded form, we might continue using Structured Headers as the textual representation.

Do we want to risk making _all_ our future implementations complex in exchange for being friendly to _some_ programming languages without 64-bit integers? Considering that filesystems are raising their maximum file size to the exabyte range, those programming languages will face issues dealing with large files anyway.

The other thing I would like to point out is that mandating support for 64-bit integer fields does not necessarily mean that such fields are hard to represent in programming languages without 64-bit integers. This is because there is no need to store an integer field using integers. Decoders of Structured Headers can retain the representation as a string (i.e. a series of digits), and applications can convert it to a number when they want to use the value for calculation.

Since the sizes of the files transmitted today do not exceed 1PB, such an approach will not cause any issues today. As applications start handling files larger than 1PB, they will figure out how to support 64-bit integers anyway; otherwise they cannot access the file! Considering that, I would argue that we are unlikely to see issues in the future either, with programming languages that do not support 64-bit integers _now_.

To summarize, the 49¾-bit limit is scary considering the expected lifetime of a standard, and we can expect programming languages that do not support 64-bit integers to start supporting them as we start using files of petabyte size.

> If your 64 bit number is an identifier, the only valid operation
> on it is "check for identity", and taking the detour over a decimal
> representation is not only uncalled for, but also very inefficient
> in terms of CPU cycles.
> The natural and most efficient format for such an identifier would
> be base64 binary, but if for some reason it has to be decimal, say
> convenience for human debuggers, one could prefix it with a "i" and
> send it as a label.

Requiring the use of base64 goes against the merit of using a textual representation. The reason we use a textual representation is that it is easy for us to read and use. On most systems, 64-bit IDs are represented as numbers, so people would want to transmit them in the same representation over HTTP as well.

So to me the question is whether we want 64-bit integers to be sent as numbers or as strings (or labels). That is the reason why I only compared those two options in my previous mail.

In this respect, another issue we should consider is that we can compress the data more effectively if we know that it is a number (compared to compressing it as a text or a label).

> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

-- 
Kazuho Oku
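P.S. The retain-the-digits-as-a-string approach described above could be sketched as follows. This is a minimal illustration only; the class and method names are made up for this mail and are not part of any spec:

```python
class IntegerField:
    """Keep an integer header field as its decimal digit string;
    convert to a native number only when arithmetic is actually needed."""

    def __init__(self, digits: str):
        if not digits.isdigit():
            raise ValueError("not a series of digits")
        self.digits = digits

    def __eq__(self, other: "IntegerField") -> bool:
        # Identity checks compare the strings; no 64-bit type is required.
        return self.digits.lstrip("0") == other.digits.lstrip("0")

    def to_number(self) -> int:
        # Opt-in conversion, for applications that do arithmetic.
        return int(self.digits)

field = IntegerField("1125899906842624")           # 2**50 bytes, ~1 PB
print(field == IntegerField("1125899906842624"))   # compared as strings
```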
Received on Wednesday, 1 November 2017 01:53:18 UTC