Re: New Version Notification for draft-nottingham-structured-headers-00.txt from Kazuho Oku on 2017-11-01 (ietf-http-wg@w3.org from October to December 2017)

From: Kazuho Oku <kazuhooku@gmail.com>
Date: Wed, 1 Nov 2017 10:52:53 +0900
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc: Willy Tarreau <w@1wt.eu>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CANatvzxeMYWNtpfG5y-3OQ4uqkVFrLiDbxs900fuwmL4cdAjqQ@mail.gmail.com>
Hi, PHK!

Thank you for your response.

2017-10-31 17:26 GMT+09:00 Poul-Henning Kamp <phk@phk.freebsd.dk>:
> --------
> In message <CANatvzzOXL6vFjm_4KBxLvwosZ6vJYW_ic34_KCwFXtFXFTLsQ@mail.gmail.com>
> , Kazuho Oku writes:
>
>>So why not mandate support for 64-bit integers?
>>
>>[...]
>>
>>Let's not repeat the failure made by JSON.
>
> If we were designing a general-purpose data-carrier format, I would
> be 100% with you there, but we are not.
>
> The goal here is to design a maximally robust data-carrier format,
> and that means conservative choices and putting the inconvenience
> on the end which packages the data.

In my view, the current limit (at most 15 digits) is overly
conservative.

Let me explain in response to your text below.

> The number format is intended for sending quantities on which
> arithmetic makes sense, and the point of the restriction is
> to reserve to the implementor the ability to use the most
> efficient hardware native data type, without loss of precision.
>
> 15 digits is 49¾ bits, and while I'm not prepared to state
> that "is enough for everybody" I think we can safely say that
> it covers all uses of arithmetic seen in HTTP until now.

IMO, we should consider the future instead of optimizing for what we
see now.

When Content-Length was defined in HTTP/1.0 back in May 1996, the
largest files we transferred were CD-ROM images (650MB ~ 1GB ~
2^30 bytes). Twenty years later, we are seeing SD cards of 512GB
(~ 1TB ~ 2^40 bytes). Assuming that the increase continues, we will
see storage that can hold 2^50 bytes (~1 PB) of data within 20 years.

How long is the expected lifetime of Structured Headers? Assuming that
it will be used for 20 years (HTTP has been used for 20+ years, TCP
has been used for 40+ years), there is a fair chance that the 49¾-bit
limit is too small. Note that even if we switch to transferring
headers in binary-encoded forms, we might continue using Structured
Headers as the textual representation.
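
As a rough check of the arithmetic above, here is a small Python sketch.
The variable names are mine and purely illustrative; the 15-digit limit
is the one discussed in this thread, and log2(10^15) comes out at
roughly 49.8 bits (the "49¾ bits" quoted above).

```python
import math

# Largest value expressible under the 15-digit limit discussed in this thread.
MAX_15_DIGITS = 10**15 - 1

# Bits needed to cover every 15-digit decimal value (roughly 49.8).
bits_for_15_digits = math.log2(10**15)

# 2^50 bytes (the "~1 PB" figure above) needs 16 decimal digits,
# so it would not fit in a 15-digit number field.
one_pib = 2**50
fits_in_limit = one_pib <= MAX_15_DIGITS

# A signed 64-bit integer needs up to 19 decimal digits.
digits_for_int64 = len(str(2**63 - 1))

print(bits_for_15_digits, len(str(one_pib)), fits_in_limit, digits_for_int64)
```

In other words, a Content-Length-like value of 2^50 bytes would already
exceed what 15 digits can carry.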

Do we want to risk making _all_ our future implementations complex in
exchange for being friendly to _some_ programming languages without
64-bit integers?

Considering the fact that filesystems are raising their maximum file
size to the exabyte range, those programming languages will face
issues dealing with large files anyway.

The other thing I would like to point out is that mandating support
for 64-bit integer fields does not necessarily mean that such fields
cannot be easily represented in programming languages without 64-bit
integers.

This is because there is no need to store an integer field using
integers. Decoders of Structured Headers can retain the representation
as a string (i.e. series of digits), and applications can convert them
to numbers when they want to use the value for calculation.
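
To illustrate the idea, here is a minimal Python sketch of a decoder that
keeps the digits as received and converts only on demand. Everything in
it is hypothetical (the IntegerField class, the decode_integer function,
and the simplified "optional sign plus digits" grammar); it is not taken
from the draft. Python integers are arbitrary-precision, so to_int()
never overflows here; in a language without 64-bit integers, that
conversion step is the only place where the limit would matter.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified grammar: an optional minus sign followed by digits.
_INTEGER = re.compile(r"-?[0-9]+\Z")


@dataclass(frozen=True)
class IntegerField:
    """An integer field kept as the series of digits that was received."""
    digits: str

    def to_int(self) -> int:
        # Conversion happens only when the application needs arithmetic.
        return int(self.digits)


def decode_integer(text: str) -> IntegerField:
    """Validate the field syntax without converting it to a number."""
    if not _INTEGER.match(text):
        raise ValueError(f"not an integer field: {text!r}")
    return IntegerField(text)


# A value beyond 15 digits passes through the decoder untouched.
field = decode_integer("18446744073709551615")  # 2**64 - 1
print(field.digits, field.to_int() == 2**64 - 1)
```

An application that only forwards or compares the value never needs to
call to_int() at all.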

Since the size of the files transmitted today does not exceed 1PB,
such an approach will not cause any issues today. Once those languages
start handling files larger than 1PB, they will have to figure out how
to support 64-bit integers anyway; otherwise they cannot access the
files! Considering that, I would argue that we are unlikely to see
issues in the future either with programming languages that do not
support 64-bit integers _now_.

To summarize, the 49¾-bit limit is scary considering the expected
lifetime of a standard, and we can expect programming languages that
do not support 64-bit integers to start supporting them as we start
using files of petabyte size.

>
> If your 64 bit number is an identifier, the only valid operation
> on it is "check for identity", and taking the detour over a decimal
> representation is not only uncalled for, but also very inefficient
> in terms of CPU cycles.
>
> The natural and most efficient format for such an identifier would
> be base64 binary, but if for some reason it has to be decimal, say
> convenience for human debuggers, one could prefix it with a "i" and
> send it as a label.

Requiring the use of base64 goes against the merit of using a textual
representation. The reason we use a textual representation is that it
is easy for us to read and use. On most systems, 64-bit IDs are
represented as numbers, so people will want to transmit them in the
same representation over HTTP as well. So to me the question is
whether we want 64-bit integers to be sent as numbers or as strings
(or labels). That is why I compared only those two options in my
previous mail.

In this respect, another issue we should consider is that we can
compress the data more effectively if we know that it is a number
(compared to compressing it as text or as a label).

>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.



-- 
Kazuho Oku
Received on Wednesday, 1 November 2017 01:53:18 UTC