Re: New Version Notification for draft-nottingham-structured-headers-00.txt from Willy Tarreau on 2017-11-02 (ietf-http-wg@w3.org from October to December 2017)

From: Willy Tarreau <w@1wt.eu>
Date: Thu, 2 Nov 2017 07:06:00 +0100
To: Kazuho Oku <kazuhooku@gmail.com>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20171102060600.GB5678@1wt.eu>

On Thu, Nov 02, 2017 at 10:05:31AM +0900, Kazuho Oku wrote:
> What I am arguing is that we should allow applications to send all
> integers (especially all of those that fit into 64-bit) using series
> of digits, rather than requiring use of strings, labels, or base64 for
> storing them.
> 
> In my view, an application would not be forced to handle every
> number representable in 64 bits even if Structured Headers defines
> handling of 64-bit numbers as a minimal requirement. For example,
> you cannot download a file of 1EiB size unless your filesystem
> supports storing such large files.

... or you're forwarding it because you're an intermediary :-)

> Assuming that we would not actually be using all the numbers sent
> using Structured Headers, it makes sense to delay converting them to
> an internal numeric representation (i.e. `int64_t`) until it becomes
> necessary. In fact, many of us already have this kind of optimization.
> For example, many HTTP clients keep the Last-Modified header as the
> string that was received, since it is seldom required to make
> calculations using the value. It is wise to keep them as strings (and
> send them as part of the If-Modified-Since header). What I am
> suggesting is that the fields of Structured Headers can be handled the
> same way.

That's a good point. In haproxy the only integers we always parse are
the content-length (which requires a lot of care, including for duplicate
values etc), and the chunk sizes. The rest is optional. I suspect that
caches like Varnish and Squid have to deal a lot with dates. I'd still
be tempted to consider the difference between a "quantity" and an
identifier, but as I mentioned in another e-mail, ASCII already provides
a variable length encoding which is moderately efficient for quantities
(hex in chunks is a bit better though). We'll probably find that
in the end, we can have "numbers" (in ASCII format) for everything, but
that we need a special case for file sizes and offsets (content-length,
ranges, chunk sizes) and a special case for dates.
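
To make "a lot of care" a bit more concrete, here is a minimal sketch in C.
It is not haproxy's code: the names parse_quantity() and
content_length_from() are made up for illustration, and rejecting repeated
Content-Length values unless they all agree is just one conservative policy
assumed here. The same routine covers decimal sizes and hex chunk sizes,
and refuses anything that does not fit in 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not haproxy code): parse <len> bytes of <s> as an
 * unsigned number in base 10 or 16. Returns 0 on success, -1 on empty
 * input, on any invalid character, or if the value overflows 64 bits. */
static int parse_quantity(const char *s, size_t len, unsigned base,
                          uint64_t *out)
{
    uint64_t v = 0;
    size_t i;

    if (len == 0 || (base != 10 && base != 16))
        return -1;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        unsigned d;

        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (base == 16 && c >= 'a' && c <= 'f')
            d = c - 'a' + 10;
        else if (base == 16 && c >= 'A' && c <= 'F')
            d = c - 'A' + 10;
        else
            return -1;              /* sign, space, garbage: reject */

        if (v > (UINT64_MAX - d) / base)
            return -1;              /* would not fit in 64 bits */
        v = v * base + d;
    }
    *out = v;
    return 0;
}

/* Assumed policy for repeated Content-Length values: accept them only if
 * every one is valid and they all carry the same number. */
static int content_length_from(const char *const *vals, const size_t *lens,
                               size_t n, uint64_t *out)
{
    uint64_t first, cur;
    size_t i;

    if (n == 0 || parse_quantity(vals[0], lens[0], 10, &first) < 0)
        return -1;

    for (i = 1; i < n; i++) {
        if (parse_quantity(vals[i], lens[i], 10, &cur) < 0 || cur != first)
            return -1;
    }
    *out = first;
    return 0;
}
```

A chunk size would go through parse_quantity() with base 16, a
content-length through content_length_from(). The overflow test is done
before the multiplication, so the accumulator never wraps.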

> Note that such an optimization might not make sense for superscalar
> CPUs, since they could validate the numeric representation (i.e. check
> if the characters are digits) at the same time as they convert them to
> an integral type. But still, it could be a good optimization for
> embedded devices with less-complicated CPUs that we are trying to take
> care of in this thread.

It's nice to consider this but it's also important to consider that most
implementers use high-level languages and consider such stuff useless.
How many times have I heard that I was playing with micro-optimizations
while in fact I was trying to protect against trivial DoS situations!

Thus I'd suggest designing with ease of optimization in mind but with
ease of *safe* implementation first.
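
As a rough illustration of how the two goals can coexist, here is a small
C sketch of the deferred conversion discussed above. Everything in it is
an assumption made for the example: the struct lazy_num type, the function
names, the use of uint64_t, and the 20-digit cap (the length of the
largest 64-bit unsigned value, which also means long zero-padded values
get rejected). The field is checked cheaply when received, kept as a
pointer into the header, and only converted, with an overflow check, when
someone asks for the value.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cap: UINT64_MAX has 20 decimal digits, so any longer token can
 * be rejected without doing arithmetic at all. */
#define LAZY_NUM_MAX_DIGITS 20

/* Hypothetical type: a number kept in its received ASCII form. */
struct lazy_num {
    const char *ptr;    /* points into the received header, not copied */
    size_t len;
};

/* Cheap check at reception time: digits only, bounded length.
 * Returns 0 if the token is acceptable, -1 otherwise. */
static int lazy_num_init(struct lazy_num *n, const char *s, size_t len)
{
    size_t i;

    if (len == 0 || len > LAZY_NUM_MAX_DIGITS)
        return -1;
    for (i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
    }
    n->ptr = s;
    n->len = len;
    return 0;
}

/* Conversion deferred until the value is really needed. It still checks
 * for overflow, since 20 digits can exceed 64 bits. Returns 0 or -1. */
static int lazy_num_get(const struct lazy_num *n, uint64_t *out)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < n->len; i++) {
        unsigned d = (unsigned)(n->ptr[i] - '0');

        if (v > (UINT64_MAX - d) / 10)
            return -1;
        v = v * 10 + d;
    }
    *out = v;
    return 0;
}
```

An intermediary that only forwards the field would stop after
lazy_num_init(); only a consumer that needs the quantity pays for
lazy_num_get().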

Willy

Received on Thursday, 2 November 2017 06:06:31 UTC