Re: HTTP/2 Header Encoding Status Update from Nicolas Mailhot on 2013-03-02 (ietf-http-wg@w3.org from January to March 2013)

From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
Date: Sat, 2 Mar 2013 10:34:21 +0100
To: "James M Snell" <jasnell@gmail.com>
Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, ietf-http-wg@w3.org
Message-ID: <61cfe44be424129036f6bf68354e9c89.squirrel@arekh.dyndns.org>

On Fri, 1 March 2013 16:37, James M Snell wrote:
> On Mar 1, 2013 2:44 AM, "Nicolas Mailhot" <nicolas.mailhot@laposte.net>
> wrote:
>>
>> Mark Nottingham <mnot@...> writes:
>>
>> >
>> > One other thing -
>> >
>> > On 28/02/2013, at 8:16 AM, James M Snell <jasnell@...> wrote:
>> >
>> > > Date values:
>> > >
>> > >  1. Dates are encoded as the number of seconds since a new epoch
>> > > (Midnight GMT, Jan 1 1990)
>> >
>> > So, how many bytes does changing the epoch save us?
>> >
>> > I just get concerned about putting little landmines like this in...
>>
>> I'd really *love* to see the whole epoch concept nuked in HTTP/2 and force
>> everyone to use an ISO 8601 profile like html instead
>> http://www.w3.org/TR/NOTE-datetime
>> (requiring UTC-only if necessary)
>>
>> We've already been bitten by an http product that seemed to use the same epochs
>> as everyone else (even calling them "unix epochs") but actually counted epochs
>> in local dos/windows time. The problem was un-obvious till we needed to perform
>> some cross-system analysis and discovered they disagreed on time even though
>> they all used the "same" epoch format. Getting time right in an i18n context is
>> hard enough without obfuscating the format.
>>
>> Do the benefits of an epoch really outweigh the benefits of avoiding time
>> mistakes in an http user?
>>
>
> Iso8601 date times require a minimum of 20 ascii bytes at the second
> precision level.  The uvarint epoch encoding requires a maximum of ten
> bytes but will usually be around 5 or 6 for the reasonably foreseeable
> future.  I have no problem with 8601 for general use but in encoding for
> transmission, the epoch encoding is much more efficient.

Sure, I never claimed ISO 8601 was more efficient. I don't care how header
names and other purely technical parts are encoded: go wild, be as
efficient as you want. However, IMHO this group should think twice before
choosing anything non-obvious to encode human-related info. So +10 for
killing weird text encodings and putting everything in UTF-8, -10 for using
epochs for time. Time is based on human conventions that we often do not
get right, so adding an obfuscation layer does not help. Rockets have been
lost in the past because humans could not agree on units, and there is the
ongoing trainwreck of KB/KiB and friends; anything unit-related must be
treated with extra care.

While a bit inefficient in space, ISO 8601 should not be too inefficient
at processing time. Do the space savings gained by using a full epoch
outweigh the problems caused by wrong datetimes that human operators do
not notice because they are hidden in an epoch that looks like any other
epoch?
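
To put rough numbers on the space question, here is a minimal Python
sketch. It assumes the "uvarint" in the proposal is a 7-bits-per-byte
variable-length integer with the least significant group first; the exact
bit layout is my assumption, but the byte count does not depend on the
group order. The timestamp is simply the Date of this message in UTC.

```python
from datetime import datetime, timezone

# Epoch named in the quoted proposal: midnight GMT, 1 January 1990.
EPOCH_1990 = datetime(1990, 1, 1, tzinfo=timezone.utc)


def uvarint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low group first (assumed layout)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


when = datetime(2013, 3, 2, 9, 34, 21, tzinfo=timezone.utc)

iso_text = when.strftime("%Y-%m-%dT%H:%M:%SZ")       # UTC-only ISO 8601 profile
seconds = int((when - EPOCH_1990).total_seconds())   # seconds since the 1990 epoch
epoch_bytes = uvarint(seconds)

print(len(iso_text), "bytes as ISO 8601 text:", iso_text)
print(len(epoch_bytes), "bytes as uvarint epoch:", epoch_bytes.hex())
```

The saving is real, but it is on the order of 15 bytes per date header.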

One of the main properties the chosen time format should have is that it
fails hard and fast when any network node gets confused about DST or
country, because such mistakes are very common and the cleanup when they
go unnoticed is expensive.
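
A small Python sketch of what "fail hard and fast" could look like. The
node at UTC+1 and the strict `parse_utc_only` helper are hypothetical,
introduced only for illustration; they model one failure mode (a node
working in local time), not every possible one.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=1))  # hypothetical node running at UTC+1


def parse_utc_only(text: str) -> datetime:
    """Accept only 'YYYY-MM-DDThh:mm:ssZ'; anything else raises ValueError."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_accepted(text: str) -> bool:
    try:
        parse_utc_only(text)
        return True
    except ValueError:
        return False


true_instant = datetime(2013, 3, 2, 9, 34, 21, tzinfo=timezone.utc)

# Confused node: counts epoch seconds from its local wall clock.
wall_clock = true_instant.astimezone(LOCAL).replace(tzinfo=None)
buggy_seconds = int((wall_clock - datetime(1970, 1, 1)).total_seconds())
good_seconds = int(true_instant.timestamp())

# Both values are plain integers; a receiver has nothing to reject.
print(buggy_seconds - good_seconds)  # the error, in seconds

# The same node emitting ISO 8601 text exposes its local offset,
# and a UTC-only parser refuses it immediately.
local_text = true_instant.astimezone(LOCAL).isoformat()
print(local_text, "accepted:", is_accepted(local_text))
```

The wrong epoch value travels through every hop unnoticed, while the text
form is either rejected by a strict parser or at least readable by the
operator looking at a trace.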

ISO 8601 is the best we have right now: many smart humans spent a lot of
time defining a representation that is not error-prone yet reasonably easy
to process.

This proposal is very bad from this point of view: not only does it use an
epoch format which has already been misused in the wild, but it also
chooses a reference start different from past formats. I find it very
dangerous; it will induce lots of mistakes.
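
To illustrate the reference-start problem, a minimal Python sketch: the
same integer is decoded once against the proposed 1990 epoch and once
against the usual Unix epoch (1970). The variable names are mine; the
scenario assumes a receiver that takes the value for a Unix timestamp.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DRAFT_EPOCH = datetime(1990, 1, 1, tzinfo=timezone.utc)  # epoch from the proposal

sent = datetime(2013, 3, 2, 9, 34, 21, tzinfo=timezone.utc)
wire_value = int((sent - DRAFT_EPOCH).total_seconds())

# Correct decoding, relative to the 1990 epoch.
as_draft = DRAFT_EPOCH + timedelta(seconds=wire_value)

# Same integer read by code that assumes the Unix epoch.
as_unix = UNIX_EPOCH + timedelta(seconds=wire_value)

print("wire value:    ", wire_value)
print("1990-relative: ", as_draft.isoformat())
print("1970-relative: ", as_unix.isoformat())
print("difference:    ", (as_draft - as_unix).days, "days")
```

Both results are valid, plausible dates, so nothing in the value itself
tells the receiver which epoch the sender meant.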

Regards,

-- 
Nicolas Mailhot

Received on Saturday, 2 March 2013 09:34:53 UTC