Re: HTTP/2 Header Encoding Status Update

On Mar 2, 2013, at 11:34 AM, Nicolas Mailhot <nicolas.mailhot@laposte.net> wrote:

> 
> Le Ven 1 mars 2013 16:37, James M Snell a écrit :
>> On Mar 1, 2013 2:44 AM, "Nicolas Mailhot" <nicolas.mailhot@laposte.net>
>> wrote:
>>> 
>>> Mark Nottingham <mnot@...> writes:
>>> 
>>>> 
>>>> One other thing -
>>>> 
>>>> On 28/02/2013, at 8:16 AM, James M Snell <jasnell@...> wrote:
>>>> 
>>>>> Date values:
>>>>> 
>>>>> 1. Dates are encoded as the number of seconds since a new epoch
>>>>> (Midnight GMT, Jan 1 1990)
>>>> 
>>>> So, how many bytes does changing the epoch save us?
>>>> 
>>>> I just get concerned about putting little landmines like this in...
>>> 
>>> I'd really *love* to see the whole epoch concept nuked in HTTP/2 and force
>>> everyone to use an ISO 8601 profile like HTML does instead:
>>> http://www.w3.org/TR/NOTE-datetime
>>> (requiring UTC-only if necessary)
>>> 
>>> We've already been bitten by an http product that seemed to use the same
>>> epochs as everyone else (even calling them "unix epochs") but actually
>>> counted epochs in local dos/windows time. The problem was un-obvious till
>>> we needed to perform some cross-system analysis and discovered they
>>> disagreed on time even though they all used the "same" epoch format.
>>> Getting time right in an i18n context is hard enough without obfuscating
>>> the format.
>>> 
>>> Do the benefits of an epoch really outweigh the benefits of avoiding time
>>> mistakes in an http user?
>>> 
>> 
>> ISO 8601 date-times require a minimum of 20 ASCII bytes at the second
>> precision level. The uvarint epoch encoding requires a maximum of ten
>> bytes but will usually be around 5 or 6 for the reasonably foreseeable
>> future. I have no problem with 8601 for general use, but in encoding for
>> transmission, the epoch encoding is much more efficient.
> 
> Sure, I never claimed it was more efficient. I don't care how header names
> and other technical-only parts are encoded: go wild, be as efficient as
> you want. However, IMHO this group should think twice before choosing
> anything un-obvious to encode human-related info. So +10 for killing weird
> text encodings and putting everything in UTF-8, -10 for using epochs for
> time. Time is based on human conventions that we often do not get right,
> so adding an obfuscation layer does not help (rockets have been lost in
> the past because humans could not agree on units; there is the ongoing
> trainwreck of KB/KiB and friends; anything unit-related must be treated
> with extra care).

Machines store date & time information in internal formats. Whatever bits-on-the-wire format we use, machines will convert their internal format to the protocol format when sending, and back to their internal format when receiving. I see no reason to use a human-readable format on the wire when the communication is between machines. The ISO 8601 format takes more cycles to encode and decode, and makes calculating time differences much harder. The exact length of any particular February has been a source of many errors in the past.
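
For what it's worth, here is a rough Python sketch of the comparison. It is purely illustrative: the LEB128-style uvarint and the helper names are my reading of the proposal, not anything in a draft.

    import datetime

    EPOCH_1990 = datetime.datetime(1990, 1, 1, tzinfo=datetime.timezone.utc)

    def encode_uvarint(n):
        # LEB128-style unsigned varint: 7 payload bits per byte,
        # continuation bit set on all but the last byte.
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    now = datetime.datetime(2013, 3, 2, 11, 34, tzinfo=datetime.timezone.utc)
    seconds = int((now - EPOCH_1990).total_seconds())

    print(len(encode_uvarint(seconds)))              # 5 bytes today
    print(len(now.strftime("%Y-%m-%dT%H:%M:%SZ")))   # 20 bytes as ISO 8601

    # With the integer form, "how old is this value?" is one subtraction;
    # no calendar arithmetic, no leap-year or month-length logic.
    age_in_seconds = seconds - (seconds - 3600)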

> While a bit inefficient (in space), ISO 8601 should not be too inefficient
> at processing time. Do the space savings gained by using a full epoch
> outweigh the problems caused by wrong datetimes that human operators do not
> notice because they're hidden in an epoch that looks like any other epoch?

Operators need never look at the bits on the wire. Since we've already agreed to go to some kind of binary format, the days of eyeballing packet captures and using telnet to test web servers are gone anyway. You'll need a packet analyzer such as Wireshark, which will show you the date in a human-readable format, just as it shows an interpretation of other fields.

> One of the main properties of the time format chosen is that it should
> fail hard and fast when any network node gets confused about DST or
> country, because the cleanup when such mistakes go unnoticed is expensive,
> and they are very common.

So you'd like endpoints to break a connection and log (or pop up) an error if the other side shows a time more than X minutes in the future? That's possible with any format.
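
A minimal sketch of such a check, in Python; the 15-minute threshold and the function name are arbitrary choices of mine, and the received value is assumed to have already been converted to the receiver's own clock:

    import time

    MAX_SKEW_SECONDS = 15 * 60  # arbitrary tolerance for clock skew

    def date_looks_sane(received_seconds, now=None):
        # Reject a Date value that claims to be well in the future,
        # whatever wire format it arrived in.
        now = time.time() if now is None else now
        return received_seconds <= now + MAX_SKEW_SECONDS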

> ISO 8601 is the best we have right now; many smart humans spent a lot of
> time defining a representation that is not error-prone yet reasonably easy
> to process.

That representation was created for the benefit of humans. It would be fine, if somewhat geeky, to publish the upcoming meeting date as 2013-03-15T13:00Z, but that is because the announcement is meant for humans. It also makes sense for protocols and formats that are intentionally human-readable, such as HTTP/1 and XML. It's just a burden for machines.

> This proposal is very bad from this point of view: not only does it use an
> epoch format which has already been misused in the wild, but it also
> chooses a starting point different from past formats. I find it very
> dangerous; it will induce lots of mistakes.
> 
> Regards,
> 
> -- 
> Nicolas Mailhot
> 

Received on Saturday, 2 March 2013 10:54:45 UTC