IETF LC comments on draft-ietf-httpbis-header-compression-10 from Julian Reschke on 2015-01-10 (ietf-http-wg@w3.org from January to March 2015)

From: Julian Reschke <julian.reschke@greenbytes.de>
Date: Sat, 10 Jan 2015 17:58:46 +0100
To: ietf@ietf.org
CC: ietf-http-wg@w3.org
Message-ID: <54B15A46.8040604@greenbytes.de>
Hi there,

these comments are mostly editorial; the main exception being the 
contents of the predefined static table.

Best regards, Julian


1.  Introduction

    In HTTP/1.1 (see [RFC7230]), header fields are not compressed.  As
    Web pages have grown to include dozens to hundreds of requests, the

s/include/cause/ (the page doesn't really include requests)

1.1.  Overview

    The format defined in this specification treats a list of header
    fields as an ordered collection of name-value pairs that can include
    duplicates.  Names and values are considered to be opaque sequences
    of octets, and the order of header fields is preserved after being
    compressed and decompressed.

When we say duplicate here, it's about duplicate *names*, right?


2.1.  Header List Ordering

    HPACK preserves the ordering of header fields inside the header list.
    An encoder MUST order header field representations in the header
    block according to their ordering in the original header list.  A
    decoder MUST order header fields in the decoded header list according
    to their ordering in the header block.

Just before we said that we do not define the encoder. Maybe reprase the 
first MUST as a statement of fact.

2.3.2.  Dynamic Table

    The dynamic table can contain duplicate entries.  Therefore,
    duplicate entries MUST NOT be treated as an error by a decoder.

What's the definition of "duplicate" here? Name and value match?

3.2.  Header Field Representation Processing

    The processing of a header block to obtain a header list is defined
    in this section.  To ensure that the decoding will successfully
    produce a header list, a decoder MUST obey the following rules.

"This section defines..."

4.1.  Calculating Table Size

    The size of the dynamic table is the sum of the size of its entries.

    The size of an entry is the sum of its name's length in octets (as
    defined in Section 5.2), its value's length in octets (see
    Section 5.2), plus 32.

Remove one of the references to Section 5.2.

    The size of an entry is calculated using the length of the name and
    value without any Huffman encoding applied.

This is the first time that Huffman is mentioned; it would probably be 
good to mention it earlier on.

    NOTE: The additional 32 octets account for the overhead associated
    with an entry.  For example, an entry structure using two 64-bit
    pointers to reference the name and the value of the entry, and two
    64-bit integers for counting the number of references to the name and
    value would have 32 octets of overhead.

maybe say "estimated overhead"?

5.1.  Integer Representation

    Integers are used to represent name indexes, pair indexes or string
    lengths.  To allow for optimized processing, an integer
    representation always finishes at the end of an octet.

First use of "pair index"; at this point it's not clear what that might 
be.

    Pseudo-code to represent an integer I is as follows:

    if I < 2^N - 1, encode I on N bits
    else
        encode (2^N - 1) on N bits
        I = I - (2^N - 1)
        while I >= 128
             encode (I % 128 + 128) on 8 bits
             I = I / 128
        encode I on 8 bits

    Pseudo-code to decode an integer I is as follows:

    decode I from the next N bits
    if I < 2^N - 1, return I
    else
        M = 0
        repeat
            B = next octet
            I = I + (B & 127) * 2^M
            M = M + 7
        while B & 128 == 128
        return I

We have bad experience with making indentation in artwork significant; 
maybe introducing brackets would be a good idea.

    This integer representation allows for values of indefinite size.  It
    is also possible for an encoder to send a large number of zero
    values, which can waste octets and could be used to overflow integer
    values.  Excessively large integer encodings - in value or octet
    length - MUST be treated as a decoding error.  Different limits can
    be set for each of the different uses of integers, based on
    implementation constraints.

Having a MUST here when we don't say what "excessive" is seems like a 
bad idea.


5.2.  String Literal Representation

    Header field names and header field values can be represented as
    literal string.  A literal string is encoded as a sequence of octets,
    either by directly encoding the literal string's octets, or by using
    a Huffman code (see [HUFFMAN]).

      0   1   2   3   4   5   6   7
    +---+---+---+---+---+---+---+---+
    | H |    String Length (7+)     |
    +---+---------------------------+
    |  String Data (Length octets)  |
    +-------------------------------+

                   Figure 4: String Literal Representation

    A literal string representation contains the following fields:

    H: A one bit flag, H, indicating whether or not the octets of the
       string are Huffman encoded.

    String Length:  The number of octets used to encode the string
       literal, encoded as an integer with 7-bit prefix (see
       Section 5.1).

    String Data:  The encoded data of the string literal.  If H is '0',
       then the encoded data is the raw octets of the string literal.  If
       H is '1', then the encoded data is the Huffman encoding of the
       string literal.

The figure could be interpreted as if the string length is always 
expressed in 7 bits, which I believe is not true.


6.2.1.  Literal Header Field with Incremental Indexing

    A literal header field with incremental indexing representation
    results in appending a header field to the decoded header list and
    inserting it as a new entry into the dynamic table.

      0   1   2   3   4   5   6   7
    +---+---+---+---+---+---+---+---+
    | 0 | 1 |      Index (6+)       |
    +---+---+-----------------------+
    | H |     Value Length (7+)     |
    +---+---------------------------+
    | Value String (Length octets)  |
    +-------------------------------+

     Figure 6: Literal Header Field with Incremental Indexing - Indexed
                                    Name

See above. One can read this as "H" being bit 0 in the second octet; but 
that's not always the case. I think it would be good to revise the 
figure format.


6.3.  Dynamic Table Size Update

    The new maximum size MUST be lower than or equal to the last value of
    the SETTINGS_HEADER_TABLE_SIZE parameter (see Section 6.5.2 of
    [HTTP2]) received from the decoder and acknowledged by the encoder
    (see Section 6.5.3 of [HTTP2]).

Some parts of the spec read as if HPACK doesn't have direct dependency 
on HTTP/2 or HTTP in general. However, here a normative dependency is 
introduced.

7.1.  Probing Dynamic Table State

    This is possible even over TLS, because while TLS provides
    confidentiality protection for content, it only provides a limited
    amount of protection for the length of that content.

Expand "TLS" on first use, and add a reference.

9.  References

9.1.  Normative References

    [HTTP2]      Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
                 Transfer Protocol version 2",
                 draft-ietf-httpbis-http2-16 (work in progress),
                 October 2014.

November (already fixed in Git).

9.2.  Informative References

    [CRIME]      Rizzo, J. and T. Duong, "The CRIME Attack",
                 September 2012, <https://docs.google.com/a/twist.com/
                 presentation/d/
                 11eBmGiHbYcHR9gL5nDyZChu_-lCa2GizeuOfaLU2HOU/
                 edit#slide=id.g1eb6c1b5_3_6>.

The fragment identifier on the URI appears to take as to the last slide; 
maybe drop it?

    [HUFFMAN]    Huffman, D., "A Method for the Construction of Minimum
                 Redundancy Codes", Proceedings of the Institute of Radio
                 Engineers Volume 40, Number 9, pp. 1098-1101,
                 September 1952, <https://ieeexplore.ieee.org/xpl/
                 articleDetails.jsp?arnumber=4051119>.

That URI leads me to a broken page due to mixed content problems. The 
HTTP version works fine.

    [SPDY]       Belshe, M. and R. Peon, "SPDY Protocol",
                 draft-mbelshe-httpbis-spdy-00 (work in progress),
                 February 2012.

It's not work in progress, and we should have a URI for it. Yes, that's 
something the RFC Editor will need to figure out :-).


Appendix A.  Static Table Definition

    The static table (see Section 2.3.1) consists of a predefined and
    unchangeable list of header fields.

    The static table was created by listing the most common header fields
    that are valid for messages exchanged inside a HTTP/2 connection.

Is that really true? How come we have Proxy-Auth* in it then, and also 
an unregistered header field ("refresh")? I believe we need to document 
how we came up with this list.

           | 16    | accept-encoding             | gzip, deflate |

Appendix B.  Huffman Code


    As an example, the code for the symbol 47 (corresponding to the ASCII
    character "/") consists in the 6 bits "0", "1", "1", "0", "0", "0".

"consists of"?

    This corresponds to the value 0x18 (in hexadecimal) encoded on 6
    bits.

"encoded in"?

C.3.

It would be cool to have an example with repeating header fields (maybe 
two instances of Cache-Control)?
Received on Saturday, 10 January 2015 16:59:19 UTC