Re: IETF LC comments on draft-ietf-httpbis-header-compression-10 from Hervé Ruellan on 2015-01-23 (ietf-http-wg@w3.org from January to March 2015)

From: Hervé Ruellan <herve.ruellan@crf.canon.fr>
Date: Fri, 23 Jan 2015 17:14:13 +0100
To: Julian Reschke <julian.reschke@greenbytes.de>, <ietf@ietf.org>
CC: <ietf-http-wg@w3.org>
Message-ID: <54C27355.5090203@crf.canon.fr>
Hi Julian,

See my answers below.

I generated a pull request for resolving your comments at: 
https://github.com/http2/http2-spec/pull/705.

On 01/10/2015 05:58 PM, Julian Reschke wrote:
> Hi there,
>
> these comments are mostly editorial; the main exception being the
> contents of the predefined static table.
>
> Best regards, Julian
>
>
> 1.  Introduction
>
>     In HTTP/1.1 (see [RFC7230]), header fields are not compressed.  As
>     Web pages have grown to include dozens to hundreds of requests, the
>
> s/include/cause/ (the page doesn't really include requests)
>
> 1.1.  Overview
>
>     The format defined in this specification treats a list of header
>     fields as an ordered collection of name-value pairs that can include
>     duplicates.  Names and values are considered to be opaque sequences
>     of octets, and the order of header fields is preserved after being
>     compressed and decompressed.
>
> When we say duplicate here, it's about duplicate *names*, right?

It's duplicate pairs.

> 2.1.  Header List Ordering
>
>     HPACK preserves the ordering of header fields inside the header list.
>     An encoder MUST order header field representations in the header
>     block according to their ordering in the original header list.  A
>     decoder MUST order header fields in the decoded header list according
>     to their ordering in the header block.
>
> Just before we said that we do not define the encoder. Maybe reprase the
> first MUST as a statement of fact.

I think that's one of the few requirements we have on the encoder, and 
even if we say we don't define the encoder, I prefer to keep the MUST to 
have it visible.

> 2.3.2.  Dynamic Table
>
>     The dynamic table can contain duplicate entries.  Therefore,
>     duplicate entries MUST NOT be treated as an error by a decoder.
>
> What's the definition of "duplicate" here? Name and value match?

Yes.

> 3.2.  Header Field Representation Processing
>
>     The processing of a header block to obtain a header list is defined
>     in this section.  To ensure that the decoding will successfully
>     produce a header list, a decoder MUST obey the following rules.
>
> "This section defines..."
>
> 4.1.  Calculating Table Size
>
>     The size of the dynamic table is the sum of the size of its entries.
>
>     The size of an entry is the sum of its name's length in octets (as
>     defined in Section 5.2), its value's length in octets (see
>     Section 5.2), plus 32.
>
> Remove one of the references to Section 5.2.
>
>     The size of an entry is calculated using the length of the name and
>     value without any Huffman encoding applied.
>
> This is the first time that Huffman is mentioned; it would probably be
> good to mention it earlier on.

I added it in the overview, and in 2.4 Header Field Representation.

>
>     NOTE: The additional 32 octets account for the overhead associated
>     with an entry.  For example, an entry structure using two 64-bit
>     pointers to reference the name and the value of the entry, and two
>     64-bit integers for counting the number of references to the name and
>     value would have 32 octets of overhead.
>
> maybe say "estimated overhead"?
>
> 5.1.  Integer Representation
>
>     Integers are used to represent name indexes, pair indexes or string
>     lengths.  To allow for optimized processing, an integer
>     representation always finishes at the end of an octet.
>
> First use of "pair index"; at this point it's not clear what that might be.

Changed it to header field index.

>
>     Pseudo-code to represent an integer I is as follows:
>
>     if I < 2^N - 1, encode I on N bits
>     else
>         encode (2^N - 1) on N bits
>         I = I - (2^N - 1)
>         while I >= 128
>              encode (I % 128 + 128) on 8 bits
>              I = I / 128
>         encode I on 8 bits
>
>     Pseudo-code to decode an integer I is as follows:
>
>     decode I from the next N bits
>     if I < 2^N - 1, return I
>     else
>         M = 0
>         repeat
>             B = next octet
>             I = I + (B & 127) * 2^M
>             M = M + 7
>         while B & 128 == 128
>         return I
>
> We have bad experience with making indentation in artwork significant;
> maybe introducing brackets would be a good idea.

I love python ;-)

>
>     This integer representation allows for values of indefinite size.  It
>     is also possible for an encoder to send a large number of zero
>     values, which can waste octets and could be used to overflow integer
>     values.  Excessively large integer encodings - in value or octet
>     length - MUST be treated as a decoding error.  Different limits can
>     be set for each of the different uses of integers, based on
>     implementation constraints.
>
> Having a MUST here when we don't say what "excessive" is seems like a
> bad idea.

Replaced by "Integer encodings that exceed an implementation limits"
>
> 5.2.  String Literal Representation
>
>     Header field names and header field values can be represented as
>     literal string.  A literal string is encoded as a sequence of octets,
>     either by directly encoding the literal string's octets, or by using
>     a Huffman code (see [HUFFMAN]).
>
>       0   1   2   3   4   5   6   7
>     +---+---+---+---+---+---+---+---+
>     | H |    String Length (7+)     |
>     +---+---------------------------+
>     |  String Data (Length octets)  |
>     +-------------------------------+
>
>                    Figure 4: String Literal Representation
>
>     A literal string representation contains the following fields:
>
>     H: A one bit flag, H, indicating whether or not the octets of the
>        string are Huffman encoded.
>
>     String Length:  The number of octets used to encode the string
>        literal, encoded as an integer with 7-bit prefix (see
>        Section 5.1).
>
>     String Data:  The encoded data of the string literal.  If H is '0',
>        then the encoded data is the raw octets of the string literal.  If
>        H is '1', then the encoded data is the Huffman encoding of the
>        string literal.
>
> The figure could be interpreted as if the string length is always
> expressed in 7 bits, which I believe is not true.
>
>
> 6.2.1.  Literal Header Field with Incremental Indexing
>
>     A literal header field with incremental indexing representation
>     results in appending a header field to the decoded header list and
>     inserting it as a new entry into the dynamic table.
>
>       0   1   2   3   4   5   6   7
>     +---+---+---+---+---+---+---+---+
>     | 0 | 1 |      Index (6+)       |
>     +---+---+-----------------------+
>     | H |     Value Length (7+)     |
>     +---+---------------------------+
>     | Value String (Length octets)  |
>     +-------------------------------+
>
>      Figure 6: Literal Header Field with Incremental Indexing - Indexed
>                                     Name
>
> See above. One can read this as "H" being bit 0 in the second octet; but
> that's not always the case. I think it would be good to revise the
> figure format.

I'll try to find a better format.

> 6.3.  Dynamic Table Size Update
>
>     The new maximum size MUST be lower than or equal to the last value of
>     the SETTINGS_HEADER_TABLE_SIZE parameter (see Section 6.5.2 of
>     [HTTP2]) received from the decoder and acknowledged by the encoder
>     (see Section 6.5.3 of [HTTP2]).
>
> Some parts of the spec read as if HPACK doesn't have direct dependency
> on HTTP/2 or HTTP in general. However, here a normative dependency is
> introduced.

I removed this direct dependency.

> 7.1.  Probing Dynamic Table State
>
>     This is possible even over TLS, because while TLS provides
>     confidentiality protection for content, it only provides a limited
>     amount of protection for the length of that content.
>
> Expand "TLS" on first use, and add a reference.
>
> 9.  References
>
> 9.1.  Normative References
>
>     [HTTP2]      Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
>                  Transfer Protocol version 2",
>                  draft-ietf-httpbis-http2-16 (work in progress),
>                  October 2014.
>
> November (already fixed in Git).
>
> 9.2.  Informative References
>
>     [CRIME]      Rizzo, J. and T. Duong, "The CRIME Attack",
>                  September 2012, <https://docs.google.com/a/twist.com/
>                  presentation/d/
>                  11eBmGiHbYcHR9gL5nDyZChu_-lCa2GizeuOfaLU2HOU/
>                  edit#slide=id.g1eb6c1b5_3_6>.
>
> The fragment identifier on the URI appears to take as to the last slide;
> maybe drop it?
>
>     [HUFFMAN]    Huffman, D., "A Method for the Construction of Minimum
>                  Redundancy Codes", Proceedings of the Institute of Radio
>                  Engineers Volume 40, Number 9, pp. 1098-1101,
>                  September 1952, <https://ieeexplore.ieee.org/xpl/
>                  articleDetails.jsp?arnumber=4051119>.
>
> That URI leads me to a broken page due to mixed content problems. The
> HTTP version works fine.
>
>     [SPDY]       Belshe, M. and R. Peon, "SPDY Protocol",
>                  draft-mbelshe-httpbis-spdy-00 (work in progress),
>                  February 2012.
>
> It's not work in progress, and we should have a URI for it. Yes, that's
> something the RFC Editor will need to figure out :-).
>
>
> Appendix A.  Static Table Definition
>
>     The static table (see Section 2.3.1) consists of a predefined and
>     unchangeable list of header fields.
>
>     The static table was created by listing the most common header fields
>     that are valid for messages exchanged inside a HTTP/2 connection.
>
> Is that really true? How come we have Proxy-Auth* in it then, and also
> an unregistered header field ("refresh")? I believe we need to document
> how we came up with this list.
>
>            | 16    | accept-encoding             | gzip, deflate |
>
> Appendix B.  Huffman Code
>
>
>     As an example, the code for the symbol 47 (corresponding to the ASCII
>     character "/") consists in the 6 bits "0", "1", "1", "0", "0", "0".
>
> "consists of"?
>
>     This corresponds to the value 0x18 (in hexadecimal) encoded on 6
>     bits.
>
> "encoded in"?
>
> C.3.
>
> It would be cool to have an example with repeating header fields (maybe
> two instances of Cache-Control)?
>
>
>
>
Received on Friday, 23 January 2015 16:14:48 UTC