Re: current HTTP/2 spec prevents gzip of response to "Range" request from Matthew Kerwin on 2014-03-28 (ietf-http-wg@w3.org from January to March 2014)

From: Matthew Kerwin <matthew@kerwin.net.au>
Date: Fri, 28 Mar 2014 22:36:18 +1000
To: K.Morgan@iaea.org
Cc: "Roy T. Fielding" <fielding@gbiv.com>, Bjoern Hoehrmann <derhoermi@gmx.net>, roland@zinks.de, C.Brunhuber@iaea.org, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CACweHNCvSz49Xs8mp4HxeqJNWOkQefwZ5je5pRzPQp57bt4m-g@mail.gmail.com>
On 28 March 2014 06:55, <K.Morgan@iaea.org> wrote:

> Hi Matthew-
>
> On 27 March 2014 05:45, Matthew Kerwin wrote:
> > ... It's entirely up to my server configuration to decide whether to
> > serve up the contents of index.html, or index.html_gz, or index.txt,
> > or any other random *representation* of that resource.  Importantly,
> > each of these representations is a distinct entity, with its own
> > entity-specific metadata (Content-Type, Content-Encoding,
> > Content-Length, Last-Modified, ETag, etc.)...
>
> What if a client requests directly /index.html_gz, do you still send that
> as "Content-Encoding: gzip\r\nContent-Type: text/html" or just
> "Content-Type: application/x-gzip"?
>

For starters we have to be clear that /foo/ and /foo/index.html_gz are
different resources, with different sets of available representations.
That said, in this case I'd respond with Content-Encoding: gzip. With a bit
more configuration I could set it up to do the opposite internal redirection,
or to send a 406, but I repeat that I've never revealed the URI for the
resource that only has a content-encoded representation. I would never link
directly to index.html_gz, nor would I include it in a Content-Location
header. If someone guesses the URL, and can't deal with the gzip encoding,
then it's kind of their own fault. Thus I violate a SHOULD-level
requirement, and don't feel bad at all.  ;)

> What about a resource e.g. /search.php that has dynamic results and the
> response body is gzip compressed to save bandwidth? Are you saying each
> unique result is a distinct entity and so that too should be
> "Content-Encoding: gzip\r\nContent-Type: text/html" or is this a case where
> it really should be called "Transport-Encoding: gzip\r\nContent-Type:
> text/html"?


Well, each response *is* a distinct entity. But more significantly, each
response could be one of two different entities -- the HTML or the gzip.
Call them H1 and Z1. On a subsequent request you choose one of two
*different* entities to send in response, H2 and Z2. The fact that *1 are
different from *2 has no bearing at all on whether you choose to send H* or
Z* -- that decision is based on what you prefer, and what the client will
accept.
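
A minimal Python sketch of that choice, for illustration only. The names
parse_accept_encoding and choose_representation are hypothetical, "H" and
"Z" stand for the HTML and gzip entities above, and the parsing is
deliberately simplified (it ignores identity;q=0, for instance):

```python
def parse_accept_encoding(header):
    """Return {coding: qvalue} parsed from an Accept-Encoding header value."""
    codings = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a coding listed without a q parameter defaults to q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable qvalue as "not acceptable"
        codings[name.strip().lower()] = q
    return codings


def choose_representation(accept_encoding, server_prefers_gzip=True):
    """Pick "Z" (gzip entity) or "H" (plain HTML entity) for one response.

    The decision depends only on what the server prefers and what the
    client accepts, not on which request (H1/Z1 vs H2/Z2) this is.
    """
    codings = parse_accept_encoding(accept_encoding or "")
    gzip_q = codings.get("gzip", codings.get("*", 0.0))
    if server_prefers_gzip and gzip_q > 0:
        return "Z"
    return "H"
```

A response that picks "Z" would also carry Content-Encoding: gzip and
Vary: Accept-Encoding, so caches keep the two entities apart.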



> > It also allows entity-specific operations like range requests.
>
> What if a user directly requests a range of /huge.resource (e.g. a huge
> non-compressed entity), but you want to save bandwidth on the transfer with
> compression? Based on the requirements of RFC2616, I assert you can't
> compress without using "Transport-Encoding: gzip".
>

That depends on how well we're dancing. If you requested my /foo/ with
Accept-Encoding and Range, I could choose to send part of my gzipped file
in a 206 with appropriate Content-Range, Content-Encoding, ETag, and Vary
headers.

That way you can subsequently send a request with either If-Range or
If-Match, passing back my etag, and appropriate Accept-Encoding etc.; and I
can give you another chunk (or the rest, or none) of my gzipped file.

I've had a quick read through -p5, and I'm pretty sure the general gist
still holds. I'd have to add some more headers in the first response.

That's not to say that this is a *better* solution than sending it
transfer-encoded, but AFAIK it's perfectly valid and legal. It is less
likely to be useful (how much can you do with part of a gzip file?), but
has better guarantees for end-to-end compression. The only thing to
remember is that, if you don't have the pre-compressed representation
handy, you'll have to be certain that each time you generate it you end up
with the exact same bytes and the same etag, and that you do the
compression *before* doing the range slicing.
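
A rough Python sketch of that ordering, under stated assumptions: the
helper names are hypothetical, the range is a single already-parsed
first/last byte pair, and the ETag is simply a hash of the compressed
bytes. Passing mtime=0 to gzip.compress (Python 3.8+) fixes the timestamp
in the gzip header, so the same input gives the same bytes on the same
Python/zlib build; a different zlib version could still produce different
output, which is why a stored pre-compressed file is the safer option.

```python
import gzip
import hashlib


def gzip_representation(body):
    """Compress the whole body with a fixed header timestamp."""
    return gzip.compress(body, compresslevel=6, mtime=0)


def strong_etag(data):
    """Derive an ETag from the exact bytes of the gzipped entity."""
    return '"' + hashlib.sha256(data).hexdigest()[:16] + '"'


def partial_response(body, first, last):
    """Build a 206 for bytes first..last of the *gzipped* entity."""
    z = gzip_representation(body)          # compress first ...
    if first >= len(z):
        return 416, {"Content-Range": f"bytes */{len(z)}"}, b""
    last = min(last, len(z) - 1)
    chunk = z[first:last + 1]              # ... then slice the result
    headers = {
        "Content-Encoding": "gzip",
        "Content-Range": f"bytes {first}-{last}/{len(z)}",
        "Content-Length": str(len(chunk)),
        "ETag": strong_etag(z),
        "Vary": "Accept-Encoding",
    }
    return 206, headers, chunk
```

Because the ETag is computed over the compressed bytes, two ranges fetched
in separate requests carry the same ETag and can be concatenated and
decompressed by the client.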

> > I propose not confusing hop-by-hop transport with end-to-end content.
> > We already have an end-to-end compression mechanism, which works for
> > the most part. I just don't like it because people have gotten
> > confused by it. That doesn't mean that it *can't* be used correctly,
> > or that it doesn't have any value.
>
> Can you give some examples of what you consider the confusion?
>

Well, Martin Thomson got confused not three days ago in this thread. And
he's an editor of the HTTP spec.

> > Even were gzip transport a MUST-level requirement for HTTP/2, there
> > would still be 2->1.1 gateways that are forced to strip the transport
> > compression because the machinery on the 1.1 side doesn't send TE
> > headers. Therefore I argue that we should definitely NOT get rid of
> > Content-Encoding: gzip. What I propose is that we recommend support
> > for TE:gzip, and hope that the 1.X parts of the web fade away enough
> > that nobody cares about them not having compressed data everywhere.
> > Hence "best practice guideline, not an interoperability requirement."
>
> In our original proposal we had guidelines for what intermediaries have to
> do for the 2->1.1 gateways if the 1.1 side doesn't send a "TE: gzip"
> header. I'm not sure why that wouldn't work with a MUST-level requirement
> for HTTP/2.
>

The guidelines are missing a bunch of conditions, but even then, you're
suggesting people do something in direct violation of the protocol:

   b) the intermediary SHOULD not decompress payloads that are gzip
   transfer encoded and have a :status header value not "206", and if the
   intermediary elects to keep the payload compressed, MUST remove the value
   "gzip" from the Transfer-Encoding header and insert the header
   "Content-Encoding: gzip" in order to maintain backwards compatibility with
   HTTP/1.1 clients,

That whole requirement is wrong. It should be: b) the intermediary MUST
decompress the payload. That's it. And that's true in a 1->2 gateway, or a
1->1 proxy, or a 2->2 proxy. That's what hop-by-hop transport is all about.
You can't just go inventing new content-encoded entities willy-nilly,
especially this way.
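
To illustrate the hop-by-hop behaviour, here is a small Python sketch of
what an intermediary would do when the next hop did not send TE: gzip.
The function name forward_to_next_hop is hypothetical, headers are assumed
to be a plain dict with canonical names, and the payload is assumed to have
had any chunked framing removed already:

```python
import gzip


def forward_to_next_hop(headers, payload):
    """Undo a gzip transfer coding before forwarding the message.

    Only hop-by-hop state changes; entity headers such as Content-Type,
    Content-Encoding and ETag are passed through untouched.
    """
    out = dict(headers)
    codings = [c.strip().lower()
               for c in out.pop("Transfer-Encoding", "").split(",")
               if c.strip()]
    # Transfer codings are listed in the order applied, so the last one
    # is the first to be removed.
    if codings and codings[-1] == "gzip":
        payload = gzip.decompress(payload)
        codings.pop()
    if codings:
        out["Transfer-Encoding"] = ", ".join(codings)
    return out, payload
```

Note that nothing here adds a Content-Encoding header: the entity that
leaves the intermediary is the same entity that arrived.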



> I'm also not sure exactly what part of our proposal you are referring to
> that should be a best practice guideline (also not exactly sure how you put
> a "best practice" guideline in an RFC - is there a guideline on how to
> write such a guideline).


Basically, all of it. Particularly this:

   Clients MUST support gzip compression for HTTP response bodies. Regardless
   of the value of the TE request header field, a server MAY send responses
   with gzip transfer encoding.

It's dangerously ambiguous, but I figure what you mean is that clients MUST
support responses that include Content-Encoding:gzip and/or
Transfer-Encoding:gzip. I say leave the Content-Encoding part out of it,
since that's covered elsewhere, and say something along the lines of "TE
headers exist, and some of them include 'gzip'." Then somewhere else, like
a completely different document, say "I strongly recommend that everyone
support TE:gzip, because it's awesome." Here's a list of Best Current
Practice docs: http://www.apps.ietf.org/rfc/bcplist.html



-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/
Received on Friday, 28 March 2014 12:36:47 UTC