Re: current HTTP/2 spec prevents gzip of response to "Range" request from Matthew Kerwin on 2014-03-27 (ietf-http-wg@w3.org from January to March 2014)

From: Matthew Kerwin <matthew@kerwin.net.au>
Date: Thu, 27 Mar 2014 14:45:18 +1000
To: K.Morgan@iaea.org
Cc: "Roy T. Fielding" <fielding@gbiv.com>, Bjoern Hoehrmann <derhoermi@gmx.net>, roland@zinks.de, C.Brunhuber@iaea.org, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CACweHNDL60Y_mjpcMhuujkP-gzVr3G+c3Ofj4pjjCt3Ui3ymuQ@mail.gmail.com>

K.Morgan@iaea.org wrote:
>
>> For the record: I routinely configure Apache to serve
>> offline-compressed versions of files [1], which I believe is the
>> Right Way(tm) to do CE
>
> Are you sure this is the right way? I guess it depends on what kind
> of offline-compressed files you are talking about. For sure yes for
> static .tar.gz archives, etc. But if you are talking about your
> example of index.html that automatically redirects to
> index.html_gz, I would argue that this is actually a transport
> encoding because the client is supposed to remove the compression
> before presenting it to the user (see Roy's distinction between
> TE/CE in his e-mail dated Monday, 24 March 2014 21:19) - you are
> just smartly saving server resources by not re-compressing the file
> every time a client requests it (IIS for example, also automatically
> does this for you).

In saying that, you're making a judgement call about my motivation
for compressing the file, or alternatively about my decision to keep
the uncompressed one around.

It's important to remember that, in this case especially, the
user might be making a request for the resource at
http://www.example.com/foo/
rather than for any particular file.

It's entirely up to my server configuration to decide whether to
serve up the contents of index.html, or index.html_gz, or index.txt,
or any other random *representation* of that resource.  Importantly,
each of these representations is a distinct entity, with its own
entity-specific metadata (Content-Type, Content-Encoding,
Content-Length, Last-Modified, ETag, etc.) which Apache, through the
glory of stat() and MIME libraries and other magic, already handles
for me. Treating each one as a distinct entity also allows
entity-specific operations like range requests.
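
To make the "distinct entity" point concrete, here is a rough sketch in
Python. It is not my Apache configuration and not how Apache does it
internally: pick_representation() is a made-up helper, the file names are
the ones from the example above, and the ETag scheme is an assumption
chosen only so that different bytes give different validators.

```python
import hashlib
import os


def pick_representation(docroot, accept_encoding):
    """Pick one stored representation of /foo/ and describe it.

    Hypothetical helper: the file names follow the example in this
    thread, but the selection logic is an illustration, not Apache's.
    """
    plain = os.path.join(docroot, "index.html")
    gz = os.path.join(docroot, "index.html_gz")

    # Simplification: q-values are ignored, so "gzip;q=0" still counts.
    codings = [c.split(";")[0].strip().lower()
               for c in accept_encoding.split(",")]
    use_gz = "gzip" in codings and os.path.exists(gz)
    path = gz if use_gz else plain

    with open(path, "rb") as f:
        data = f.read()

    # Every value below is derived from the file actually chosen, so
    # each representation carries its own length and validator.
    headers = {
        "Content-Type": "text/html",
        "Content-Length": str(len(data)),
        "ETag": '"%s"' % hashlib.sha1(data).hexdigest()[:16],
        "Vary": "Accept-Encoding",
    }
    if use_gz:
        headers["Content-Encoding"] = "gzip"
    return data, headers
```

The same URL yields two different entities depending on the request, and
nothing in the sketch requires index.html_gz to inflate to the current
index.html.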

I think that puts me bang in line with what Roy said:
"Transfer Encoding is something that can be added or removed by the
protocol.  Content Encoding is metadata about the representation."

.

>
>> Because of the MUST NOT, I now cannot take advantage of my
>> proactive compression in HTTP/2, and instead either 1) require
>> the server to compress the response on-the-fly each time, or
>> 2) lose the compression.
>
> Not sure which MUST NOT you're referring to? But the idea for this
> exact scenario was to send the header "Transfer-Encoding: gzip" and
> then put the contents of index.html_gz directly in the response
> body.

This one: "Servers MUST NOT include the values "gzip" or "deflate"
in a Content-Encoding header ..."

If I configured Apache to serve the contents of index.html_gz
alongside a transfer-encoding header, I'd have to be very careful
to ensure that index.html and index.html_gz do contain exactly the
same contents (after inflation), whereas with CE there's no such
restriction -- it would be mean of me to send an older, or even
completely different, gzip'd version, but not illegal.

I'd also have a hell of a time convincing Apache which headers come
from which file (ETag? Content-Length?), let alone how to resolve
range requests. Actually, to align with HTTPbis I'd have to remove
the Content-Length, which brings its own unique pains.
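
A small sketch of why range requests get awkward, again in Python and
again with made-up function names. It assumes the usual reading that a
byte range addresses the selected representation (so, with CE, the
compressed bytes), whereas a transfer coding is applied to whatever
payload goes out on this hop.

```python
import gzip


def range_over_content_encoding(stored_gz, first, last):
    # Content-Encoding: the gzip'd file *is* the representation, so the
    # byte range selects compressed bytes straight from the stored file.
    return stored_gz[first:last + 1]


def range_over_transfer_encoding(identity, first, last):
    # Transfer-Encoding: the range selects bytes of the uncompressed
    # representation; gzip is then applied to that slice for this hop,
    # so a pre-compressed copy of the whole file is of no use here.
    return gzip.compress(identity[first:last + 1])


def reusable_as_transfer_coding(stored_gz, identity):
    # A pre-compressed file may only stand in for on-the-fly transfer
    # coding if it inflates to exactly the current uncompressed file.
    return gzip.decompress(stored_gz) == identity
```

Asking for bytes 0-9 in the CE case returns ten bytes of the gzip
stream; in the TE case it returns a complete gzip stream that inflates
to the first ten bytes of the HTML.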

.

>
>> Do you mean HTTP-p1, Section 4.3 ?
>
> Section 14.39 specifically talks about the TE header and specifies
> that chunked is always acceptable. We could reference both sections
> I guess.

-p2 <http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26>
doesn't even have a section 14.  You seem to be referencing RFC 2616,
which AFAIK isn't split into parts.

.

>
>> Also, because TE is hop-by-hop I risk some intermediary terminating
>> the compression, possibly negatively impacting my site's
>> responsiveness in a way that is outside my control; but CE is
>> end-to-end, so intermediaries shouldn't be touching the compressed
>> data (I say "shouldn't" but I know some still will).
>
> This is a good point. I'm not sure what the right answer is.  What
> do you propose? (The ideal solution, I think, would be an end-to-end
> content transfer encoding.)

I propose not confusing hop-by-hop transport with end-to-end content.
We already have an end-to-end compression mechanism, which works for
the most part. I just don't like it because people have gotten
confused by it. That doesn't mean that it *can't* be used correctly,
or that it doesn't have any value.

Even if gzip transport were a MUST-level requirement for HTTP/2, there
would still be 2->1.1 gateways forced to strip the transport
compression because the machinery on the 1.1 side doesn't send TE
headers. Therefore I argue that we should definitely NOT get rid of
Content-Encoding: gzip. What I propose is that we recommend support
for TE: gzip, and hope that the 1.x parts of the web fade away enough
that nobody cares about them lacking compression everywhere.

Hence "best practice guideline, not an interoperability requirement."
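
The gateway behaviour I have in mind can be modelled roughly like this.
It is a toy in Python, not any real gateway's code: forward_response()
and its arguments are hypothetical, and chunked framing is left out
entirely so that only the hop-by-hop versus end-to-end difference shows.

```python
import gzip


def forward_response(body, headers, downstream_te):
    """Relay one response across a hop (hypothetical gateway logic).

    downstream_te is the TE request header value received from the
    next hop towards the client, or "" if none was sent. Chunked
    framing is deliberately left out of this model.
    """
    accepted = [t.split(";")[0].strip().lower()
                for t in downstream_te.split(",")]
    out = dict(headers)
    if out.get("Transfer-Encoding") == "gzip" and "gzip" not in accepted:
        # Hop-by-hop: the gateway has to inflate before passing it on.
        body = gzip.decompress(body)
        del out["Transfer-Encoding"]
    # Content-Encoding is end-to-end metadata and is never touched here.
    return body, out
```

A response carrying Content-Encoding: gzip passes through unchanged
whether or not the 1.1 side sent TE, while a gzip transfer coding
survives only if the next hop asked for it.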

Of course, there still remains the issue of double compression.  I
agree with what Poul said up-thread, that there's no easy solution,
and it's something I want to see solved. However, I don't know that
it's necessarily a reason to completely remove TE from the spec, which
is effectively where we are today.

-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/
Received on Thursday, 27 March 2014 04:45:46 UTC