Re: if-range requests and compressed response from Ashok Kumar on 2011-07-29 (ietf-http-wg@w3.org from July to September 2011)

From: Ashok Kumar <ashokkumar.j@gmail.com>
Date: Fri, 29 Jul 2011 11:15:22 +0530
To: Brian Pane <brianp@brianp.net>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAOeYYReiBFr-QSnuurkBDZk5GSZdYrnCaitk638BQ5ds1T_z6A@mail.gmail.com>
Thanks All,

J Ross Nicoll wrote
> I think the answer to your last question is to look the Content-Encoding
> header (I'm unclear on how ETag would indicate compression?), which indicates
> whether the server has compressed the content to be sent.
Ross, here I was talking about how a server (or an intermediate cache doing
compression) could identify whether the If-Range request was for the
compressed entity or the uncompressed one, i.e. whether the original response
was served compressed or uncompressed. If an ETag is present, it will be
different for the compressed and uncompressed entities, so the server can
deduce whether the original response was compressed. But if all the server
gets is a Last-Modified date in the If-Range, how does it determine which
Content-Encoding was served last time? (The assumption here is that the
server may skip compressing the content even though
Accept-Encoding: gzip,deflate was received from the client.)
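
To make the ambiguity concrete, here is a minimal Python sketch of how a
server might map an If-Range validator back to a stored variant. The Variant
type, the candidates_for_if_range helper, and the sample ETag values are all
hypothetical and not taken from any real server. Treating weak tags as
non-matching is this sketch's reading of the strong-validator requirement for
range requests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """One stored representation of a resource (hypothetical model)."""
    content_encoding: Optional[str]  # e.g. "gzip", or None for identity
    etag: str                        # quoted entity tag, distinct per variant
    last_modified: str               # HTTP-date, shared by all variants


def candidates_for_if_range(if_range: str, variants: List[Variant]) -> List[Variant]:
    """Return the variants that the If-Range validator could refer to."""
    value = if_range.strip()
    if value.startswith("W/"):
        # Weak tags are treated as unusable here: ranges need a strong match.
        return []
    if value.startswith('"'):
        # Entity tag: distinct per variant, so at most one variant matches.
        return [v for v in variants if v.etag == value]
    # Anything else is taken to be an HTTP-date, compared as an exact string.
    return [v for v in variants if v.last_modified == value]


variants = [
    Variant(None, '"abc123"', "Thu, 28 Jul 2011 10:00:00 GMT"),
    Variant("gzip", '"abc123-gzip"', "Thu, 28 Jul 2011 10:00:00 GMT"),
]
by_etag = candidates_for_if_range('"abc123-gzip"', variants)
by_date = candidates_for_if_range("Thu, 28 Jul 2011 10:00:00 GMT", variants)
print([v.content_encoding for v in by_etag])  # ['gzip']
print([v.content_encoding for v in by_date])  # [None, 'gzip']
```

With the ETag the server finds exactly one variant; with only the date both
variants match, which is the case I am asking about.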


Amos Jeffries wrote
> see above.  BOTH ETag and Last-Modified must be identical for any merger.
> If either is missing or different no merge.
> Someone will probably flame me about this, but ... IMO having a date in
> there means no Etag. So the cache can't use any version for which it has
> an ETag.

If this is true, i.e. the client will merge the 206 if (and only if) the ETag
is also present and matches, then my worry above about merging is
unjustified. Thanks :)

Brian Pane wrote
> As you noted, this implies that a server (or intermediary) that does
> dynamic compression is implicitly required to produce the same output
> every time it compresses the same response.

So does this also imply that, for a dynamically generated gzip-compressed
response, when the server receives an If-Range it can only respond with a 200
carrying the uncompressed response?
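
For comparison, if the compression is repeatable, a 206 could still be served
from the compressed bytes. A minimal sketch, assuming Python's standard gzip
module with a fixed mtime so that the gzip header does not vary between runs.
The helper names and the sample body are hypothetical, and repeatability is
only assumed for the same library version and compression level.

```python
import gzip


def compress_deterministic(body: bytes) -> bytes:
    # mtime=0 pins the timestamp field in the gzip header, so the same input
    # yields the same bytes on every call within this process.
    return gzip.compress(body, compresslevel=6, mtime=0)


def compressed_range(body: bytes, first: int, last: int) -> bytes:
    """Return bytes first..last (inclusive) of the *compressed* entity-body."""
    entity_body = compress_deterministic(body)
    return entity_body[first:last + 1]


body = b"The quick brown fox jumps over the lazy dog. " * 200
full = compress_deterministic(body)   # what the original 200 carried
split = len(full) // 2
head = full[:split]                   # the part the client already holds
tail = compressed_range(body, split, len(full) - 1)  # later 206 body
merged = head + tail
print(gzip.decompress(merged) == body)  # True
```

The second call recompresses from scratch, and the merged bytes only
decompress correctly because both runs produced the same byte stream.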

This raises another question. It is not specific to the HTTP WG, but I want
to make sure my understanding is correct. IIRC, Apache mod_gzip fixed the ETag
generation problem by appending "-gzip" or similar to the end of the ETag of
the original content. If two runs of gzip compression don't produce the same
output, isn't the generated ETag wrong? I.e., when dynamically compressing
the content, a completely new ETag must be generated rather than just
appending a static string to the ETag of the original uncompressed content.
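
A small Python sketch of the concern, not of what Apache actually does:
suffix_etag and content_etag are hypothetical helpers, and the two runs are
made to differ only through the timestamp field of the gzip header.

```python
import gzip
import hashlib


def suffix_etag(original_etag: str) -> str:
    """Derive the variant tag by appending a fixed suffix inside the quotes."""
    return original_etag[:-1] + '-gzip"'


def content_etag(compressed: bytes) -> str:
    """Derive the tag from the compressed bytes that are actually sent."""
    return '"' + hashlib.sha256(compressed).hexdigest()[:16] + '"'


body = b"<html><body>hello</body></html>" * 50
# Two compression runs that differ only in the gzip header timestamp.
run1 = gzip.compress(body, mtime=1)
run2 = gzip.compress(body, mtime=2)

print(run1 == run2)                              # False
print(suffix_etag('"abc123"'))                   # "abc123-gzip"
print(content_etag(run1) == content_etag(run2))  # False
```

The suffix-based tag is the same for both runs even though the bytes differ,
while a tag computed from the compressed bytes changes with them.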

Thanks again for the clarifications. Also, please let me know if the HTTP WG
is not the right place to discuss these implementation issues, and please
point me to the right discussion list.
-Ashok

On Thu, Jul 28, 2011 at 9:53 PM, Brian Pane <brianp@brianp.net> wrote:

> On Thu, Jul 28, 2011 at 3:33 AM, Ashok Kumar <ashokkumar.j@gmail.com>
> wrote:
> > Hi All,
> > Can someone clarify on which representation the "Range" points to for an
> > "If-Range" request, when the original response was compressed.
> >  * Should the server return the specified range from the compressed
> > response (i.e. compress the original content again, if needed, and send
> > the relevant range, assuming the compression algorithm generates the same
> > byte stream again as the previous response!!) or
> >  * does the range point to bytes from the (decompressed/)uncompressed
> > response? if so, can the 206 partial response be non-compressed, though
> > the original response was compressed. How will client "fill" its cache
> > with this range? (i.e. will the client cache have a complete response
> > after receiving this for the given E-Tag?)
>
> I looked into this same question a while ago and concluded that the
> right interpretation is your first one: "return the specified range
> from the compressed response."
>
> The key sections of RFC 2616 are 14.35.1:
>
> "Byte range specifications in HTTP apply to the sequence of bytes in
> the entity-body (not necessarily the same as the message-body)."
>
> and 14.11:
>
> "The Content-Encoding entity-header field is used as a modifier to the
> media-type. When present, its value indicates what additional content
> codings have been applied to the entity-body."
>
> Putting these two sections together, the server must interpret the
> Range request relative to the entity-body, and the entity-body is the
> document *after* compression and any other Content-Encodings have been
> applied.
>
> If I remember correctly, this interpretation is consistent with how
> Apache and nginx do their response filtering: compression is applied
> first, and then the requested byte-ranges are extracted.
>
> As you noted, this implies that a server (or intermediary) that does
> dynamic compression is implicitly required to produce the same output
> every time it compresses the same response.  This means, for example,
> that a server can't reuse its gzip dictionary state across multiple
> responses on the same connection in hopes of achieving better
> compression.  But the server shouldn't be doing that anyway, based on
> the "considering messages in isolation" clarification in HTTPbis:
> http://trac.tools.ietf.org/wg/httpbis/trac/ticket/288
>
> -Brian
>
>
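
To illustrate the point in the quoted message about reusing compressor state
across responses, here is a rough Python sketch. It uses the standard zlib
module (a zlib stream rather than gzip framing, which is an assumption made
for brevity), and the "connection" is only simulated by a long-lived
compressor object.

```python
import zlib

body = b"Range requests address bytes of the entity-body. " * 20


def fresh_compress(data: bytes) -> bytes:
    """Compress one response in isolation with a new compressor."""
    compressor = zlib.compressobj()
    return compressor.compress(data) + compressor.flush()


# One long-lived compressor reused for two responses on the same connection.
shared = zlib.compressobj()
first = shared.compress(body) + shared.flush(zlib.Z_SYNC_FLUSH)
second = shared.compress(body) + shared.flush(zlib.Z_SYNC_FLUSH)

print(fresh_compress(body) == fresh_compress(body))  # True
print(first == second)                               # False
```

With a fresh compressor per response the same body gives the same bytes, so a
byte range means the same thing each time. With shared state the second
response depends on what was sent before it.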


-- 
.- ... .... --- -.-
Received on Friday, 29 July 2011 05:45:51 UTC