Re: Making Implicit C-E work. from Adrien de Croy on 2014-04-30 (ietf-http-wg@w3.org from April to June 2014)

From: Adrien de Croy <adrien@qbik.com>
Date: Wed, 30 Apr 2014 23:29:02 +0000
To: "Amos Jeffries" <squid3@treenet.co.nz>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-Id: <em06c85051-d4f0-4961-978e-b59cb672bf92@bodybag>

Maybe it needs to be a different header.

Since a C-E gzipped version of the same URI is, strictly speaking, a
different entity (different ETag, etc.), you can't arbitrarily swap
between them.

However, what's the goal here? Simply to increase the use of
compression, correct?

Why mess with C-E or any entity headers? What about another way of
effectively doing T-E, but one that doesn't need to be computed each
time? Something like a Message-Encoding, which could be cached by the
origin server, edge server, proxy or whatever, and which retains the
original entity headers. Maybe that is just T-E.

Adrien


------ Original Message ------
From: "Amos Jeffries" <squid3@treenet.co.nz>
To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Sent: 1/05/2014 1:45:53 a.m.
Subject: Re: Making Implicit C-E work.

>On 30/04/2014 10:26 p.m., Roberto Peon wrote:
>>  On Wed, Apr 30, 2014 at 2:35 AM, Roland Zink <roland@zinks.de> wrote:
>>
>>>   On 30.04.2014 08:44, Roberto Peon wrote:
>>>
>>>>>>  As I described it, its use by the originator of an entity is not
>>>>>>  mandated; instead, behaviors are mandated of recipients when it IS
>>>>>>  used.
>>>
>>>>>>  
>>>>>>  Yeah, mandating it. Which I'm not happy about.
>>>>>
>>>>>  Mandates support, not use.
>>>>>
>>>>
>>>>  Kind of the same thing, from the client's POV. Server's choice.
>>>>
>>>
>>>     For C-E this means, for example, that the server decides whether
>>>  the client can seek. Some interactive clients would prefer to seek
>>>  rather than get the content compressed, whereas downloaders would
>>>  prefer the content to be compressed. T-E would allow both seeking and
>>>  compression.
>>>
>>
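A minimal sketch of the seek trade-off described above, assuming Python's standard `gzip` module stands in for both codings. With C-E a byte range selects bytes of the gzipped representation, which cannot be decoded on their own; with T-E the range selects bytes of the identity representation and only the transfer is compressed. The resource and the helpers `range_of_ce`, `range_of_te` and `decodes` are hypothetical.

```python
import gzip

# Hypothetical resource; repetitive lines so that gzip actually shrinks it.
IDENTITY = b"".join(b"line %04d of the resource\n" % i for i in range(2000))

def range_of_ce(start: int, end: int) -> bytes:
    """Range over a C-E: gzip representation: bytes come from the gzip stream."""
    encoded = gzip.compress(IDENTITY, mtime=0)  # the representation *is* the gzip stream
    return encoded[start:end + 1]               # HTTP byte ranges are inclusive

def range_of_te(start: int, end: int) -> bytes:
    """Range over the identity representation, then gzip applied only for transfer."""
    return gzip.compress(IDENTITY[start:end + 1], mtime=0)

def decodes(data: bytes) -> bool:
    """True if data is a complete, self-contained gzip stream."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError):  # gzip.BadGzipFile subclasses OSError
        return False

ce_part = range_of_ce(100, 199)
te_part = range_of_te(100, 199)
print(decodes(ce_part))                               # prints False: a mid-stream slice cannot be decoded alone
print(gzip.decompress(te_part) == IDENTITY[100:200])  # prints True: the client gets exactly the bytes it asked for
```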
>>  The server *always* decides what to send, whether c-e or t-e is used.
>>  The fact that the server *may* use gzip does not *require* it to use
>>  gzip, and with my proposal, the server knows whether the client
>>  requested it explicitly or not, and it certainly can see if there is a
>>  range request and make the appropriate response.
>>
>>  T-E is theoretically wonderful if one ignores real deployments in
>>  today's world, where the majority of HTTP/1.X servers don't actually
>>  do transfer-encoding: gzip. HTTP2 gateways would thus have to do c-e
>>  to t-e translation (which might be rather error prone in its own way)
>>  or bear the expense of doing the compression themselves, something
>>  which is untenable. This ignores the security issue of knowing when
>>  t-e is safe, which I'll address again below.
>>
>>
>>
>>>
>>>    And today it is often neither the server's nor the client's choice,
>>>  which is what is causing the pain. The client expresses that it wants
>>>  gzip. The intermediary doesn't do it, because that makes its numbers
>>>  better or increases throughput, or because its operators are too lazy
>>>  to implement it, all at the cost of a degraded user experience.
>>>
>>>
>>>>  <snip>
>>>>>
>>>>>  The combination of intermediaries stripping a-e plus the competitive
>>>>>  driver to deliver good experience/latency is causing interop failures
>>>>>  today, where servers will send gzip'd data whether or not the client
>>>>>  declares support in a-e.
>>>>>
>>>>
>>>>  Wait, you're saying the whole motivator here is that servers don't
>>>>  comply with the protocol? So you're changing the protocol to
>>>>  accommodate them? That does not feel right to me, at all; it's not
>>>>  just blessing a potential misuse of C-E, it's wallpapering over a
>>>>  flat-out abuse.
>>>>
>>>  Partially.
>>>  I'm saying that intermediaries are doing things which are incenting
>>>  implementors to break compatibility with the spec, and that
>>>  implementors are doing so because it makes the users happy.
>>>  In the end, making the users happy is what matters, both commercially
>>>  and privately. The users really don't care about purity, and will
>>>  migrate to implementations that give them a good/better user
>>>  experience.
>>>
>>>>  But even so, why do you have to fix it in HTTP/2? And why does it hurt
>>>>  h2 to *not* fix it?
>>>>
>>>
>>>   Compression is an important part of decreasing latency/increasing
>>>  performance, and, frankly, there is little practical motivation to
>>>  deploy HTTP/2 if it doesn't succeed in reducing latency/increasing
>>>  performance.
>>>  Success isn't (or shouldn't be) defined as completing a protocol spec,
>>>  but rather as getting an interoperable protocol deployed. If it doesn't
>>>  get deployed, the effort is wasted. If it doesn't solve real problems,
>>>  the effort is wasted.
>>>
>>>   In any case, I cannot reliably deploy a T-e based compression
>>>  solution. T-e based compression costs too much CPU, especially as
>>>  compared with c-e, where one simply compresses any static entity once
>>>  and decompresses (which is cheap) as necessary at the gateway.
>>>
>>>  If it is really T-E, you can do the same compression of static
>>>  entities when the whole file is delivered; it would be different for
>>>  range requests or the frame-based approach.
>>>
>>
>>  Not how we've thus far spec'd it.
>>
>>
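A rough sketch of the compress-once argument above, assuming a gateway that stores a single gzip copy of each static entity (as with C-E) and decodes it only for clients that did not send `Accept-Encoding: gzip`. `GatewayCache`, `store` and `respond` are hypothetical names, and the `Accept-Encoding` check is deliberately naive (q-values are ignored).

```python
import gzip
from dataclasses import dataclass, field

@dataclass
class GatewayCache:
    """Hypothetical gateway store holding one C-E: gzip copy per static entity."""
    entities: dict[str, bytes] = field(default_factory=dict)
    compress_calls: int = 0

    def store(self, path: str, identity: bytes) -> None:
        # Compress once, when the static entity is first stored.
        self.entities[path] = gzip.compress(identity, mtime=0)
        self.compress_calls += 1

    def respond(self, path: str, accept_encoding: str) -> tuple[dict[str, str], bytes]:
        encoded = self.entities[path]
        # Naive token check; a real gateway would parse q-values.
        tokens = {t.split(";")[0].strip().lower() for t in accept_encoding.split(",")}
        if "gzip" in tokens:
            return {"Content-Encoding": "gzip"}, encoded
        # Decompression is comparatively cheap, so do it per request when needed.
        return {}, gzip.decompress(encoded)

cache = GatewayCache()
page = b"<html>" + b"static content " * 500 + b"</html>"
cache.store("/index.html", page)
for ae in ("gzip, deflate", "identity", ""):
    headers, body = cache.respond("/index.html", ae)
print(cache.compress_calls)  # prints 1: compressed once, however many requests follow
```

The per-request cost is only ever a decompression, which is the asymmetry the argument relies on; per-hop T-E compression would instead run the compressor on every response.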
>>>    T-e based compression isn't as performant in terms of
>>>  compression/deflation ratios.
>>>
>>>  I don't think this is true; the same bytes can be sent as either T-E
>>>  or C-E. For the frame-based approach some numbers were given.
>>>
>>
>>  The same bytes can't be sent in both, unless we're willing to suffer
>>  vastly increased DoS surface area and memory usage, OR we do the
>>  frame-based approach, which will have marginally worse compression.
>>
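To illustrate why a frame-based approach gives up some compression ratio, a small sketch assuming each frame is deflated independently with `zlib`, so no history is shared between frames. The 4 KiB frame size and the sample payload are arbitrary choices for the example, not values taken from any HTTP/2 draft. Independent frames are what allow each frame to be decoded with bounded memory, at the price of losing back-references across frame boundaries.

```python
import zlib

def whole_size(payload: bytes) -> int:
    """Size when the entire payload is one deflate stream (as with C-E)."""
    return len(zlib.compress(payload, 6))

def per_frame_size(payload: bytes, frame_size: int) -> int:
    """Size when each frame is deflated on its own, with no shared history."""
    return sum(
        len(zlib.compress(payload[i:i + frame_size], 6))
        for i in range(0, len(payload), frame_size)
    )

# Arbitrary sample: repetitive markup, as in a typical HTML table.
payload = b"".join(b'<tr><td class="cell">row %d</td></tr>\n' % i for i in range(3000))
FRAME = 4096
w = whole_size(payload)
f = per_frame_size(payload, FRAME)
print(w, f, f / w)  # per-frame total is larger; the exact ratio depends on the data
```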
>>>    Many deployed clients/servers wouldn't correctly support it.
>>>
>>>  There are no deployed HTTP2 clients or servers, or are there some?
>>>
>>
>>  There are, but I'm not talking about those. My problem is dealing with
>>  the rest of the world, which is mostly HTTP/1.X and is unlikely to
>>  change rapidly.
>>  In other words, I'm concerned mainly with HTTP/1.X clients and
>>  especially servers.
>>
>>>    T-e would require that any gateway acting as a loadbalancer/reverse
>>>  proxy either know which resources it can compress, or force us not to
>>>  use compression.
>>>
>>>  The gateway can forward the compressed content unmodified. The gateway
>>>  is only forced to do something if either the server or the client
>>>  doesn't support compression.
>>>
>>
>>  The gateway cannot know which resources it is safe to compress without
>>  something outside the protocol. Compression via t-e without knowing
>>  whether it is safe or not allows attackers to discern ostensibly secret
>>  information. This is NOT acceptable.
>>
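The leak referred to here is a compression side channel, the idea behind the CRIME and BREACH attacks: when attacker-controlled input is compressed in the same context as a secret, the output length reveals whether a guess matches the secret. A toy sketch with `zlib`, assuming a hypothetical page that reflects a query parameter next to a hypothetical secret token; real attacks guess one character at a time and have to cope with noise.

```python
import zlib

# Hypothetical secret embedded in every response.
SECRET = b"csrf_token=9f3c2a7e41d0"

def response_length(reflected: bytes) -> int:
    """Compressed size of a hypothetical page echoing user input beside the secret."""
    body = (b"<form><input value='" + SECRET + b"'></form>"
            b"<p>You searched for: " + reflected + b"</p>")
    return len(zlib.compress(body, 9))

right = response_length(b"csrf_token=9f3c2a7e41d0")  # guess equals the secret
wrong = response_length(b"csrf_token=XQWZKVJMPYRB")  # guess shares only the prefix
print(right < wrong)  # prints True: the matching guess yields a shorter response
```

A hop that only passes through or removes compression never creates such a shared context on its own; the risk appears when an intermediary compresses content whose mix of secret and attacker-supplied bytes it cannot know.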
>>>    Knowing what resources to compress either requires an oracle, or
>>>  requires content authors to change how they author content (*really*
>>>  not likely to happen),
>>>
>>>     Not sure that authors want to know about compression. If it is
>>>  automatic, then this would be fine. Currently there is server
>>>  configuration, for example zlib.output_compression in php.ini, and the
>>>  possibility to do this in the content, for example in PHP something
>>>  like ob_start('ob_gzhandler'). I guess there is a lot more that authors
>>>  are not aware of.
>>>
>>
>>  We definitely don't want to cause content authors to lose what little
>>  control (and understanding) they have today, especially over matters
>>  touching security, like compression.
>>  In general, if a resource wasn't compressed on output from an endpoint,
>>  it shouldn't be when received by any other endpoint.
>
>This seems wrong. The general case is: if a resource is not compressed
>when received by an endpoint, it should not be compressed when leaving
>that *same* endpoint.
>Which I understand is what the proposals about C-E:gzip are saying
>gateways should do:
>   Accept implicit gzip within HTTP/2 so servers can emit it, and
>decompress for identity-only representations as soon as they get to any
>HTTP/1 hop. The 1.1->2.0 transitions should obey the HTTP/1 sender's
>use of T-E:gzip (if they attempt it) or retain identity on the new
>HTTP/2 hop.
>
>Essentially, resources start out compressed but anyone can decompress
>and it stays uncompressed for the remainder of the journey.
>
>
>I see no problem with a clause in the HTTP/2 spec regarding *T-E*
>mandating that T-E:gzip can only be removed, never added.
>
>This whole implicit C-E smells like an attempt to rename HTTP/1 T-E:gzip
>as HTTP/2 C-E:gzip and lump all the resulting deployment problems on the
>gateway implementers' shoulders.
>
>
>>  Given the necessity of interfacing with HTTP/1 servers, which rarely
>>  support T-E: gzip, this ends up being a problem for HTTP/2 and
>>  T-E: gzip.
>>
>
>No problem there. The HTTP/2 gateway already has mandatory
>(de)compression support in all of these proposals and in the existing
>spec text.
>
>Sending traffic received with T-E:gzip into HTTP/1 is a simple
>decompression.
>Receiving traffic from HTTP/1 does not require any compression unless
>the HTTP/1 endpoint *does* support T-E:gzip, in which case it is
>optional to do anything for HTTP/2.
>
>==> I would like to point out again that the security worries for T-E
>*do not exist* unless the HTTP/2 hop is *adding* compression on its own.
>
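A minimal sketch of the hop-by-hop rule proposed above, assuming that a gzip transfer coding may ride on HTTP/2 hops, must be removed before any HTTP/1 hop, and is never added by a gateway. `Hop` and `forward_body` are hypothetical names; real HTTP/1.1 transfer-coding handling (chunked framing, `TE` negotiation) is omitted.

```python
import gzip
from enum import Enum

class Hop(Enum):
    HTTP1 = "HTTP/1.1"
    HTTP2 = "HTTP/2"

def forward_body(body: bytes, te_gzip: bool, next_hop: Hop) -> tuple[bytes, bool]:
    """Return (body, te_gzip) for the next hop; gzip may be removed, never added."""
    if te_gzip and next_hop is Hop.HTTP1:
        # Entering an HTTP/1 hop: a plain decompression yields identity.
        return gzip.decompress(body), False
    # Every other case passes through untouched: compressed stays compressed,
    # identity stays identity. This function never compresses, so it never
    # creates a new compression context mixing secret and attacker bytes.
    return body, te_gzip

# A resource that starts out compressed and crosses three hops.
body, compressed = gzip.compress(b"resource " * 200, mtime=0), True
for hop in (Hop.HTTP2, Hop.HTTP1, Hop.HTTP2):
    body, compressed = forward_body(body, compressed, hop)
    print(hop.value, compressed)
# Prints: HTTP/2 True, then HTTP/1.1 False, then HTTP/2 False
```

Once any hop decompresses, the body stays identity for the rest of the journey, matching the "resources start out compressed but anyone can decompress" behavior described above.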
>
>Amos
>
Received on Wednesday, 30 April 2014 23:29:39 UTC