
Re: i28 proposed replacement text

From: Adrien de Croy <adrien@qbik.com>
Date: Wed, 14 May 2008 08:40:35 +1200
Message-ID: <4829FCC3.7030302@qbik.com>
To: Adrien de Croy <adrien@qbik.com>, Henrik Nordstrom <henrik@henriknordstrom.net>, ietf-http-wg@w3.org



Jamie Lokier wrote:
> Adrien de Croy wrote:
>   
>> Transfer-encoding and Content-Encoding are fundamentally different.  It 
>> helps if you look at it from the point of view of who does the encoding.
>>     
>
> Agreed, but I think everyone who spoke in this thread knows the
> difference.  The discussion hasn't denied that.  I can see why it
> might look that way.
>
>   
Understood - some posts didn't appear clear on it, and there are many 
more reading this list than posting to it.

>> Transfer-Encoding is performed on the fly by something in the stream 
>> (e.g. proxy or output conversion process).
>>     
>
> No, that's an implementation detail outside the scope of HTTP.  It
> _can_ be implemented that way, but HTTP does not say anything about
> that or require it.
>
>   
sure.  Insert "commonly" after "Transfer-Encoding is..."

>> In such cases it's often impossible (i.e. non-deterministic length
>> of output of encoding) to know the length of the whole transformed
>> entity.
>>     
>
> It's often impossible, but it's often possible.  Agents can internally
> cache gzip transfer-encoded bodies and HTTP permits that (it says
> nothing about it).
>   

OK.  That blurs the boundaries a fair bit.
>   
>> Content-Encoding is different because the sender should know the length, 
>> therefore can set Content-Length headers.
>>     
>
> No, that's another implementation detail outside the scope of HTTP.
> HTTP does not require the sender to know the length; it places no such
> requirement, not even in principle.
>
> In practice, many senders using Content-Encoding: gzip don't know the
> length when they start sending, and use chunked encoding or close the
> connection, which is allowed.
>   
Another implementation detail I guess.  If a server is creating encoded 
versions of resources and caching them (to save subsequent CPU cycles 
on re-encoding), then it may create another entity due to race 
conditions and sync issues (i.e. if the source changes often, it 
becomes difficult to keep the encoded version in sync with the original).
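To make the sync problem concrete, here's a minimal sketch (illustrative 
Python, all names hypothetical) of a server-side cache of encoded 
representations keyed by the source's mtime - the check-then-use window 
between stat() and read() is exactly where the race creeps in if the 
source changes:

```python
import gzip
import os

_cache = {}  # path -> (mtime, gzipped bytes); hypothetical in-memory cache


def cached_gzip(path: str) -> bytes:
    """Return a cached gzip encoding of the file, re-encoding whenever
    the source's mtime changes.  If the file is rewritten between the
    stat() and the read(), the cache can end up holding an encoding
    that doesn't match the mtime it's keyed under."""
    mtime = os.stat(path).st_mtime
    hit = _cache.get(path)
    if hit and hit[0] == mtime:
        return hit[1]                     # cache hit: reuse encoding
    with open(path, "rb") as f:
        data = gzip.compress(f.read())    # cache miss: re-encode
    _cache[path] = (mtime, data)
    return data
```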

Encoding it all on the fly as an output transformation process, sending 
chunked etc but using Content-Encoding is basically using 
Content-Encoding to do Transfer-Encoding's job.

There are major side-effects, not the least of which is interop with 
HTTP/1.0 agents in the request chain, so I can see why people would 
take this route (although they then can't use chunking). 
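For the sake of illustration, the "Content-Encoding doing 
Transfer-Encoding's job" pattern looks roughly like this (a Python 
sketch, not taken from any implementation - the function and chunk size 
are made up): compress on the fly, label it as an entity attribute, and 
fall back to chunked framing because the compressed length isn't known 
up front:

```python
import gzip
from io import BytesIO


def chunk(data: bytes) -> bytes:
    # One HTTP/1.1 chunk: hex size, CRLF, data, CRLF
    return b"%X\r\n%s\r\n" % (len(data), data)


def gzip_response_chunked(body: bytes, chunk_size: int = 4) -> bytes:
    """Sketch: compress on the fly, label the result with
    Content-Encoding: gzip, and frame it with Transfer-Encoding:
    chunked because the final compressed length is not known when the
    headers go out."""
    headers = (b"HTTP/1.1 200 OK\r\n"
               b"Content-Encoding: gzip\r\n"
               b"Transfer-Encoding: chunked\r\n"
               b"\r\n")
    compressed = gzip.compress(body)
    out = BytesIO()
    out.write(headers)
    for i in range(0, len(compressed), chunk_size):
        out.write(chunk(compressed[i:i + chunk_size]))
    out.write(b"0\r\n\r\n")  # last-chunk plus empty trailer
    return out.getvalue()
```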

I guess another fundamental difference is that it's legal for a proxy 
to apply a Transfer-Encoding to a stream, but not a Content-Encoding, 
since that would change the entity and break the entity tag for caching.

>   
>> It is deemed a separate entity - an attribute of which is an
>> encoding, but as far as HTTP is concerned it may as well not be
>> encoded.  The encoding is meant for the end consumer of the message.
>>     
>
> That's all correct, but it has no bearing on how servers, clients and
> proxies implement and generate those entities.
>
> In practice, many servers _don't_ know the compressed Content-Length
> when they begin transmitting unless they have an internal cache and
> hit it.  Compressed Content-Encoding is usually implemented as
> dynamically generated content.
>
>   
>> Sure the underlying technology is similar, but the message semantics are 
>> completely different.
>>     
>
> Yes, the semantics are different.
>
> But the underlying work is so similar that they often use the same
> implementation.  That includes dynamically generating compressed
> content in _either_ case, dynamically decompressing at the receiver in
> _either_ case, caching compressed representations in _either_ case, or
> storing pre-compressed representations in _either_ case.
>
> All those are allowed with HTTP, and they are good strategies for a
> high quality implementation too.
>
>   
>> That's also why Content-length is banned with Transfer-Encoding, because 
>> Content-Length is an entity attribute, just like Content-Type, and 
>> Content-Encoding.
>>     
>
> That makes no sense.  Specifically, "it's an entity attribute
> therefore it is banned with Transfer-Encoding" makes no sense, because
> other entity attributes are not banned with Transfer-Encoding.
>
> It's a basic principle that transfer encoding is supposed to be
> transparent to entity headers.  But that transparency is broken with
> Content-Length.
>
> The end consumer of a message is affected by this.
>
> For example, downloading tools can't show percent progress if any
> proxy hop decides to use a transfer-encoding - proving that
> transfer-encoding is not transparent.
>   
OK.  I was trying to come up with the rationale for why Content-Length 
is banned with T-E.  I guess the answer comes down to backward 
compatibility issues.

> Imho, RFC 2616 section 4.4 rule 3 (Content-Length is not allowed with
> transfer-coding) is illogical and inconsistent with Content-Length
> being an entity header.  The rule is there only for compatibility, and
> might be safe to relax in some cases where the receiver is known to
> implement HTTP/1.1.
>   
Actually I agree.  I've posted before about using Content-Length + 
chunking, for instance, to communicate at the start what the entity 
length (after removing any transfer-encoding) should be.  This is in 
many cases VERY important information, and precluding the transmission 
of that information when it may actually be available, simply because 
a T-E is in use, seems unnecessary and wasteful.
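For reference, the receiver-side precedence from RFC 2616 section 4.4 
can be sketched like this (illustrative Python, assuming already-parsed 
headers; the helper name is made up):

```python
def message_body_length(headers: dict):
    """Sketch of the message-length rules in RFC 2616 section 4.4.
    Rule 2: a non-identity Transfer-Encoding wins and the length comes
    from the chunked framing (any Content-Length MUST be ignored).
    Rule 3: otherwise Content-Length gives the length.
    Rule 5: otherwise the body is delimited by connection close."""
    te = headers.get("Transfer-Encoding", "identity").lower()
    if te != "identity":
        return "chunked"          # transfer-coding takes precedence
    if "Content-Length" in headers:
        return int(headers["Content-Length"])
    return None                   # read until the connection closes
```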

But that would break many HTTP/1.0 intermediaries, so I think we're 
stuck with using another header, or an extension to a T-E header.

Actually I'd prefer it if entity headers could be associated with the 
entity.  E.g. with multipart responses, have a block of entity headers 
per entity, and keep them outside the general message header block 
(e.g. another blank line, or MIME multipart boundaries).  This would 
mean intermediaries wouldn't even need to parse entity headers, just 
message headers.  But that's no longer HTTP.

Adrien

> -- Jamie
>   

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
Received on Tuesday, 13 May 2008 20:55:57 GMT
