Re: i28 proposed replacement text from Jamie Lokier on 2008-05-13 (ietf-http-wg@w3.org from April to June 2008)

From: Jamie Lokier <jamie@shareable.org>
Date: Tue, 13 May 2008 18:16:21 +0100
To: Adrien de Croy <adrien@qbik.com>
Cc: Henrik Nordstrom <henrik@henriknordstrom.net>, ietf-http-wg@w3.org
Message-ID: <20080513171621.GA20262@shareable.org>
Adrien de Croy wrote:
> Transfer-encoding and Content-Encoding are fundamentally different.  It 
> helps if you look at it from the point of view of who does the encoding.

Agreed, but I think everyone who spoke in this thread knows the
difference.  The discussion hasn't denied that.  I can see why it
might look that way.

> Transfer-Encoding is performed on the fly by something in the stream 
> (e.g. proxy or output conversion process).

No, that's an implementation detail outside the scope of HTTP.  It
_can_ be implemented that way, but HTTP does not say anything about
that or require it.

> In such cases it's often impossible (i.e. non-deterministic length
> of output of encoding) to know the length of the whole transformed
> entity.

It's often impossible, but it's often possible.  Agents can internally
cache gzip transfer-encoded bodies and HTTP permits that (it says
nothing about it).

> Content-Encoding is different because the sender should know the length, 
> therefore can set Content-Length headers.

No, that's another implementation detail outside the scope of HTTP.
HTTP does not require the sender to know the length, not even in
principle.

In practice, many senders using Content-Encoding: gzip don't know the
length when they start sending, and use chunked encoding or close the
connection, which is allowed.
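A minimal sketch of such a sender in Python, assuming the body arrives as an iterable of byte strings (the function name `gzip_chunked` is hypothetical). It compresses on the fly and frames each piece of output with chunked transfer-coding, so the response can carry `Content-Encoding: gzip` and `Transfer-Encoding: chunked` without the compressed length ever being known in advance.

```python
import zlib
from typing import Iterable, Iterator


def gzip_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical helper: gzip a body incrementally, emitting chunked framing.

    The total compressed length is never computed, so no Content-Length
    header is needed for this response.
    """
    # wbits=31 (16 + 15) selects the gzip container rather than raw zlib.
    comp = zlib.compressobj(wbits=31)
    for part in parts:
        data = comp.compress(part)
        if data:  # never emit a zero-size chunk: that would end the body
            yield b"%x\r\n%s\r\n" % (len(data), data)
    tail = comp.flush()
    if tail:
        yield b"%x\r\n%s\r\n" % (len(tail), tail)
    yield b"0\r\n\r\n"  # last-chunk followed by an empty trailer
```

Because `zlib` buffers internally, early calls to `compress()` may return nothing; the loop skips those rather than sending an empty chunk, which a receiver would read as end-of-body.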

> It is deemed a separate entity - an attribute of which is an
> encoding, but as far as HTTP is concerned it may as well not be
> encoded.  The encoding is meant for the end consumer of the message.

That's all correct, but it has no bearing on how servers, clients and
proxies implement and generate those entities.

In practice, many servers _don't_ know the compressed Content-Length
when they begin transmitting unless they have an internal cache and
hit it.  Compressed Content-Encoding is usually implemented as
dynamically generated content.

> Sure the underlying technology is similar, but the message semantics are 
> completely different.

Yes, the semantics are different.

But the underlying work is so similar that they often use the same
implementation.  That includes dynamically generating compressed
content in _either_ case, dynamically decompressing at the receiver in
_either_ case, caching compressed representations in _either_ case, or
storing pre-compressed representations in _either_ case.

All those are allowed with HTTP, and they are good strategies for a
high quality implementation too.

> That's also why Content-length is banned with Transfer-Encoding, because 
> Content-Length is an entity attribute, just like Content-Type, and 
> Content-Encoding.

That makes no sense.  Specifically, "it's an entity attribute
therefore it is banned with Transfer-Encoding" makes no sense, because
other entity attributes are not banned with Transfer-Encoding.

It's a basic principle that transfer encoding is supposed to be
transparent to entity headers.  But that transparency is broken with
Content-Length.

The end consumer of a message is affected by this.

For example, download tools can't show percent progress if any proxy
hop decides to apply a transfer-coding, because the Content-Length
then has to be dropped.  That alone shows transfer-encoding is not
transparent.
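A small Python sketch of the client-side effect, assuming header names have already been lower-cased (`progress_percent` is a hypothetical name). Under the RFC 2616 section 4.4 rules, once a non-identity Transfer-Encoding is present the Content-Length is absent or must be ignored, so the client has no total to measure progress against.

```python
from typing import Mapping, Optional


def progress_percent(headers: Mapping[str, str], received: int) -> Optional[float]:
    """Percent of the entity received, or None if the total is unknown.

    Assumes `headers` uses lower-case field names.
    """
    te = headers.get("transfer-encoding", "").strip().lower()
    if te and te != "identity":
        # RFC 2616 4.4: with a transfer-coding, Content-Length must not be
        # sent and, if received anyway, must be ignored, so no total exists.
        return None
    cl = headers.get("content-length")
    if cl is None or not cl.strip().isdigit():
        return None
    total = int(cl)
    if total == 0:
        return 100.0
    return min(100.0, 100.0 * received / total)
```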

IMHO, RFC 2616 section 4.4 rule 3 (Content-Length must not be sent
with a transfer-coding) is illogical and inconsistent with Content-Length
being an entity header.  The rule is there only for compatibility, and
might be safe to relax in some cases where the receiver is known to
implement HTTP/1.1.
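A rough Python sketch of how a receiver might determine the message body length under RFC 2616 section 4.4 (rules 2, 3 and 5 only; rules 1 and 4 are omitted). The `relaxed` flag is a hypothetical illustration of the relaxation suggested above: framing still comes from chunked coding, but a Content-Length sent alongside Transfer-Encoding by an HTTP/1.1 peer is kept as the entity length instead of being discarded.

```python
from typing import Mapping, NamedTuple, Optional


class Framing(NamedTuple):
    method: str                      # "chunked", "length" or "close"
    transfer_length: Optional[int]   # bytes on the wire, if known up front
    entity_length_hint: Optional[int]


def framing(headers: Mapping[str, str], relaxed: bool = False) -> Framing:
    """Simplified RFC 2616 4.4 for a response; header names lower-cased.

    `relaxed` is hypothetical: it keeps Content-Length as an entity-length
    hint even when a transfer-coding is present.
    """
    te = headers.get("transfer-encoding", "").strip().lower()
    cl_raw = headers.get("content-length", "").strip()
    cl = int(cl_raw) if cl_raw.isdigit() else None
    if te and te != "identity":
        # Rule 2: transfer-length is defined by the chunked coding.
        # Rule 3: a Content-Length received with it is ignored (strict mode).
        return Framing("chunked", None, cl if relaxed else None)
    if cl is not None:
        # Rule 3: Content-Length is both entity-length and transfer-length.
        return Framing("length", cl, cl)
    # Rule 5: the server closing the connection ends the body.
    return Framing("close", None, None)
```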

-- Jamie
Received on Tuesday, 13 May 2008 17:16:57 UTC