Re: Range Requests vs Content Codings

On 17 Jun 2014, at 8:26 pm, Julian Reschke <julian.reschke@gmx.de> wrote:

> Trying to summarize the problem, and trying a solution:
> 
> Consider a text/plain resource http://example.org/test of 1000000 octets length (representation A), supporting content coding "gzip", yielding 100000 octets (representation B).

Nit: I think you mean a resource with two text/plain representations; one without content-coding that's 1,000,000 octets (A), and one with 'gzip' content-coding that's 100,000 octets (B).

> 
> Upon GET, clients can select which one to use using "Accept-Encoding". For instance

Nit: the server always performs selection.

> 
>  GET /test HTTP/1.1
>  Host: example.org
>  Accept-Encoding: gzip
> 
> is likely to return the representation B, while
> 
>  GET /test HTTP/1.1
>  Host: example.org
>  Accept-Encoding: identity
> 
> will return representation A.

Yes.


> A specified range will always apply to the representation. Thus, a client can't easily ask for a specific range of representation A *and* have the server apply Content-Coding gzip.

Well, that's because A isn't gzipped, but OK...


> (Compression could also be achieved by using Transfer Codings, but these are not implemented in practice)
> 
> One way to combine Content Codings and range requests would be to create a new range unit, "bbcc" (bytes-before-content-coding), in which case the requested range would be applied to the non-content-coded representation, and the content-coding would be applied to the byte range.
> 
> Such as:
> 
>  GET /test HTTP/1.1
>  Host: example.org
>  Accept-Encoding: gzip
>  Range: bbcc=900000-
> 
> This would retrieve the octets starting at position 900000, and apply content-coding gzip to the resulting octet sequence.
> 
> Note that to combine range responses using these byte range units, a recipient needs to understand the range unit (simple concatenation isn't going to work).
> 
> This also requires that both user agent and origin server understand the new range unit, but that appears to be easier to deploy than T-E (which requires all intermediaries to play along).
> 
> Thoughts?

To me, whenever we have someone propose a new range-unit, one of the deciding factors is whether it'll be useful and implemented by intermediaries -- because one of the main benefits of adding new HTTP mechanisms like this is to allow generic software to take advantage of it.

For this scheme to work in an intermediary, it looks like it'll have to cache the uncompressed response to use this range unit, and decompress responses as they come back. That doesn't seem terribly attractive, from their standpoint.
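
To make that concrete, here is a rough Python sketch of how I read the proposed semantics. The function names are made up for illustration and nothing here is a real server or cache API; it only assumes that "bbcc" means "slice the identity octets, then gzip the slice".

```python
import gzip
from typing import List, Optional


def serve_bbcc(identity_body: bytes, first: int, last: Optional[int] = None) -> bytes:
    """Hypothetical origin handling of 'Range: bbcc=first-last'.

    The range is applied to the non-content-coded octets (representation A),
    and gzip is then applied to the selected octets only.
    """
    end = len(identity_body) if last is None else last + 1
    return gzip.compress(identity_body[first:end])


def combine_bbcc(parts: List[bytes]) -> bytes:
    """Hypothetical recipient logic: decode each part on its own, then join
    the decoded octets in range order."""
    return b"".join(gzip.decompress(part) for part in parts)


def serve_from_cached_b(cached_b: bytes, first: int, last: Optional[int] = None) -> bytes:
    """A cache holding only the gzip representation (B) has to decompress it
    in full before it can answer a bbcc range, then recompress the slice."""
    return serve_bbcc(gzip.decompress(cached_b), first, last)
```

The last function is the part that looks unattractive for an intermediary: it either stores A, or pays for a full decompress on every such request.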

Also, this won't work with intermediaries until they actually support it.

The alternative is to do what people do now -- to use an application-specific mechanism (e.g., a URL query parameter) to fetch parts of the response.  That seems like it would satisfy the use case that was talked about earlier (from Bjoern):

> Imagine you have a remote resource that is regularly appended to, like a log file or a mailing list archive mbox file.
> You synchronise with it by making regular Range requests to it to retrieve the content that has since been appended, if any.
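
As a sketch of what the application-specific approach could look like for that use case (in Python; the "offset" query parameter and all function names are assumptions for illustration, not an existing convention):

```python
import gzip
from urllib.parse import urlencode, urlsplit, urlunsplit


def tail_url(base_url: str, offset: int) -> str:
    """Build the URL of a hypothetical 'everything from offset onwards' resource."""
    parts = urlsplit(base_url)
    query = urlencode({"offset": offset})  # 'offset' is an assumed parameter name
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def handle_tail(log: bytes, offset: int) -> bytes:
    """Hypothetical server side: the tail is its own resource, so gzip here is
    an ordinary content-coding of that resource's representation."""
    return gzip.compress(log[offset:])


def sync(local_copy: bytes, response_body: bytes) -> bytes:
    """Hypothetical client side: decode the response and append it."""
    return local_copy + gzip.decompress(response_body)
```

A client holding the first 900000 octets would request tail_url("http://example.org/test", 900000) with Accept-Encoding: gzip. Because each such URL identifies a separate resource, intermediaries need no new range-unit support to pass it through; the cost is that the semantics are private to the application.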


The natural questions seem to be:

- What benefits does standardising a range-unit bring this use case (over just using application-specific semantics)?
- Are there other use cases that are materially different?

Cheers,

P.S. Happy to give agenda time in Toronto to this if you think it's ready to discuss...


--
Mark Nottingham   https://www.mnot.net/

Received on Wednesday, 18 June 2014 00:33:30 UTC