Coalescing ranges from A. Rothman on 2015-07-23 (ietf-http-wg@w3.org from July to September 2015)

From: A. Rothman <amichai2@amichais.net>
Date: Thu, 23 Jul 2015 17:56:50 +0000
To: ietf-http-wg@w3.org
Message-Id: <1197805198.1191437674171647.JavaMail.root@shefa>

Hi,

I'd like to raise an issue with the HTTP/1.1 RFCs regarding when a server may coalesce byte ranges requested by the client.

In RFC 2616, there's a section that says:

"When an HTTP message includes the content of a single range (for
example, a response to a request for a single range, or to a request
for a set of ranges that overlap without any holes), ..."

which implies that there may be other legitimate cases in which the server may choose to coalesce the ranges, and it's all good.

However, RFC 7233 says:

"When multiple ranges are requested, a server MAY coalesce any of the
ranges that overlap, or that are separated by a gap that is smaller
than the overhead of sending multiple parts, regardless of the order
in which the corresponding byte-range-spec appeared in the received
Range header field.  Since the typical overhead between parts of a
multipart/byteranges payload is around 80 bytes, depending on the
selected representation's media type and the chosen boundary
parameter length, it can be less efficient to transfer many small
disjoint parts than it is to transfer the entire selected
representation."

Notably, it changed from a "for example" to a MAY with very detailed, very specific cases. Now it sounds to me like coalescing for other reasons is off the table.
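
To make the RFC 7233 wording concrete, here is a minimal sketch of the coalescing it describes. The function name coalesce and the max_gap parameter are my own, purely for illustration; the 80-byte default is just the "typical overhead" figure from the quoted text, and a real server would presumably compute it from the media type and boundary length. Ranges are assumed to be inclusive (first, last) byte positions that have already been validated.

```python
def coalesce(ranges, max_gap=80):
    """Merge inclusive (first, last) byte ranges that overlap, touch,
    or are separated by a gap smaller than max_gap bytes.

    max_gap is a hypothetical knob standing in for the per-part
    overhead of a multipart/byteranges payload.
    """
    merged = []
    # Sorting discards the order in which the ranges were requested,
    # which the quoted text permits when coalescing.
    for first, last in sorted(ranges):
        if merged and first - merged[-1][1] - 1 < max_gap:
            # Overlap (negative gap), adjacency (zero gap) or small gap:
            # extend the previous range instead of starting a new part.
            prev_first, prev_last = merged[-1]
            merged[-1] = (prev_first, max(prev_last, last))
        else:
            merged.append((first, last))
    return merged


print(coalesce([(500, 599), (0, 99), (150, 199)]))
```

With these inputs, the 50-byte gap between 0-99 and 150-199 is below the threshold, so those two are merged, while 500-599 stays a separate part.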

I'd like to suggest that there may be other legitimate reasons why a server would want to coalesce ranges, even when the exact efficiency calculation described above does not hold. Whatever the reason, I think this should be explicitly allowed by the RFC, or at least implicitly allowed with a 'for example' qualifier as it used to be. The simple reason is that returning a range (even a single large one) would still be more efficient than ignoring the Range header and returning the whole content with a 200 status, which the spec explicitly allows as an alternative. Disallowing it (even implicitly, as the current wording might suggest) would make such simple but effective optimizations invalid, which makes the byte-counting optimization described above somewhat... silly (for lack of a better word at the moment :-) ).
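
As a rough illustration of that comparison, the sketch below computes the body size of a single 206 response covering all requested ranges against a 200 response carrying the whole representation. The names spanning_range and body_sizes are hypothetical, and the ranges are assumed to be inclusive, satisfiable and already clipped to the representation length; headers are ignored.

```python
def spanning_range(ranges):
    """Return the smallest single inclusive range covering all requested ranges."""
    first = min(f for f, _ in ranges)
    last = max(l for _, l in ranges)
    return first, last


def body_sizes(ranges, total_length):
    """Compare body bytes sent for one coalesced 206 versus a full 200.

    Assumes every range lies within [0, total_length - 1].
    """
    first, last = spanning_range(ranges)
    return {"single_206": last - first + 1, "full_200": total_length}


# Two 100-byte ranges far apart in a 1,000,000-byte representation.
print(body_sizes([(1000, 1099), (5000, 5099)], 1_000_000))
```

Since the spanning range can never be longer than the representation itself, the single-range response is never larger than the 200 response in this model.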

This of course would not affect existing servers, or new servers that want to implement any sort of complex calculations or heuristics on byte range responses (e.g. what is mentioned in the security considerations section), nor would it affect clients in any way since they must check the returned data ranges and handle any of these types of responses anyway.

If I've misinterpreted the RFC or missed some other section relevant to this issue, I'd be happy if someone can point it out :-)

Thanks,

Amichai

Received on Thursday, 23 July 2015 19:00:21 UTC