Re: estimated Content-Length with chunked encoding from Adrien de Croy on 2008-10-21 (ietf-http-wg@w3.org from October to December 2008)

From: Adrien de Croy <adrien@qbik.com>
Date: Tue, 21 Oct 2008 14:52:52 +1300
To: David Morris <dwm@xpasc.com>
CC: ietf-http-wg@w3.org
Message-ID: <48FD35F4.9010706@qbik.com>

Another key use of such a field is policy control.

For instance, a proxy may want to stop people downloading or uploading 
items over a certain size, but for various reasons the transfer mode 
needs to be chunked (e.g. NTLM auth + POST), so there is no 
Content-Length to check up front.

In such cases, the worth of such a value, and how important its 
accuracy is, depends on what it will be used for.  A proxy may not 
really care how accurate the estimate is if it only uses it for 
blocking, although if the actual size shows the estimate was very 
inaccurate, some other processing / consequence may be required.  E.g. 
if we are to use such a thing for policy (which I know users want, 
since I have had many such requests), then we can't just trust the 
input from the client, or at least we need to take a conservative 
approach.

So in short, I'd like to see the capability for agents to indicate a 
length as well as using chunking, whether that's an estimate or not.  
Perhaps a max size would be most useful in most cases (e.g. will not 
exceed X bytes) or, as has been mentioned, a range.
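
As a rough illustration of the conservative approach, here is a sketch 
in Python.  The names (enforce_size_policy, declared_max, 
TransferBlocked) are made up for this example and do not come from any 
real proxy or from the HTTP spec.  The idea is to use the declared 
bound only to refuse early, and to keep counting the real bytes anyway 
in case the client under-declared.

```python
from typing import Iterable, Iterator, Optional


class TransferBlocked(Exception):
    """Raised when a transfer violates the (hypothetical) size policy."""


def enforce_size_policy(
    chunks: Iterable[bytes],
    limit: int,
    declared_max: Optional[int] = None,
) -> Iterator[bytes]:
    """Yield chunks unchanged while enforcing a byte limit.

    declared_max is the sender's advertised upper bound, if any.
    It is treated as a hint for early rejection only, never as proof.
    """
    # Early rejection: the sender itself says it may exceed the limit.
    if declared_max is not None and declared_max > limit:
        raise TransferBlocked(
            f"declared max {declared_max} exceeds limit {limit}"
        )

    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        # Do not trust the declaration: count what actually arrives.
        if seen > limit:
            raise TransferBlocked(
                f"received {seen} bytes, limit is {limit}"
            )
        yield chunk
```

Since this is a generator, nothing is checked until the caller starts 
iterating, and a chunk is only passed on once the running total is 
known to be within the limit.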

David Morris wrote:
>
> On Mon, 20 Oct 2008, William A. Rowe, Jr. wrote:
>
>> Jamie Lokier wrote:
>>
>>> In some circumstances you may be able to refine the estimate as the
>>> message is being transmitted.
>>>
>>> Chunk extensions ("chunk-extension") would suit that:
>>>
>>>     1000;estimated-remaining=299000
>>>     (1000 bytes)
>>>     1000;estimated-remaining=298000
>>>     (1000 bytes)
>>>
>>> I don't know if chunk extensions break in the real world, though.
>>>       
>> Or, permit
>>
>>      1000;completed=25%
>>      (1000 bytes)
>>      1000;completed=25%
>>      (1000 bytes)
>>     
>
> Either % completed or estimated remaining requires computing an estimate
> of the total data to be transferred. I don't see a difference in the impact
> of either choice on a server. I think there are many cases where a
> generated result can be estimated to a 95%+ accuracy, but not to the exact
> size needed for content-length. There is no incentive to do it today, but
> if it could be utilized to improve the user's experience, many web
> application developers would be happy to do so. In addition, a jsp/php/asp
> engine could even watch the actual size generated for each request and
> recognize some pages with a small standard deviation in generated size.
> Use that value automatically.
>
> I'd rather see raw sizes from a recipient's perspective as I might
> be able to manage resources better. A percentage is only useful for end
> user presentation without interpolating from the amount of data already
> received. Raw numbers make computation of a percentage trivial while more
> easily supporting other use cases.
>
> David Morris
>
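
To make the raw-numbers point concrete, here is a sketch in Python of 
how a recipient might read such a chunk header and derive a percentage 
from it.  Assumptions: "estimated-remaining" is the hypothetical 
extension name from Jamie's example, its value is taken to be a decimal 
byte count of what follows the current chunk, and the helper names are 
invented here.  Note that on the wire the chunk size itself is 
hexadecimal, so a 1000 byte chunk would be sent as "3e8"; the examples 
quoted above appear to use decimal for readability.

```python
from typing import Dict, Tuple


def parse_chunk_header(line: str) -> Tuple[int, Dict[str, str]]:
    """Split a chunk-size line into (size, extensions).

    Simplified: does not handle ';' inside quoted extension values.
    """
    size_part, *ext_parts = line.strip().split(";")
    size = int(size_part.strip(), 16)  # chunk-size is hex in HTTP/1.1
    extensions: Dict[str, str] = {}
    for part in ext_parts:
        name, _, value = part.partition("=")
        extensions[name.strip()] = value.strip().strip('"')
    return size, extensions


def percent_complete(received: int, estimated_remaining: int) -> float:
    """Derive a percentage from raw byte counts."""
    total = received + estimated_remaining
    if total == 0:
        return 100.0
    return 100.0 * received / total


# Usage: track progress across the first chunk of the quoted example.
received = 0
size, ext = parse_chunk_header("3e8;estimated-remaining=299000")
received += size  # bytes received once this chunk's data has arrived
remaining = int(ext["estimated-remaining"])
progress = percent_complete(received, remaining)
```

The raw count can also feed buffer sizing or policy checks directly, 
which a bare percentage cannot do without knowing the bytes received 
so far.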
>   

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com

Received on Tuesday, 21 October 2008 01:51:13 UTC