Re: Upload negotiation

Henrik Nordstrom wrote:
> tis 2008-04-08 klockan 13:44 +1200 skrev Adrien de Croy:
>
>   
>> Seems to me that getting around message - length / connection 
>> maintaining issues by using chunking so you can send a premature final 
>> chunk instead of terminating the connection is asking for a whole lot of 
>> pain.  It's an ugly hack.
>>     
>
> I disagree.
>
>   
>> For starters, there's then no way to tell the recipient that the 
>> completion was an abortive one.
>>     
>
> It's the recipient who asked for it to be aborted by sending an error
> instead of 100 continue.
>   
I guess that's my point.  If you look at many proxies or servers where
you have installable filter modules etc., then a lot of processes can
exist inside this one "recipient who asked for it".  They may not
necessarily know that a module upstream of them rejected the message.

If we are to try and keep connections alive, I think it's necessary to
be able to signal an abortive end to a message body transfer without
closing the connection.  A chunk extension for this, as you propose,
sounds like a good way to do it.

>> If there are several processes in an
>> intermediary or end server that the data goes through before getting to
>> the module that caused the client to abort, then all manner of things
>> may happen to that data, which appears complete.
>>     
>
>   
>> That's where it would have been useful to support notifying an abortive
>> end to a transfer (e.g. the previously discussed chunk length of -1).
>> Abort without closing.
>>     
>
> Then propose a chunk extension for that purpose, i.e. something like
>
> 0; aborted
>
>   
>> All in all I think using chunked uploads is bad for many reasons, apart 
>> from the fact that it's probably poorly supported even in existing 
>> HTTP/1.1 infrastructure.  Using it so you can stop sending a body is 
>> even worse.
>>     
>
> I obviously don't share that opinion.
>
>   

evidently :)

Actually I think there are some places where chunked uploads are
appropriate, but my personal opinion is that, if at all possible, the
length should be specified in Content-Length.  Unless a client is
streaming on-the-fly generated content in an upload (content that
couldn't be spooled first), it should always send a length, just as I
think all SMTP sending agents should indicate a SIZE in the MAIL FROM
command.  It's simply the most efficient way to implement size-based
policy.
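To illustrate why a declared length makes size policy cheap, here is a minimal sketch of a receiver deciding before reading a single body byte.  `MAX_BODY` and `early_size_verdict` are hypothetical names, and the 10 MB limit is an arbitrary example policy; 413 is the RFC 2616 "Request Entity Too Large" status.  With `Transfer-Encoding: chunked` no verdict is possible up front, which is the gap discussed further down.

```python
from typing import Mapping, Optional

MAX_BODY = 10 * 1024 * 1024  # hypothetical site policy: 10 MB


def early_size_verdict(headers: Mapping[str, str],
                       limit: int = MAX_BODY) -> Optional[int]:
    """Return a status code to send *before* reading the body, or None.

    None means "go ahead and read the body": either the declared size is
    within policy, or there is no declared size to judge.
    """
    h = {k.lower(): v.strip() for k, v in headers.items()}
    if "chunked" in h.get("transfer-encoding", "").lower():
        return None  # no size known in advance; must count while receiving
    cl = h.get("content-length")
    if cl is None:
        return None
    if not (cl.isascii() and cl.isdigit()):
        return 400  # malformed Content-Length
    return 413 if int(cl) > limit else None
```

A server would typically send the 413 immediately (ideally in place of a 100 Continue) rather than accept the upload first.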

>> As I said, it's possible ("a client can") to have a connection followed 
>> by initiation of a large transfer without any notification of acceptance 
>> by the thing that will have to swallow the data or reject and disconnect.
>>     
>
> Yes.
>
> The same is also true for request headers, or URL lengths.
>   
Sure, but not many servers would accept request headers plus URL over
about 32 KB.  I think the default for IIS is 16 KB.

I'd be surprised if any accept even 1 MB.  It doesn't compare to a large
upload.

>> If you want proof of the problem, try uploading a 100MB file across a 
>> slow WAN through a proxy that requires auth to a poorly-connected server 
>> that requires auth (a fairly common scenario actually).  Basically 
>> impossible because depending on the auth method, that 100MB may have to 
>> be sent up to 6 times.
>>     
>
> Well, ignoring connection oriented auth the client SHOULD either use
> chunked encoding or close the connection, aborting the request when
> seeing the challenge.

Right, but that only works with HTTP Basic and Digest auth.  NTLM
users are out in the cold because their auth method requires a
persistent connection.  It's a lot easier to write "ignoring connection
oriented" than it is to ignore all those customers.

It doesn't help that many browsers don't read any response data while
they are still sending the request body, so they won't stop sending even
if you do send back a 4xx.  I guess that contravenes the spec section
you cite below.  I understand the market-leading browser does this.
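The behaviour Henrik cites (RFC 2616 section 8.2.2, "Monitoring Connections for Error Status Messages") amounts to the client polling for a response between writes.  A minimal sketch, assuming a plain blocking socket; the function name and the 64 KB write size are hypothetical.  A real client would then parse the response, and if it had declared a Content-Length it could not honour, it would have to close the connection afterwards.

```python
import select
import socket


def send_body_watching_for_response(sock: socket.socket, body: bytes,
                                    chunk_size: int = 64 * 1024) -> int:
    """Send `body`, stopping as soon as the server has sent anything back.

    Returns the number of body bytes actually sent.  Only detects that a
    response has arrived; reading and parsing it is left to the caller.
    """
    sent = 0
    while sent < len(body):
        # Zero timeout: just check whether response bytes are waiting.
        readable, _, _ = select.select([sock], [], [], 0)
        if readable:
            break  # early (probably error) response: stop sending the body
        sent += sock.send(body[sent:sent + chunk_size])
    return sent
```

Polling once per write keeps the sketch simple; a client on a slow link would want smaller writes so it reacts to a 401/407/413 sooner.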

> Clients SHOULD NOT send the whole body after
> receiving an error. This is already spelled out verbatim in the specs
> under "Monitoring Connections for Error Status Messages".
>
> This got screwed up the day someone decided that connection-oriented
> authentication is a good thing in a message-oriented protocol where each
> message is supposed to be self-contained... I won't even comment on what
> disaster that is to the process.

I guess that's what sets HTTP apart from pretty much all other
transfer protocols (SMTP, FTP, NNTP etc.).  By optimising the protocol so
heavily to minimise round-trips, we sacrificed certain protocol
pleasantries that dealt with these sorts of situations.  When we then
come up against the issues, we have to specify heuristics to work around
them and to deal with backward compatibility.  It makes me wonder what
HTTP will look like in 30 years.

> Thankfully, the use of chunked encoding helps avoid some of the
> disaster.
>
>   
>> Try an analogy - e.g. a fertiliser delivery company.
>>
>> the HTTP way:
>>
>> * The truck turns up and starts dumping fertiliser in your driveway
>> * you come out and scream at it.
>> * it stops
>> * you clean up
>> * maybe it comes back and dumps some more fertiliser in your driveway 
>> soon after
>>     
>
> I see where you are coming from, but you have the roles the wrong way
> around for it to match HTTP. A more fitting analogy would be a person
> trying to place an order, or someone wanting to send a package at the
> post office.
>
> An example based on the order. Note that there is a twist here: if you
> hand over the order form and there is a problem, you can't get it back
> and have to fill out a new copy.
>
>   
I guess I was trying to convey the work involved and the resources
expended by all parties (and therefore the waste) in even fronting up
with the request.

I like the analogy of sending a package at the post office, but try it
with 2000 kg of sand when you live on an island, and see what the
logistics are like.  Especially when the small local post office wants
to see some ID, which you have to go back home for, and you then front
up with another 2000 kg of sand when you come back with your ID, only
for them to say they don't accept packages that big.

Or, worse still, they then refer you to the central post office on the
mainland, which is only accessible by a small rope swing bridge that
you have to carry the 2000 kg across in small bags.  Then they send you
home for ID as well.

And even with Digest auth and 2 entities wanting auth, you are looking
at transferring the body 4 times or dropping connections.  I guess that's
the beauty of Digest: you don't need to maintain the connection.

But this really doesn't help so many people who are stuck with NTLM for 
any number of reasons.

And 100 Continue is just like turning up to the post office with this
sand on the truck with the removal guys you are paying $40/hr for, and
if someone doesn't come out of the post office within 2 seconds, you
start unloading.  And once you start unloading, you don't stop until
it's all unloaded (which is what many browsers do).  100 Continue with
Expect is the same, except you toot the horn outside, wait 2 seconds,
then start unloading.
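The "toot the horn, wait, then unload" behaviour corresponds to a client sending `Expect: 100-continue` and waiting a bounded time before sending the body (RFC 2616 section 8.2.3 says a client should not wait indefinitely, but gives no fixed figure).  A minimal sketch, assuming a blocking socket and a caller-built header block; the 2-second default only mirrors the analogy, and the function name and return strings are hypothetical.

```python
import select
import socket


def send_with_expect_continue(sock: socket.socket, head: bytes, body: bytes,
                              timeout: float = 2.0) -> str:
    """Send request headers, then decide whether to send the body.

    `head` is assumed to already include "Expect: 100-continue" and to end
    with the blank line (CRLF CRLF) that terminates the header section.
    """
    sock.sendall(head)
    # Wait a bounded time for the server to say anything at all.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        sock.sendall(body)  # silence: start "unloading" anyway
        return "sent-after-timeout"
    # Simplification: assume the status line arrives in a single recv().
    status_line = sock.recv(4096).split(b"\r\n", 1)[0]
    if status_line.split()[1:2] == [b"100"]:
        sock.sendall(body)  # server invited the body
        return "sent-after-100"
    return "withheld"  # a final status (e.g. 401, 407, 413) came first
```

The "withheld" branch is the case where the post office actually answers the phone: no sand is moved at all.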

A normal person would ring the post office first :)

So what comes out of all of this?

a) agents should use Digest instead of NTLM
b) agents should notice rejections while they are sending (already
specified but not universally observed)
c) on rejection during an upload, agents should disconnect and retry
with credentials etc.  I can't really endorse using chunked uploads
until there is a method to signal an abortive end to a transfer, and
also to signal the size in advance, if known, for policy purposes.

which leaves us in the following predicament:

i) what to do about the zillions of people reliant on NTLM.  It seems
the proposal is basically to ignore them?  Or, worse still, to try and
educate them about using Digest :)
ii) what to do about chunked uploads to fill in the gaps (reporting size
and signalling an abortive end without closing).

The size thing is a big problem for many sites if we move to chunked
uploads.  Pretty much every site has a maximum size for POST data.
Having to receive it all in order to decide it's too big is incredibly
wasteful.
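To make the waste concrete: with a chunked upload the receiver can only enforce a maximum by counting as it decodes.  A minimal sketch, assuming the body is read from a binary file-like object; `read_chunked` and `BodyTooLarge` are hypothetical names.  The earliest rejection point is the first chunk-size line that pushes the total over the limit, by which time every earlier chunk has already crossed the wire.

```python
import io
from typing import BinaryIO


class BodyTooLarge(Exception):
    """Raised once a chunked body is known to exceed the size policy."""


def read_chunked(stream: BinaryIO, limit: int) -> bytes:
    """Decode a chunked body, failing as soon as it would exceed `limit`."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        # Chunk extensions (";name=value") are ignored for size purposes.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last-chunk
        if len(body) + size > limit:
            # Earliest possible rejection, yet every byte of the earlier
            # chunks has already been transferred.
            raise BodyTooLarge(f"chunked body exceeds {limit} bytes")
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        body += data
        stream.read(2)  # CRLF that follows chunk-data
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass  # skip any trailer fields up to the terminating blank line
    return bytes(body)


if __name__ == "__main__":
    wire = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
    print(read_chunked(io.BytesIO(wire), limit=100))  # b'hello world'
```

Compare this with a declared Content-Length, where the same decision can be made from the headers alone.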

Many people around the world still have to pay for data per MB, and/or 
are on slow connections.

iii) praying that, as a proxy, we never get large chunked requests to
process for upstream HTTP/1.0 agents.  In many cases there is no viable
solution to this problem, since the proxy has to spool the entire
request before submitting anything to the server: it can't send
anything through without a Content-Length.  Actually, for this reason
and ii) above, I'd be a keen fan of using Content-Length AND chunking
for client message bodies (even though it's utterly prohibited in the
spec).  At least then the proxy could stream the body through to the
HTTP/1.0 next hop as it received it (after first establishing, of
course, that the next hop actually IS HTTP/1.0 by having its first
request bounced).
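A minimal sketch of the streaming this would allow, under the explicit assumption that the client sends an advisory Content-Length alongside chunking.  This departs from RFC 2616, which says a Content-Length received together with a Transfer-Encoding MUST be ignored.  The proxy de-chunks the body and forwards each piece immediately under `Content-Length: <advisory_length>` instead of spooling.  `relay_to_http10` is a hypothetical name.

```python
import io
from typing import BinaryIO, Iterator


def relay_to_http10(stream: BinaryIO, advisory_length: int) -> Iterator[bytes]:
    """Yield de-chunked payload pieces as they arrive, so a proxy can forward
    them to an HTTP/1.0 next hop under "Content-Length: <advisory_length>"
    without spooling the whole request first.

    If the chunked body turns out not to match the advisory length, bytes
    have already been forwarded, so the caller must close the next-hop
    connection rather than reuse it.
    """
    forwarded = 0
    while True:
        size_line = stream.readline()
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last-chunk
        if forwarded + size > advisory_length:
            raise ValueError("chunked body is longer than the advertised length")
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        stream.read(2)  # CRLF after chunk-data
        forwarded += size
        yield data  # can be written to the next hop immediately
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass  # skip trailer fields
    if forwarded != advisory_length:
        raise ValueError("chunked body is shorter than the advertised length")


if __name__ == "__main__":
    wire = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
    for piece in relay_to_http10(io.BytesIO(wire), advisory_length=11):
        print(piece)
```

The mismatch checks matter because an HTTP/1.0 hop has no way to learn that the body was cut short except by the connection closing.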

Regards
Adrien


> - Person fills out his order form and goes to the counter where the
> order is supposed to be placed.
> - If the customer is polite he asks if he may place the order before
> handing over the order form, or if in a hurry he hands over the order
> form immediately hoping for the best.
> - Gets told that he needs to have some proof of his customer number in
> order to place the order.
> - Goes away to fetch his customer card proving his identity in the
> process.
> - Back at the counter again repeating the process
>
> Or another version where everything is in order
>
> - Person fills out his order form and goes to the counter where the
> order is supposed to be placed.
> - The customer is polite and asks if he may place the order before
> handing over the order form
> - The processing agent is currently busy processing another order and
> doesn't answer immediately.
> - The person hands over the order form when the processing agent says
> it's ready, or if he suspects the processing agent or the communication
> with it is old-fashioned and outdated and doesn't indicate readiness.
>
> Regards
> Henrik
>
>
>   

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com

Received on Tuesday, 8 April 2008 10:40:21 UTC