Re: Upload negotiation

Henrik Nordstrom wrote:
> On Tue, 2008-04-08 at 22:41 +1200, Adrien de Croy wrote:
>   
>> right - but then that only works with HTTP Basic auth and Digest.  NTLM 
>> users are out in the cold because their auth method requires a 
>> persistent connection.  It's a lot easier to write "ignoring connection 
>> oriented" than it is to ignore all those customers.
>>     
>
> Using the chunked method works just fine for NTLM.
>
>   
We just need to get some browsers to support it :)

>> It doesn't help either that many browsers don't support receiving any 
>> data whilst trying to send a resource, so they won't stop sending even 
>> if you do send back a 4xx.
>>     
>
> That's a bug/misfeature. Not something we can help by adding more
> features to the protocol.
>
>   
>> I guess that contravenes your spec reference below.  I 
>> understand the market-leading browser does this.
>>     
>
> So someone should beat them with a big stick then...
>   
I'll have to leave that to someone closer to them....

>> I guess that's what sets HTTP apart from pretty much all other 
>> transfer protocols (SMTP, FTP, NNTP etc).  By optimising the protocol so 
>> much to minimise round-trips, we sacrificed certain protocol 
>> pleasantries that dealt with these sorts of situations.  When we then 
>> come up against the issues, we have to specify heuristics to get around 
>> it, and to deal with backward compatibility.  It makes me wonder what 
>> HTTP will look like in 30 years.
>>     
>
> The day we can forget HTTP/1.0, things will stabilize considerably.
>
> But in 30 years I expect that HTTP will have been pretty much replaced
> something new, more targeted for interactive transfer of large amounts
> of data.
>
>   

heh - sounds like FTP...

>> I like the sending-a-package-at-the-post-office analogy, but try it 
>> with 2000kg of sand, and you live on an island and see what the 
>> logistics are like.  Esp when the small local post office wants to see 
>> some id, which you have to go back home for, and then front up with 
>> another 2000kg of sand when you come back again with your ID, then they 
>> say they don't accept packages that big.
>>     
>
> Only if you refuse to do what the specs say you should do.
>
> The specs say you should contact the post office, say "I want to send
> 2000kg of sand", and then wait for confirmation. The twist is that if
> there is neither a confirmation nor a rejection within a reasonable
> time frame, and you don't know whether the post office or your
> communication channel to it supports confirmation, then you should not
> wait forever; instead, assume it's acceptable if no response is seen
> within a reasonable timeframe.
>
> The specs absolutely do not say that you SHOULD drive to the post
> office with all that 2000kg of sand, blindly assuming they will handle
> it for you by default.
>
>   
>> Or worse still then they refer you to the central post office on the 
>> mainland, which is only accessible by a small rope swing bridge which 
>> you have to carry this 2000kg over in small bags.  Then they send you 
>> home for ID as well.
>>     
>
> Stop assuming that you have to carry those 2000kg, it's a false
> assumption. You do not have to. You can choose to, but it's your own
> choice.
>
>   
Coming from the proxy side, I see what the clients actually do: either 
Firefox sending the whole thing each time, or IE trying a POST with 
Content-Length: 0 first if it thinks I will bounce it with a 407.  Since 
a POST with Content-Length: 0 is not invalid, this creates several problems.
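
On the wire, that probe looks roughly like this (just a sketch; the 
proxy address, host and path are all made up):

    import socket

    # Sketch of the zero-length POST probe: send headers only, with
    # Content-Length: 0, to see whether the proxy answers 407 before
    # committing the real body.  Addresses here are made up.
    s = socket.create_connection(("proxy.example.com", 8080))
    s.sendall(b"POST http://example.com/upload HTTP/1.1\r\n"
              b"Host: example.com\r\n"
              b"Content-Length: 0\r\n"
              b"\r\n")
    status_line = s.recv(1024).split(b"\r\n", 1)[0]
    if status_line.startswith(b"HTTP/1.1 407"):
        pass  # authenticate, then resend the POST with the real body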

But if we put those aside as two examples of bad behaviour and move on...

>> And even with Digest auth and two entities wanting auth, you are 
>> looking at transferring four times or dropping connections.  I guess 
>> that's the beauty of Digest - you don't need to maintain the connection.
>>     
>
> Just follow the specs and you will be fine.
>
> With NTLM it's harder to follow the specs, but that's not the specs'
> fault. NTLM should have been implemented at the message level, not the
> connection level. Digest is one source of inspiration for how such a
> session-oriented authentication scheme might look without tying it to
> the transport.
>   
I've been looking more into this... it does have some difficulties, 
seemingly keying on the URI, and requiring the server to maintain an 
independent cache of credential handles.

All doable, but quite different from session-oriented.  It's easy to see 
why session-oriented was chosen: it makes the association of interim / 
temporary credential handles with a user trivial in most cases.

>> But this really doesn't help so many people who are stuck with NTLM for 
>> any number of reasons.
>>     
>
> Until something better can replace NTLM/Negotiate.
>   

>> And 100 Continue is just like turning up to the post office with this 
>> sand on the truck with the removal guys you are paying $40/hr for, and 
>> if someone doesn't come out of the post office within 2 seconds, you 
>> start unloading.
>>     
>
> Now you are just too pessimistic about 100 Continue. Most web servers
> out there are HTTP/1.1 and do send 100 Continue (even old ones just
> implementing RFC 2068). And the major commercial proxy servers are
> HTTP/1.1 as well. Squid is not, but that's another story...
>
>   
I guess that's the thing.  100-continue -> timeout -> start sending is 
the optimistic option.
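
In client terms the heuristic comes out something like this (a sketch 
only; the host, path, body and the 2-second timeout are assumptions):

    import socket

    # Sketch of Expect: 100-continue with the timeout heuristic: send
    # the headers, wait briefly, and send the body only if a 100
    # arrives or nothing arrives at all.
    body = b"field=value&" * 1000
    head = (b"POST /upload HTTP/1.1\r\n"
            b"Host: example.com\r\n"
            b"Content-Length: %d\r\n"
            b"Expect: 100-continue\r\n"
            b"\r\n" % len(body))

    s = socket.create_connection(("example.com", 80))
    s.sendall(head)
    s.settimeout(2.0)                # the "reasonable timeframe"
    try:
        reply = s.recv(1024)
        if reply.startswith(b"HTTP/1.1 100"):
            s.sendall(body)          # confirmation: unload the truck
        # else: a final status (407/413/...) - do NOT send the body
    except socket.timeout:
        s.sendall(body)              # no answer: assume acceptable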

>> And once you start unloading, you don't stop until 
>> it's unloaded (which is what many browsers do).  100 Continue + Expect 
>> is the same, except you toot the horn outside, wait for 2 seconds, and 
>> then start unloading.
>>     
>
> I don't view it that way. I would view it as first sending the request
> in a separate car, or phoning it in, and then at a suitable time
> sending the truck with all the sand.
>
>   
>> A normal person would ring the post office first :)
>>     
>
> And that is what Expect: 100-continue does. You should view the
> heuristics in the specs as what you do if the post office doesn't care
> to answer the phone.
>
>   
OK.

Browser authors need to work on their heuristics then.  Basing the 
timeout on the RTT to a local proxy, when the remote server may be 
poorly connected, means timeouts will happen more often.


>> So what comes out of all of this?
>>
>> a) agents should use digest instead of NTLM
>>     
>
> Yes, or a replacement along the same lines (session-oriented auth at
> the message level, not connection-oriented).
>
>   
>> b) agents should notice rejections whilst they are sending (already 
>> specified but not universally observed)
>>     
>
> Yes.
>
>   
>> c) on rejection during an upload, agents should disconnect and retry 
>> with credentials etc.  I can't really endorse using chunked uploads 
>> until there is a method to signal an abortive end to a transfer, and 
>> also to signal size in advance, if known, for policy purposes.
>>     
>
> Well, I don't consider the intermediary issues you raise that
> critical. Intermediaries should follow 'b' so you know that any data
> seen after an error response is to be considered garbage. It shouldn't
> even be forwarded, so you should not take any actions based on that
> data.
>   

OK, fair point... chained filters should notice an outbound 4xx as well.

I still feel uncomfortable about aborting without explicitly marking it 
as such.  The abort may be implicit, but if it is not explicitly marked, 
that precludes any possible future use-case where a 4xx response might 
not invalidate the data.  I can't think of any such case, but I'm not 
quite ready to bet that it will never eventuate.

We could make a SHOULD-level requirement that user agents which choose 
to abort a chunked request SHOULD include a "0 ; aborted" chunk 
extension.  Then at least the possibility exists that the termination 
can be noted as explicitly aborted by anything that recognises the 
extension, without having to assume so based on previous protocol 
traffic.  That would also avoid boundary/race conditions (e.g. was the 
final chunk sent before the 4xx/5xx/3xx was received?).
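
On the wire, an upload aborted that way might look like this (a sketch 
of the proposed - purely hypothetical - extension; host and path are 
made up):

    import socket

    # Sketch: a chunked upload stopped mid-stream, terminated with the
    # proposed (hypothetical) "aborted" extension on the last-chunk.
    def send_chunk(sock, data):
        sock.sendall(b"%x\r\n%s\r\n" % (len(data), data))

    s = socket.create_connection(("example.com", 80))
    s.sendall(b"PUT /big HTTP/1.1\r\n"
              b"Host: example.com\r\n"
              b"Transfer-Encoding: chunked\r\n"
              b"\r\n")
    send_chunk(s, b"x" * 8192)        # first chunk goes out...
    # ...a 4xx arrives while we're still sending, so stop early:
    s.sendall(b"0 ; aborted\r\n\r\n")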

>> which leaves us in the following predicament:
>>
>> i) what to do about the zillions of people reliant on NTLM - seems like 
>> the proposal is to basically ignore them?  Or worse still, try to 
>> educate them about using Digest :)
>>     
>
> Find a better replacement for NTLM that actually works with the HTTP
> specifications, then get vendors to accept it. Not technically very
> hard, and takes about the same time to roll out as any other noticeable
> change.
>
>   
>> ii) what to do about chunked uploads to fill in the gaps (reporting 
>> size and signalling an abortive end without closing).
>>     
>
> ?
>
> Signalling the abort differently than just "end of request" is purely
> optional. Whoever you are talking to SHOULD have already realised the
> request has failed.
>
>   
>> The size thing is a big problem for many sites if we move to chunked 
>> uploads.  Pretty much every site specifies a maximum size for POST 
>> data.  Having to receive it all in order to decide it's too big is 
>> incredibly wasteful.
>>     
>
> True. And to address this, one can introduce an advisory header
> carrying the expected size of the transmitted entity. It has to be
> different from Content-Length for protocol reasons.
>   
OK, I'd be happy with anything that can tell me the length.  I'm not 
aware of why it can't be Content-Length for client requests (the reasons 
are more obvious to me for server responses), but if it could be 
converted to a Content-Length header for relaying upstream, that would 
solve the spooling and flow-control issues.

For instance, there are three boundary conditions for Content-Length 
and chunking:

a) Content-Length matches the amount transmitted with chunking - 
everything is OK.
b) Content-Length is bigger - the send was aborted or some other 
problem occurred; discard.
c) Content-Length is smaller - the client was lying.
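
A proxy-side check would then be something like (a sketch; 
"advisory_len" stands for whatever advisory length header we'd end up 
with):

    # Sketch: validate a completed chunked upload against an advisory
    # length.  advisory_len comes from the hypothetical header above.
    def check_upload(advisory_len, received_len):
        if received_len == advisory_len:
            return "ok"                 # a) everything matches
        if received_len < advisory_len:
            return "aborted-or-broken"  # b) sender stopped early; discard
        return "client-lied"            # c) more data than advertised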

>   
>> Many people around the world still have to pay for data per MB, and/or 
>> are on slow connections.
>>     
>
> All the more important for them, then, that their clients and any
> proxies they use actually follow the specs.
>
>   
So we still have a long way to go with browser behaviour then.

>> iii) praying that as a proxy we never get large chunked requests to 
>> process for upstream HTTP/1.0 agents.
>>     
>
> Proxies should respond with 411 in such cases, making communication
> revert to "HTTP/1.0 compatible", with all the problems that entails...
>
>   

I guess so, or at least get the client to retry without chunking.  I 
guess the 411 would come from the HTTP/1.0 server?  How do you tell that 
the server is HTTP/1.0 if it doesn't recognise Transfer-Encoding, and 
just sees the chunk wrappers as part of the content anyway?  Would it be 
purely the lack of a Content-Length?  Or would you have to rely on 
cached knowledge?
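
For what it's worth, the proxy-side rule would be something like this 
(a sketch; headers is assumed to be a dict of lower-cased header names, 
and the version is whatever the proxy has cached, if anything):

    # Sketch: refuse to forward a chunked request when the upstream
    # server is not known to be HTTP/1.1, answering 411 so the client
    # retries with a Content-Length instead.
    def should_send_411(headers, upstream_version):
        chunked = headers.get("transfer-encoding", "").lower() == "chunked"
        return chunked and upstream_version != (1, 1)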

>> In many cases there is no viable 
>> solution to this problem, since the proxy has to spool the entire 
>> request before submitting anything to the server
>>     
>
> No, it should not. Proxies should apply the same rules as any other
> client on when chunked is acceptable, and refuse to forward the message
> if they think it won't work out. That's what 411 and 417 are about:
> downgrading the protocol when needed. Remember that proxies act as both
> servers and clients and have to fulfil both sets of requirements.
>   
I guess this means that proxies have to start caching details of servers 
then.  The problem is that caching things like HTTP versions, unlike 
normal caching in HTTP, doesn't have the benefit of explicit cache 
support - specified expiries, dates, etc.  There's nothing governing 
that caching.
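
So any such cache ends up with arbitrary, self-chosen lifetimes, along 
these lines (a sketch; the one-hour TTL is an arbitrary assumption, 
precisely because nothing in the protocol governs it):

    import time

    # Sketch of an ad-hoc per-server HTTP-version cache.  The TTL is
    # an arbitrary choice: no spec mechanism governs this expiry.
    _versions = {}                      # host -> (version, expiry time)

    def remember_version(host, version, ttl=3600):
        _versions[host] = (version, time.time() + ttl)

    def known_version(host):
        entry = _versions.get(host)
        if entry and entry[1] > time.time():
            return entry[0]
        return None                     # unknown or stale: must probe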

Thanks for your explanations.

Adrien

> Regards
> Henrik
>
>   

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com

Received on Tuesday, 8 April 2008 12:54:13 UTC