Re: 100 Continue and Expects

Adrien: technically you're right.  With HTTP you can't know that you need to authenticate until after you've tried.  A server could legitimately ask every nth request to authenticate, or every request with a payload of more than k bytes to authenticate, or only require authentication between 10am and 2pm, or whatever.  You must try the request exactly as you want it, which is why 100-continue is so important: it explicitly allows the server to tell the client to authenticate based only on the HTTP headers.  Sorry if I'm being pedantic here; I'm probably the biggest noob on the list, so I'm just spelling my understanding out explicitly.
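
To make that concrete, here's a minimal sketch (Python, against a made-up example.com endpoint, so the host, path and sizes are placeholders) of how a client can use Expect: 100-continue to hold back a large payload until the server has either said 100 Continue or challenged based on the headers alone.  This is only my illustration of the mechanism, not anything CFNetwork actually does today:

    import socket

    HOST, PORT = "example.com", 80     # placeholder endpoint
    BODY = b"x" * (1 << 20)            # a large payload we'd rather not resend

    header_block = (
        "POST /upload HTTP/1.1\r\n"
        "Host: example.com\r\n"
        f"Content-Length: {len(BODY)}\r\n"
        "Expect: 100-continue\r\n"
        "\r\n"
    ).encode("ascii")

    sock = socket.create_connection((HOST, PORT), timeout=10)
    try:
        # Send only the header block; the body waits on the server's verdict.
        sock.sendall(header_block)

        # Peek at the first response line.  A real client needs a tolerant
        # parser plus a timeout fallback (RFC 2616 lets it send the body
        # anyway if no 100 ever arrives); this sketch keeps it minimal.
        first_line = b""
        while not first_line.endswith(b"\r\n"):
            byte = sock.recv(1)
            if not byte:
                break
            first_line += byte

        if first_line.startswith(b"HTTP/1.1 100"):
            # Headers were acceptable: send the payload, then go on to read
            # the rest of the interim response and the final response.
            sock.sendall(BODY)
        else:
            # 401/407 (or anything else): the body never hit the wire, and
            # the client can pick an auth scheme and retry the full request.
            print("challenged or refused:",
                  first_line.decode("ascii", "replace").strip())
    finally:
        sock.close()

The payoff is that a 401 or 407 costs one round trip and zero payload bytes.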

Practically, however: I've seen that Microsoft proxy servers and web servers that use NTLM authentication always ignore any payload sent with the initiation of NTLM authentication.  In essence, the first request isn't really HTTP, because the client expects the server to respond only with a 4xx message.

Take the proxy sequence:

 *A: Try request
  B: 407 response (list of available authentication methods)
**C: Try request (initial NTLM auth nonce / salt)
  D: 407 response (NTLM re-salt response)
 *E: Try request (full NTLM challenge)
  F: Success, response from endpoint

and the non-proxy sequence:
 *A: Try request
  B: 401 response (list of available authentication methods)
**C: Try request (initial NTLM auth nonce / salt)
  D: 401 response (NTLM re-salt response)
 *E: Try request (full NTLM challenge)
  F: Success response.

You can see that we must send the full payload with all of the requests marked with a *, but we have found that we can optimize the requests marked with a ** by clearing the payload.  This is because, according to the broken NTLM semantics, the initial NTLM request can't result in anything but another 4xx response (and on the same TCP connection, at that).  Hogwash!  We can't know what nasty state the server is in, so we really ought to expect that the server may respond with a 200.  But since Microsoft is the doorkeeper here, and their own clients appear to safely follow this twisted logic, CFNetwork also saves one upload by clearing the payload.  I don't know whether any piece of the NTLM handshake interoperates with 100-continue, as I haven't tried.
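
In case it helps anyone, here's a rough sketch of that sequence (Python, with a made-up proxy.example.com host, a stubbed-out NTLM token generator, and the proxy handling simplified to the bare minimum - treat it as an illustration of where the payload gets cleared, not as what CFNetwork actually implements):

    import http.client

    # Placeholder host and stubbed NTLM tokens: generating real Type 1 / Type 3
    # messages needs an NTLM library and credentials, which is out of scope here.
    PROXY_HOST, PROXY_PORT = "proxy.example.com", 8080
    TYPE1_TOKEN = "TlRMTVNT..."        # hypothetical NTLM negotiate message

    def make_type3_token(challenge):
        return "TlRMTVNT..."           # hypothetical NTLM authenticate message

    body = b"field=value&" * 1000      # the payload we only upload when it counts

    # NTLM is connection-oriented, so every leg must reuse this one connection.
    # (Simplified: a real proxy request would use an absolute URI.)
    conn = http.client.HTTPConnection(PROXY_HOST, PROXY_PORT)

    # A: try the request exactly as we want it, full payload included.
    conn.request("POST", "/submit", body=body)
    resp = conn.getresponse()
    resp.read()                        # drain so the connection can be reused

    if resp.status == 407 and "NTLM" in (resp.getheader("Proxy-Authenticate") or ""):
        # C: initial NTLM leg.  Per the (broken) NTLM semantics this can only
        # draw another 407 on the same connection, so the payload is cleared.
        conn.request("POST", "/submit", body=b"",
                     headers={"Content-Length": "0",
                              "Proxy-Authorization": "NTLM " + TYPE1_TOKEN})
        resp = conn.getresponse()
        resp.read()

        # D: the 407 carrying the server's challenge.
        challenge = resp.getheader("Proxy-Authenticate") or ""

        # E: final leg.  This one may actually succeed, so the full payload
        # goes back on the wire.
        conn.request("POST", "/submit", body=body,
                     headers={"Proxy-Authorization": "NTLM " + make_type3_token(challenge)})
        resp = conn.getresponse()

    print(resp.status, resp.reason)

The asymmetry is the whole point: A and E carry the full body because they might succeed, while C carries Content-Length: 0 because, per the NTLM rules described above, it can only draw another 4xx.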



On Apr 1, 2010, at 3:08 AM, Adrien de Croy wrote:

> 
> keep in mind also, it's possible on a single TCP connection to be challenged for credentials multiple times.
> 
> It's a common customer requirement to enable some sites for some users, and other sites for other users.  To establish who a user is requires auth.
> 
> A not too uncommon scenario through a proxy follows:
> 
> 1. a user connects, requests a site
> 2. site not restricted, so request granted
> 3. on same connection, user requests a restricted site which only members of group A are allowed to see
> 4. Proxy sends 407
> 5. user fills in credentials of a user in that group, request is granted
> 6. on same connection, user requests a restricted site which only members of group B are allowed to see, and the user they logged in as isn't a member
> 7. Proxy sends 407
> 
> Also, if a user supplies credentials that don't grant the right to view the page, a common response is another 407 - give the user another chance.  Many browsers will give up after a few tries.
> 
> Proxy policy for what needs to be authed can vary on any parameter (method, a header, even time of day).  I had a customer yesterday who wanted to block POSTs to a specific site unless authed, but only during work hours.
> 
> So there's no guarantee that ANY request which is not the actual request you wish to make will elicit the same response from a proxy, or be reliably usable to pre-establish necessary credentials.
> 
> Adrien
> 
> On 1/04/2010 11:59 a.m., Mark Pauley wrote:
>> You're right, and we've identified these sorts of issues as well.  CFNetwork (and any HTTP client in general) can't assume that it will be given a 407, even if we know that this host has sent us one before.  Moreover, using a HEAD to 'prime' the auth doesn't work, nor does anything similar.  Sometimes you have to actually try the full request, because HTTP allows servers to be as pernicious as they please.
>> 
>> In the end, the only time CFNetwork zeroes out the POST's length is when we've already gotten a 4xx and are starting the NTLM auth dance, because there is no way NTLM can decide not to give us another 4xx once we send it the NTLM initiation in response to the initial 4xx.
>> 
>> 
>> On Mar 31, 2010, at 3:48 PM, Adrien de Croy wrote:
>> 
>>   
>>> we saw the issue with IE + WinGate Proxy + NTLM going through to a webmail site.
>>> 
>>> WinGate was configured to require auth, but credentials get inherited, and policy can be used to specify which sites need auth or not.  So not every connection was being asked to auth (407).
>>> 
>>> IE was obviously working on the assumption that it would ALWAYS be challenged with a 407 on new connections.
>>> 
>>> Customers complained that they couldn't reliably post messages; the webmail would randomly break.  We tracked it down to the 0-length POST on a new connection.  IE thought it would get a 407, but it didn't, and the proxy passed the 0-length POST through to the webmail server, which barfed on it, with the side effect of the browser losing the email the customer had just written.
>>> 
>>> I complained about this to MS, but I think they still think it's a good idea.
>>> 
>>> Our hack fix was to check the UA, the 0 length, etc., and if we see such a request, arbitrarily send a 407 back.  It's really ugly, and I wish to death that MS would pull this broken behaviour.
>>> 
>>> Part of our selling point is that people can have policy which allows access to some sites to be anonymous while other sites need auth, etc. - the admin chooses when and what needs auth.  There's no way the browser can predict whether it will get a 407 or not, so sending a 0-length POST is suicidal.
>>> 
>>> On 1/04/2010 11:29 a.m., Mark Pauley wrote:
>>>     
>>>> CFNetwork on Mac OS X will send a zero-length POST as the first leg of the 3-way auth for NTLM after it has received the initial 4xx response.
>>>> 
>>>> CFNetwork ought to soon allow clients to specify Expects and handle the 100 Continue properly to reduce bandwidth on any request with a large payload, though I can't say when 'soon' will be.  Unfortunately, the NTLM case really is dictated by Microsoft.  Their own behavior shows that they believe it's okay to send a POST with a zero length for the initial NTLM transaction, and since Microsoft is by far the largest producer of NTLM-using server software, CFNetwork just apes IE.  What about the zero-length POST breaks HTTP?  It seems like a POST that both has length 0 and Expects: 100-continue is silly, and the client ought to be able to handle either a 100-continue followed by a standard HTTP response or simply a standard HTTP response.  Garbage in, garbage out.
>>>> 
>>>> 
>>>> On Mar 29, 2010, at 5:42 PM, Adrien de Croy wrote:
>>>> 
>>>>> I did a review of what browsers do in some cases like this (e.g. POST + auth/NTLM) a while back.
>>>>> 
>>>>> Their behaviour varies.  For instance, IE (even current versions), if it thinks it will get an auth challenge, will submit the initial POST with a Content-Length of 0 (this breaks a heap of stuff).
>>>>> 
>>>>> FF and Chrome will send the whole POST body as many times as required to get through auth.
>>>>> 
>>>>> Server behaviour varies also.  IIS5, for instance, will send a 100 Continue on all POST requests regardless of whether they contain Expects.  In fact, I've never seen a browser send Expects.
>>>>> 
>>>>> I wrote an I-D partly about this a while back, but it was somewhat misguided, based on my misunderstanding of the Expects mechanism.
>>>>> 
>>>>> In the end, I think that's the biggest problem with Expects - people expect (pardon the pun) more from it than it delivers.  The reason the history of this list contains numerous discussions about Expects is common misinterpretation.
>>>>> 
>>>>> The wording in the RFC implies that Expects and 100 Continue can be used to avoid sending the body of a request.  It omits to mention that if indeed you do want to avoid this (e.g. when you get a 3xx or 4xx), you either need to disconnect and re-try, or complete the message some other way (e.g. use chunking and prematurely complete it with a 0 chunk).
>>>>> 
>>>>> I would really hate to see anyone adopt the 0-chunk approach - it's just as bad as sending Content-Length: 0 when the browser thinks the proxy or server will send an auth challenge.  Browsers second-guessing proxies / servers is not a good option, and when the browser gets it wrong, things get ugly.  We had to write a hack into WinGate to cope with the IE misbehaviour (WinGate only sends an auth challenge when policy dictates auth is required - the browser cannot possibly predict this, and a 0-length POST is valid, but breaks websites).  Sending a 0-length chunk tells the proxy the message is complete.  It could then go through many stages of processing, which is really undesirable.  There's no way to signal an abort and maintain a connection.
>>>>> 
>>>>> In the end, the best option IMO is to send the whole thing each time (as per FF and Chrome - sorry don't know what Opera does).  It can be hideously painful with chained connection-oriented auth over a slow link.
>>>>> 
>>>>> So actually there's no clean solution to this problem.  I proposed a while back a kind of HEAD command for POST to establish a path with credentials before posting a body.  But in the end, any proper solution to this problem will be a significant protocol change to HTTP (since HTTP is designed for intermediaries to be able to work with unknown methods), which means we'll be living with this problem for a long time yet.
>>>>> 
>>>>> As for NTLM, it's not going away either.  Much as people don't like it because it's not HTTP-compliant auth, it's the reality we are stuck with.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Adrien
>>>>> 
>>>>> 
>>>>> On 30/03/2010 12:22 p.m., Jamie Lokier wrote:
>>>>> 
>>>>>> Henrik Nordström wrote:
>>>>>> 
>>>>>>> Sun 2010-03-28 at 15:20 +0100, Jamie Lokier wrote:
>>>>>>> 
>>>>>>>> With certain types (Microsoft) of authentication
>>>>>>>> 
>>>>>>> You forgot to add "which are not true HTTP authentication schemes".
>>>>>>> That family of auth schemes makes many assumptions which are opposite
>>>>>>> to the intentions of the HTTP specifications, so using them as an
>>>>>>> example when trying to understand the wording of the specification
>>>>>>> text is not valid.
>>>>>>> 
>>>>>>> Still interesting when talking about what should be said, as they are
>>>>>>> a reality and something HTTP has to deal with today, but not when
>>>>>>> trying to understand why RFC2616 is written in a certain manner.
>>>>>>> 
>>>>>> Agreed, Microsoft's version is not standard HTTP.
>>>>>> 
>>>>>> Regardless of why, it's important to recognise that clients have the
>>>>>> option to send the whole request body if they decide to, despite one
>>>>>> possible reading of that section of RFC2616 suggesting that they must
>>>>>> abort.
>>>>>> 
>>>>>> -- Jamie
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> -- 
> Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
> 
> 

Received on Thursday, 1 April 2010 22:37:05 UTC