Re: websocket HTTP response parsing

Ian Hickson wrote:
> On Sun, 6 Jul 2008, Julian Reschke wrote:
>>>> As far as I can tell, it depends on specifics of the server 
>>>> implementation. I can easily imagine cases where it's the HTTP 
>>>> server that replies to the upgrade request, and control to the new 
>>>> server component passes only afterwards. In that case it could be 
>>>> hard to produce the specified byte sequence.
>>> Yes, it may require careful work when the servers are updated.
>> So why make it harder than necessary?
> 
> We need a handshake that can guarantee that we've contacted a WebSocket 
> server and not some other server that's being tricked into sending data 
> that looks like the handshake. For that purpose, we use the headers that 
> come before the server has to echo anything, and we make that as long as 
> possible (and as unique as possible) by moving both those headers there.

I don't understand this point. Could you (or somebody else) please 
elaborate?

>>>> Also, did you consider the impact of intermediates in the request 
>>>> path?
>>> You mean, like proxies? Sure, the spec defines how to handle those.
>> What if they somehow modify the reason phrase? It would be outside the 
>> control of the server, and communication would break.
> 
> Any proxy that did that would be deeply violating HTTP rules and would 
> also be breaking TLS tunnels.

Which HTTP rule would be violated by a proxy rewriting the reason phrase?

> In any case, mcarter implemented and tested this with several proxies, as 
> I understand it, to make sure this would work.
> 
> 
>>>> But it's more complex than it needs to be
>>> It's a stream of bytes. It's longer than absolutely necessary, sure, 
>>> but how can it be more complex than necessary?
>> Ok, it's longer. What for?
> 
> Why shouldn't it be?

I don't see the point in making a protocol exchange longer than 
necessary, nor the point in putting plain English text into it and 
making that text relevant to the protocol. Let's just agree that we 
disagree on this.

> It seems better to use whatever HTTP uses in its descriptions than go 
> ahead and make up our own stuff for no reason. If we had a good reason to 
> use something else then sure, but here we really don't. Would you rather 
> have the status text be something else, like "101 WebSocket Handshake"?

Actually yes -- because it would ensure that somebody deploying this 
actually has control over the reason phrase, which may not always be true.

I'd prefer, though, to use HTTP semantics (making the text of the reason 
phrase irrelevant), or to use an empty reason phrase.
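
To illustrate what I mean by "HTTP semantics", here is a minimal sketch 
in Python of a client-side check that looks only at the status code and 
ignores the reason phrase. The function names are made up for this 
example and are not taken from the WebSocket draft or from any library.

```python
from typing import Tuple


def parse_status_line(line: bytes) -> Tuple[str, int, str]:
    """Split an HTTP/1.x status line into (version, code, reason).

    Hypothetical helper for illustration only.
    """
    text = line.rstrip(b"\r\n").decode("latin-1")
    parts = text.split(" ", 2)
    if len(parts) < 2:
        raise ValueError("malformed status line")
    version, code = parts[0], parts[1]
    # The reason phrase may be missing or empty; it is never inspected.
    reason = parts[2] if len(parts) == 3 else ""
    if not version.startswith("HTTP/"):
        raise ValueError("not an HTTP status line")
    if len(code) != 3 or any(c not in "0123456789" for c in code):
        raise ValueError("status code must be three ASCII digits")
    return version, int(code), reason


def is_switching_protocols(line: bytes) -> bool:
    """True if the status code is 101, whatever the reason phrase says."""
    try:
        _, code, _ = parse_status_line(line)
    except ValueError:
        return False
    return code == 101
```

With a check like this, "101 Switching Protocols", a localized phrase, 
and an empty phrase are all treated alike; only the code matters.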

>>> Why is using the HTTP spec's recommended text a problem?
>> The HTTP spec doesn't recommend any specific text. The reason phrase is 
>> for human consumption only. For instance, there are servers known to 
>> vary it by Accept-Language or locale.
> 
> The section defining code 101 is "10.1.2 101 Switching Protocols". So 
> that's what we use here. Seems reasonable to me.

It's one reasonable string, but again (and again...): the HTTP spec does 
not give it any semantics, and servers may vary in what they send, for 
instance because of the locale of the server, or Accept-Language.

>> There's simply no point in putting it in, so why do it?
> 
> This seems like an unbelievably minor point of basically no importance 
> whatsoever. Am I missing some key way in which this actually matters? If 
> not, then it seems pointless to be arguing over it.

In addition to the reasons I gave (and that you don't seem to care 
about), making a plain English text fragment part of the exchange is a 
strange way to design a protocol.

>>> Could you describe the actual practical problem you are concerned 
>>> about?
>> Could you describe the actual practical problem because of which you're 
>> putting in a reason phrase?
> 
> I'll take that as a "no", and assume there is no actual problem, then. 
> (I've described the reason for the handshake earlier in this e-mail.)

I've been repeating it, but it seems you don't get it. So, for the last 
time:

- the reason phrase has no semantics in HTTP, so it's a bad idea to make 
something look like HTTP but then make the exact byte sequence in the 
reason phrase important

- servers/libraries may not allow sending arbitrary reason phrases

- servers/libraries may vary the reason phrase based on locale 
information, so you may be testing something in the US, and it works, 
and later it will fail when the software is deployed somewhere else

- intermediates could rewrite the reason phrase
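
As a rough sketch of how this fails in practice, consider a client that 
compares the start of the response byte for byte. The expected bytes 
below are my assumption, based on the "101 Switching Protocols" text 
discussed above; the sample responses, including the German phrase, are 
invented for illustration.

```python
# Assumed expected prefix; the exact bytes required by the draft may differ.
EXPECTED_PREFIX = b"HTTP/1.1 101 Switching Protocols\r\n"


def strict_match(response: bytes) -> bool:
    """Byte-for-byte check: the reason phrase is part of the comparison."""
    return response.startswith(EXPECTED_PREFIX)


# Hypothetical responses that all carry status code 101.
samples = {
    "recommended text": b"HTTP/1.1 101 Switching Protocols\r\nUpgrade: WebSocket\r\n",
    "localized text": b"HTTP/1.1 101 Protokollwechsel\r\nUpgrade: WebSocket\r\n",
    "empty phrase": b"HTTP/1.1 101 \r\nUpgrade: WebSocket\r\n",
    "rewritten in transit": b"HTTP/1.1 101 Upgrade\r\nUpgrade: WebSocket\r\n",
}

# Map each sample name to whether the strict comparison accepts it.
results = {name: strict_match(raw) for name, raw in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Only the first sample passes, although every one of them is a 101 
response as far as HTTP is concerned.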

BR, Julian

Received on Monday, 7 July 2008 07:16:17 UTC