[whatwg] WebSocket feedback from Ian Hickson on 2008-11-17 (public-whatwg-archive@w3.org from November 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 17 Nov 2008 11:45:25 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0811171030580.1237@hixie.dreamhostps.com>
On Fri, 11 Jul 2008, Mike Wilson wrote:
>
> Blocking I/O on the main thread is ok if it's possible to specify a 
> timeout for the I/O operation, see:
> 
> http://www.openajax.org/runtime/wiki/Synchronous_XHR_Enhancements
> 
> and if the UA's user interface is kept responsive (running animated 
> GIFs, repainting UI etc) and allows the user to abort the blocking 
> operation (f ex as a new use of the Stop button), see:
> 
> http://www.openajax.org/runtime/wiki/Browser_Unresponsive_Mode_Enhancements

We're avoiding blocking anything on the main thread on principle these 
days.


On Tue, 22 Jul 2008, Shannon wrote:
>
> In order to understand this issue better I did some preliminary research 
> into how HTTP and common implementations currently support the five 
> primary requirements of the WebSocket/TCPSocket proposal; namely 
> persistence, asynchronism, security, shared hosting and simplicity. 
> After reading http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html I'm 
> starting to suspect that both systems can be fully implemented without a 
> new connection protocol.
> 
> Firstly, according to rfc2616 "In HTTP/1.1, persistent connections are 
> the default behavior of any connection."

This is basically a lie, for what it's worth. Browsers have found they 
can't do pipelining due to proxies.


> The other thing about persistent HTTP/1.1 connections is that they are 
> already asynchronous. Thanks to pipelining the client may request 
> additional data even while receiving it. This makes the whole websockets 
> protocol achievable on current HTML4 browsers using a simple application 
> or perl wrapper in front of the service ie:
> 
> service <--> wrapper <--> webserver (optional) <--> proxy (optional) 
> <--> client
> 
> a simple pseudo-code wrapper would look like this:
> 
> wait for connection;
> receive persistent connection request;
> pass request body to service;
> response = read from service;
> response_length = length of response;
> send Content-Length: $response_length;
> send $response
> close request or continue
> 
> A threaded wrapper could queue multiple requests and responses.
> 
> In theory (as I have yet to perform tests) this solution solves all websocket
> goals:
> 
> Simple: Can use CGI (taking advantage of webserver virtual-hosting, security,
> etc...) or basic script wrapper

Without doing actual HTTP, which has a huge overhead per message (I mean, 
on the order of 2x to 10x overhead for typical short messages), I don't 
see how you could do this using just CGI.

Also, I don't see how you could have a persistent CGI script in this 
scenario.
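
To put a rough number on the per-message overhead concern, here is a 
minimal Python sketch that wraps one short message the way the quoted 
pseudo-code wrapper would (a response with a Content-Length header) and 
compares it with a delimiter-style frame. The header set, the sample 
message, and the 0x00/0xFF delimiter bytes are all assumptions made for 
illustration; real HTTP exchanges usually carry more headers than this.

```python
# Illustrative only: the headers, the sample message and the delimiter bytes
# are assumptions of this sketch, not values taken from the thread.

def http_response_bytes(payload: bytes) -> bytes:
    """Wrap one message as a minimal HTTP/1.1 response with Content-Length."""
    head = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
        b"\r\n"
    )
    return head + payload


def delimited_frame_bytes(payload: bytes) -> bytes:
    """Wrap one message between a start byte and an end byte (assumed framing)."""
    return b"\x00" + payload + b"\xff"


def overhead_ratio(wire: bytes, payload: bytes) -> float:
    """Total bytes on the wire divided by the payload bytes."""
    return len(wire) / len(payload)


message = "player 7 moved to (12, 40)".encode("utf-8")
http_ratio = overhead_ratio(http_response_bytes(message), message)
frame_ratio = overhead_ratio(delimited_frame_bytes(message), message)
print(f"HTTP response: {http_ratio:.1f}x, delimited frame: {frame_ratio:.1f}x")
```

The ratio depends heavily on message length and on which headers are 
sent, so treat the printed figures as an order-of-magnitude illustration 
only.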


> Persistent: HTTP/1.1 connections are persistent by default

This isn't true in practice.


> Asynchronous: Requests and responses can be pipelined, meaning requests and
> responses can be transmitted simultaneously and are queued.

Granted.


> Backwards-compatible: Should work with all common HTTP/1.1 compatible clients,
> proxies and servers.

Presumably by definition.


> Secure: To exploit a service you would require CGI or dedicated application.
> ISPs tightly control access to these. SSL is easy to implement as a tunnel (ie.
> stunnel) or part of an existing webserver.

If it's just HTTP then presumably it has HTTP's security characteristics.


> Port sharing: This system can co-exist with existing webserver/applications on
> same server using CGI, transparent proxy or redirection.

I don't really see how to implement this in a simple way client-side while 
sharing a port with a fully-fledged HTTP server, and I don't really see 
how to do this on a dedicated port without huge pain since then you'd have 
to implement a whole HTTP server to have a compliant implementation.


> Obviously some real-world testing would be helpful (when I find the 
> time) but this raises the question of whether websockets is actually 
> necessary at all. Probably the only part HTML5 has to play in this would 
> be to ensure that Javascript can open, read, write and close a 
> connection object and handle errors in a consistent manner. The 
> handshaking requirement and new headers appear to complicate matters 
> rather than help.

As far as I can tell, what you describe is just XHR, and that doesn't 
really do what we need, mostly due to the overhead and complexity problems 
described above.


On Tue, 22 Jul 2008, Philipp Serafin wrote:
> 
> I think the problem is that this definition of "asynchronous" is very
> narrow. Yes, you don't need to wait for a request to finish before you
> issue a new one. But you'd still be bound to HTTP's request/response
> scheme in general.

Also true -- you can't receive messages without sending them, in this 
scheme.


> However, web authors might want to employ other schemes as well, for
> example server-sided asynchronous notifications ("pushing"),
> client-sided notifications that don't need to be replied or requests
> that can be answered out-of-order. Things like this could be
> implemented easily on top of the current WebSocket proposal, but would
> be very complicated to do with HTTP.

Right.


> If desired, maybe we could add an API to XHR to control pipelining 
> though?

That would be an issue for the WebApps group at the W3C.


On Wed, 23 Jul 2008, Shannon wrote:
> 
> WebSockets uses HTTP so it is hardly immune to the request/response 
> behaviour of its underlying protocol (including the stream nature of 
> TCP).

Once a connection is established, the server can push packets without 
client-side involvement, so this seems false.


> Besides this statement appears to be based on the assumption that the 
> server MUST wait for additional client requests to send each "message". 
> However the specification allows the server to send "chunked" or 
> "multipart" data in a variety of ways so full asynchronous communication 
> is achievable by making the response chunks part of one long HTTP 
> multipart response and allowing the javascript API to access the 
> incoming data while the response is incomplete.

This loses sight of the "simplicity" goal, IMHO. How would a CGI script do 
this while still listening for client messages? The CGI specification 
doesn't really cover that possibility as far as I can tell.


> I'm not advocating against WebSockets, just its current definition. In 
> particular it tries to solve things that HTTP/1.1 already handles.

I am not convinced of this.


> I believe we should be thinking of WebSockets as a Javascript API, not a 
> new communications protocol for the simple reason that HTTP is already a 
> very suitable and widely deployed protocol. What authors (especially 
> AJAX authors) are missing is a reliable way to use HTTP's existing 
> asynchronous connection support.

I'm not at all convinced that HTTP is what we need. As you point out, to 
get the right behavior, you have to hack it by making the server use a 
second level of encoding (chunking or multipart), and in addition it has 
to listen for additional requests, which seems very much like a misuse of 
HTTP's semantics (it's supposed to be stateless).


> Here are my issues with WebSockets as currently defined:
> 
> 1.) Request must have a <scheme> component whose value is either "ws" or 
> "wss"
> 
> The "scheme" should be HTTP(S). WebSockets should be the API.

If we used HTTP, I'd agree. We're not using HTTP though; HTTP isn't 
appropriate IMHO.


> 2.) The message event is fired when data is received for a 
> connection.
> 
> What "data"? A byte, a line, a chunk, the whole response? The spec isn't 
> clear.

It seems extremely clear to me; see section 7.3.4.1.2 "Data framing". I 
don't really see how to make this less ambiguous.


> I'd also recommend adding a connection.read( max_bytes ) method 
> as used by Python and most languages to let the author receive bytes at 
> a frequency appropriate to the application (eg, a game might want to 
> frequently poll for small updates).

No need for polling, the server framing triggers events automatically.
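
As a sketch of why framing removes the need for polling, the following 
Python reader buffers whatever bytes arrive and fires a callback once per 
complete frame. It assumes a delimiter-style text frame (0x00, UTF-8 
text, 0xFF); those byte values and the FrameReader name are assumptions 
of this example rather than text quoted from the spec section.

```python
from typing import Callable, List


class FrameReader:
    """Accumulates bytes and fires a callback once per complete text frame.

    Assumed framing for this sketch: 0x00, UTF-8 text, 0xFF.
    """

    def __init__(self, on_message: Callable[[str], None]) -> None:
        self._on_message = on_message
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> None:
        """Add bytes as they arrive; chunk boundaries are arbitrary."""
        self._buffer.extend(chunk)
        while True:
            end = self._buffer.find(b"\xff")
            if end == -1:
                return  # frame incomplete; wait for more bytes
            if self._buffer[0] != 0x00:
                raise ValueError("frame does not start with 0x00")
            text = bytes(self._buffer[1:end]).decode("utf-8")
            del self._buffer[: end + 1]  # drop the consumed frame
            self._on_message(text)


received: List[str] = []
reader = FrameReader(received.append)
reader.feed(b"\x00hel")           # partial frame: no event yet
reader.feed(b"lo\xff\x00wor")     # completes "hello", starts the next frame
reader.feed(b"ld\xff")            # completes "world"
print(received)  # prints ['hello', 'world']
```

The author's code only ever sees whole messages, regardless of how the 
bytes were split up in transit.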


> 3.) If the resulting absolute URL has a <port> component, then let port 
> be that component's value; otherwise, if secure is false, let port be 
> 81, otherwise let port be 815.
> 
> No, no, no! Don't let paranoia override common sense. Not all websocket 
> applications will have the luxury to run on these ports (multiple web 
> servers, shared host, tunnelled connections, 2 websocket apps on one 
> host, etc...).

I don't understand your objection.
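
For reference, the quoted rule only supplies a default; an explicit port 
in the URL always wins. A minimal Python sketch of that rule, using the 
81 and 815 defaults from the text quoted above (the function name is 
hypothetical):

```python
from urllib.parse import urlsplit


def websocket_port(url: str) -> int:
    """Pick the port per the quoted rule: explicit port, else 81 or 815."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("scheme must be ws or wss")
    if parts.port is not None:
        return parts.port  # explicit <port> component takes precedence
    return 815 if parts.scheme == "wss" else 81


print(websocket_port("ws://example.com/chat"))       # prints 81
print(websocket_port("wss://example.com/chat"))      # prints 815
print(websocket_port("ws://example.com:8080/chat"))  # prints 8080
```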


> 4.) The whole handshake is too complex.
> 
> There are many firewalls, proxies and servers that legitimately insert, 
> change, split, or remove HTTP headers or modify their order. This is 
> also likely if the service being provided sits on top of a 
> framework/server (such as Coldfusion/IIS). Also what happens if HTTP/1.2 
> is sent? These will break the WebSocket handshake as currently defined.

That's intentional. The whole point is to prevent anything unintentional 
from breaking things or being vulnerable. This isn't HTTP, it's the Web 
Socket Protocol, that happens to start like HTTP in order to allow a 
server to upgrade from HTTP if the author desires.


> 5.) URI parsing specification
> 
> The current proposal spells out the URI/path parsing scheme. However 
> this should be treated EXACTLY like HTTP so the need to define it in the 
> spec is redundant. It is enough to say that the resource may be 
> requested using a GET or POST request. Same with cookie handling, 
> authorization and other HTTP headers. These should be handled by the 
> webserver and/or application exactly as normal, there is no need to 
> rewrite the rules simply because the information flow is asynchronous.

I don't understand the objection here either.


> 6.) Data framing specification
> 
> Redundant because HTTP already provides multiple methods of data segment 
> encapsulation including "Content-Length", "Transfer-Encoding" and 
> "Content-Type". Each of these have sub-types suitable for a range of 
> possible WebSocket applications. Naturally it is not necessary for the 
> client or server to support them all since there are HTTP headers 
> explicitly designed for this kind of negotiation. The WebSocket should 
> however define at least one fallback method that can be relied on (I 
> recommend "Content-Length", "Transfer-Encoding: chunked" and 
> "Content-Type: multipart/form-data" as MUST requirements).

As noted above, HTTP's framing is inappropriate here.


> 7.) WebSockets needs a low-level interface as well
> 
> By "dumbing down" the data transfer into fired events and wrapping the 
> data segments internally the websocket hides the true communication 
> behind an abstract object. This is a good thing for simplicity but 
> extremely limiting for authors wanting to fine-tune an application or 
> adapt to future protocols. I strongly recommend that rawwrite() and 
> rawread() methods be made available to an OPEN (ie, 
> authenticated/handshaked) websocket to allow direct handling of the 
> stream. It would be understood that authors using these methods must 
> understand the nature of both HTTP and websockets. In the same way a 
> settimeout() method should be provided to control blocking/non-blocking 
> behaviour. I can't stress enough how important these interfaces are, as 
> they may one day be required to implement WebSockets 2.0 on "legacy" or 
> broken HTML5 browsers.

The plan is to offer binary blobs in a future version, and maybe other 
kinds of data too (e.g. multiplexed audio/video streams), but for now we 
should keep things simple.


> 8.) Origin: / WebSocket-Origin:
> 
> Specifying clients allowed to originate a connection is a disaster 
> waiting to happen for the simple reason that sending your origin is a 
> privacy violation in the same vein as the referrer field.

Only the hostname is included. If two hosts want to communicate, the least 
they can do is share each other's hostnames. This isn't a privacy 
violation in the least.


> Any open-source browser or privacy plugin will simply disable or spoof 
> this since it would allow advertising networks to track people by 
> ad-serving via websockets.

They could do far more than the header would ever allow them -- as soon as 
they've established the connection, they could send the whole URL over the 
socket, along with all cookies and everything!


> Such tracking undermines the security of anonymising proxies (as the 
> "origin" may be a private site or contain a client id).

Why would that site be communicating with this remote server then?


> Using origin as a required field essentially makes the use of "referrer" 
> mandatory. If a websocket wants to restrict access then it will have to 
> use credentials or IP ranges like everything else.

This has nothing to do with the server limiting origin and everything to 
do with the handshake preventing unexpected connections from being 
established, precisely to prevent sites from connecting to resources that 
would otherwise be vulnerable.


> 9.) WebSocket-Location
> 
> The scenario this is supposed to solve (that an application makes a 
> mistake about what host it's on and somehow sends the wrong data) is 
> contrived.

That's not the attack scenario at all. This is just intended to help 
prevent someone from connecting to an open port that isn't expecting this 
protocol.


> What's more likely to happen is that a server application has trouble 
> actually knowing its (virtual) hostname (due to a proxy, mod_rewrite, 
> URL masking or other legitimate redirect) and therefore NO clients can 
> connect.

That value is passed along with the request in a manner that is trivial to 
reconstruct, so this isn't a concern.


> 10.) To close the Web Socket connection, either the user agent or the 
> server closes the TCP/IP connection. There is no closing handshake.
> 
> HTTP provides a reliable way of closing a connection so that all parties 
> (client, server and proxies) know why the connection ended. There is no 
> reason for websockets to not follow this protocol and close the 
> connection properly.

Defining an explicit handshake for closing just means extra complexity on 
the server side to implement this, and extra complexity on the client side 
to handle errors in that handshake. Why bother? TCP/IP already has a 
closing handshake, why isn't that enough?


> In conclusion, the current specification of WebSockets re-invents 
> several wheels and does so in ways that are overly complex, error-prone 
> and yet seriously limited in functionality. The whole concept needs to 
> be approached from the position of making HTTP's features (which are 
> already implemented in most UAs) available to Javascript (while 
> preventing the exploit of non-HTTP services). I do not believe this is 
> difficult if my recommendations above are followed. I do not wish to be 
> overly critical without contributing a solution, so if there are no 
> serious objections to the points I've made I will put time into 
> reframing my objections as a complete specification proposal.

I don't really agree with any of these points, as explained above.


On Thu, 24 Jul 2008, Shannon wrote:
> 
> I found this in rfc2817 section 1:
> 
>   The historical practice of deploying HTTP over SSL3 [3] has
>   distinguished the combination from HTTP alone by a unique URI scheme
>   and the TCP port number. The scheme 'http' meant the HTTP protocol
>   alone on port 80, while 'https' meant the HTTP protocol over SSL on
>   port 443.  Parallel well-known port numbers have similarly been
>   requested -- and in some cases, granted -- to distinguish between
>   secured and unsecured use of other application protocols (e.g.
>   snews, ftps). This approach effectively halves the number of
>   available well known ports.
> 
>   At the Washington DC IETF meeting in December 1997, the Applications
>   Area Directors and the IESG reaffirmed that the practice of issuing
>   parallel "secure" port numbers should be deprecated. The HTTP/1.1
>   Upgrade mechanism can apply Transport Layer Security [6] to an open
>   HTTP connection.

With all due respect, this position isn't tenable. We need to be able to 
declare whether we expect encryption or not before the connection is 
established. There's a reason 2817 hasn't been implemented.


> I believe we should rule out both new ports in favour of upgrading a 
> port 80 connection to a WebSocket; however according to the same 
> document the WebSockets proposal does not follow the expected 
> client-side behaviour for doing so:
> 
> 3.2 Mandatory Upgrade
> If an unsecured response would be unacceptable, a client MUST send an 
> OPTIONS request first to complete the switch to TLS/1.0 (if possible).
> 
>       OPTIONS * HTTP/1.1
>       Host: example.bank.com
>       Upgrade: TLS/1.0
>       Connection: Upgrade
> 
> Nor does the WebSocket server supply a valid response:
> 
> 3.3 Server Acceptance of Upgrade Request
> As specified in HTTP/1.1 [1], if the server is prepared to initiate the 
> TLS handshake, it MUST send the intermediate "101 Switching Protocol" 
> and MUST include an Upgrade response header specifying the tokens of the 
> protocol stack it is switching to:
> 
>       HTTP/1.1 101 Switching Protocols
>       Upgrade: TLS/1.0, HTTP/1.1
>       Connection: Upgrade
> 
> Obviously this is referring to TLS however WebSockets is also a protocol 
> switch and should therefore follow the same rules.

I refer you to the requirement that implementing the server side in a 
conforming fashion should require no more than a few dozen lines of code.


> I understand the reluctance to use a true HTTP handshake (hence the 
> ws:// scheme and alternate ports) however I think the claims of added 
> complexity on the server end are exaggerated (I say this as somebody who 
> has written a basic standalone webserver). It seems to me we're only 
> looking at required support for:
> 
> * Validating and parsing HTTP headers (that doesn't mean they are all 
> understood or implemented, simply collected into a native 
> structure/object/array)
>
> * Handling (or simply pattern-matching) the Version, Upgrade and 
> Connection headers
>
> * Adding a Content-Length header before each message sent to the client 
> and/or "chunk encoding" variable-length messages
>
> * Sending and respecting the "connection close" message
>
> * Sending "not implemented", "not authorised" and error status messages 
> as needed.
> 
> Currently WebSockets requires practically all of these features as well, 
> except that it implements them in non-standard fashion - effectively 
> making asynchronous delivery via existing infrastructure (ie: CGI) a 
> potentially more difficult and error-prone affair. In fact as it stands 
> I would say the current proposal rules out both CGI and proxy support 
> entirely since it cannot handle the addition of otherwise valid HTTP 
> headers (such as Expires, X-Forwarded-For or Date) in the first 85 
> bytes.

I would want to see the code of your fully-conforming implementation of 
the server-side of this before really considering this. I simply don't 
believe it can be done in a few lines of code.


On Sat, 26 Jul 2008, Frode Børli wrote:
>
> I think we should agree on which features that WebSockets need to 
> provide before deciding on a protocol or method of achieving the goals.

We did that a few years ago. :-)


> Basically I want these features from WebSockets:
> 
> 1. The server side script that generated the page can at any later time 
> raise any event on the client side.
>
> 2. The client side script can at any time raise any event on the server 
> side (meaning inside the script that initially generated the page).
>
> 3. It must work through existing Internet infrastructure, including 
> strict firewalls and proxies.
>
> 4. It should also be possible to open extra websockets to other scripts 
> - possibly through the XMLHttpRequest object.

Those are some requirements, though I don't really see why we would want 
to connect to the original page as opposed to a separate server for 
events. Ideally, the HTML pages would be static and fully cachable, with 
the processing occurring separately.


On Fri, 1 Aug 2008, Harlan Iverson wrote:
> 
> I am an implementor of BOSH and interested in WebSockets as future 
> option for browser based XMPP connections. I think a useful feature of 
> BOSH is the ability to send a pause request to the server, which 
> effectively increases the amount of time that can elapse before it 
> considers a client timed out; a client then resumes by making a normal 
> request with the same session ID and the request ID incremented as 
> usual. This is useful/needed because BOSH is a sequenced protocol. 
> Importantly, it enables a use case of maintaining a 'persistent' 
> connection between page loads.
> 
> Is there any equivalent mechanism in WebSockets to produce a 
> 'persistent' connection between page loads? Combined with sessionStorage 
> this would be very useful for an application such as Facebook Chat.

There's no mechanism for making a Web Socket survive a page load 
currently, mostly because there's no mechanism for anything to survive 
page loads at all. If we address this, it would be by making a "lifeboat" 
feature to which objects could be assigned before navigation, so that 
navigation to a same-origin page would maintain these objects.


On Sun, 21 Sep 2008, Richard's Hotmail wrote:
> 
> My particular beef is with the intended WebSocket support, and 
> specifically the restrictive nature of its implementation. I 
> respectfully, yet forcefully, suggest that the intended implementation 
> is complete crap and you'd do better to look at existing Socket support 
> from SUN Java, Adobe Flex, and Microsoft Silverlight before engraving 
> anything into stone! What we need (and is a really great idea) is native 
> HTML/JavaScript support for Sockets - What we don't need is someone 
> re-inventing sockets 'cos they think they can do it better.
> 
> Anyway I find it difficult to not be inflammatory so I'll stop now, but 
> please look to the substance of my complaint (and the original post in 
> comp.lang.JavaScript attached below) and at least question why it is 
> that you are putting all these protocol restriction on binary socket 
> support.

On Sun, 21 Sep 2008, Richard's Hotmail wrote:
> 
> Look, I'm not sure exactly what problem you guys are solving with 
> HTML5's WebSockets but I wish you well. What I and *many* others are 
> looking for is native JavaScript support for Sockets a la mode de (SUN 
> Java Applets + Adobe Flex + Microsoft Silverlight) that for some strange 
> reason don't seem to be subject to the same imaginary obstacles that are 
> being discussed in that thread. Please explain what security 
> vulnerabilities et al that Adobe, SUN and Microsoft have foisted upon us 
> that the HTML5 people wish to spare us from.

On Sun, 21 Sep 2008, ddailey wrote:
> 
> My apologies for getting involved in a topic I confess to knowing very 
> little about (though I would like to be able to have direct 
> client-to-client communication for a variety of purposes including 
> gaming and social networking), but it seems like the question here is 
> "what does the approach you are advocating enable that the approach on 
> the table doesn't do?"  I understand that you are saying the approach 
> WHATWG and HTML5 WG have undertaken is flawed (and I acknowledge your 
> claim that lots of folks are doing it better), but I really don't see 
> what you are hoping to do that these folks (whose expertise in such 
> matters I would certainly be willing to defer to) will not enable? Are 
> you claiming, for example, that HTTP roundtrips from a server to each 
> client will be intrinsically too slow to support such applications as 
> gaming? If so, then it would seem that would be a concrete complaint 
> that the advocates of the current proposal could, in theory, respond to. 
> The history of the discussion referred to by the link, indicates that as 
> James says, the current proposal has undergone numerous revisions based 
> on input. Perhaps since you obviously care so much about it, you can 
> help the proposal to evolve into something which addresses your 
> concerns.

On Sat, 27 Sep 2008, Richard's Hotmail wrote:
> 
> The easiest way to do that is to point you towards the BSD Socket 
> documentation or, in the case of browser-based functionality, go to 
> http://java.sun.com/j2se/1.5.0/docs/api/index.html and look up 
> java.net.Socket. Now, you might like to sit there and ask me to justify 
> the need or desirability for each and every attribute and method, but 
> then you'd probably also claim that "AJAX long-polling does everything 
> we need already so why bother with sockets anyway"?
> 
> Connection Timeouts? Read Timeouts? KeepAlive? NoDelay? Peeking?
> 
> Perhaps - "Well we do that sort of stuff with HTTP headers, so there!"?
> 
> I am just asking why Sockets are being re-invented for html5, and in 
> such a restricted and watered-down fashion. If you guys/gals really like 
> to build an integer 7 bits at a time or "frame" UTF-8 then more power to 
> you; just please stop forcing everyone else to perform the same 
> contortions. Please give us a normal Socket (UDP and Multicast too 
> please). Subject it to same-origin policy or whatever else is required.
> 
> There are Intranets and IPsec and all sorts of configurations that lend 
> themselves to just such functionality. But you say "It's fine for 
> gaming" others say "It's just fine for chat" what else could there be 
> eh?

On Sat, 27 Sep 2008, Kristof Zelechovski wrote:
>
> If you are in control of the server, you can simulate datagram sockets 
> with one-shot controlled sockets and multicast socket with a central 
> dispatcher that all interested parties can register with.  For access to 
> external services that already accept datagram packets only, a gateway 
> of some sort may be necessary indeed.  Services that respond to 
> multicast packets are never external so it is not an issue, except for 
> an internal burglar. And I wonder what comes next.  Would you like to 
> trace route from JavaScript?  Or perhaps some ARP poisoning stuff, or a 
> packet sniffer?  Come on.

On Mon, 29 Sep 2008, Philipp Serafin wrote:
> 
> I do not agree with Richard at all, but I have to play devil's advocate 
> here because I think such a simulation would be pretty useless.
> 
> After all, more or less the only situation you'd want to use UDP outside 
> a LAN is when TCP doesn't fit your needs, e.g. because the flow control 
> does more harm than good to your use-case or because your peers have not 
> enough processing power for a full TCP implementation. Simulating UDP 
> via TCP would pretty much combine the disadvantages of both protocols.
> 
> Also, you already need a full roundtrip to initiate a TCP connection, a 
> second one to perform the WS handshake and a third one to close the 
> connection. Data not taken into account. You can hardly repeat that for 
> every datagram you want to send.
> 
> That out of the way, I think the "structure in content" approach is 
> preferable because in the end it makes the whole feature easier to use 
> and accessible to a much broader range of web authors. Because a WS 
> stream has standardized metadata and delimiters, you can easily build a 
> generic framework that processes those parts for you.
> 
> This is especially important if you DON'T have full control over the 
> server, which I believe is the case for the majority of smaller sites 
> that use a shared hosting solution. Those hosters usually don't give 
> their clients access to the underlying processes at all. All the clients 
> can do is upload static files and script files that get executed in a 
> restricted environment. It's really hard to integrate pure, persistent 
> connections into such a setup. With WS, a hoster could for example have 
> a demon listen to all incoming WS connections and call the client's 
> scripts whenever a data frame has been received. In short, it's much 
> easier for them to manage persistent connections if there is a 
> standardized structure. And if it's easier for them, hopefully the 
> support for this feature will grow.
> 
> As for the restriction of unicode data, of course we could just use an 
> octet counting mechanism like HTTP does instead of a fixed delimiter. 
> This would allow arbitrary data inside the WS frames. On the other hand, 
> this might make it easier to spoof existing protocols. Would the 
> benefits of this outweigh the risks?
> 
> Note that it was a conscious design decision to make WS incompatible 
> with existing protocols, because the risk for misuse is too great.
> 
> If you need your web app to interact with a specific service, it should be 
> easy to write a generic proxy that does the handshake, strips out the 
> frame marks and transforms the data.
> 
> Also, it's not like the other technologies would vanish all of a sudden. 
> If you have sufficient control over your server, you can STILL use Java 
> or Flash sockets.

On Wed, 29 Oct 2008, Richard's Hotmail wrote:
> 
> Fine, you want "structure in content" then you stick it on *afterwards*. 
> Please don't impose your particular views on what everyone else's data 
> stream should look like! Some might like chunking, record-type 
> indicators, or data-length sentinels, or something completely different; 
> it's up to them!

On Mon, 22 Sep 2008, Shannon wrote:
>
> It's hard to determine the substance of your complaint. It appears you 
> don't really understand the Java, Flex or Silverlight implementations. 
> They are all quite restrictive, just in different ways:
> 
> * Java raises a security exception unless the user authorises the socket 
> using an ugly and confusing popup security dialog
>
> * Flex and Silverlight requires the remote server or device also run a 
> webserver (to serve crossdomain.xml). Flex supports connections ONLY to 
> port numbers higher than 1024. The crossdomain files for each platform 
> have different filenames and appear to already be partly incompatible 
> between the two companies, hardly a "standard".
> 
> Both Silverlight and Flash/Flex are fundamentally flawed since they run 
> on the assumption that a file hosted on port 80 is an authoritative 
> security policy for a whole server. As someone who works in an ISP I 
> assure you this is an incorrect assumption. Many ISPs run additional 
> services on their webserver, such as databases and email, to save rack 
> hosting costs or for simplicity or security reasons. I would not want 
> one of our virtual hosting customers authorising web visitors access to 
> those services. It is also fundamentally flawed to assume services on 
> ports greater than 1024 are automatically "safe".
> 
> These companies chose convenience over security, which quite frankly is 
> why their software is so frequently exploited. However that's between 
> them and their customers, this group deals with standards that must be 
> acceptable to the web community at large.
> 
> The current approach the HTML spec is taking is that policy files 
> are essentially untrustworthy so the service itself must arbitrate 
> access with a handshake. Most of the details of this handshake are 
> hidden from the Javascript author so your concerns about complexity seem 
> unjustified. If you are worried about the complexity of implementing the 
> server end of the service I can't see why, it's about 3-6 lines of 
> output and some reasonably straight-forward text parsing. It could 
> easily be done with a wrapper for existing services.
> 
> Other than that it behaves as an asynchronous binary TCP socket. What 
> exactly are you concerned about?

I'm not really sure what the concerns are in the above-quoted e-mails.

The Web Socket protocol is designed to prevent spammers from writing Web 
pages that connect to SMTP servers, malware authors from writing Web pages 
that connect to IRC servers, spyware authors from writing Web pages that 
connect to SQL servers, and fraudsters from writing Web pages that connect 
to your bank's WebSocket API and transfer your money to their account.

It's also designed so that the API is trivial to use. It's also extensible 
so that in future we can add binary data and other mechanisms easily.

Sure, it means there are specific framings you have to use, and the 
handshake is a bit esoteric, but those seem like a cheap price to pay for 
the security.


On Mon, 29 Sep 2008, Anne van Kesteren wrote:
>
> What is the reason for doing literal comparison on the websocket-origin 
> and websocket-location HTTP headers? Access Control for Cross-Site 
> Requests is currently following this design for 
> access-control-allow-origin but sicking is complaining about it, so maybe it 
> should be URL-without-<path> comparison instead. (E.g., then 
> http://example.org and http://example.org:80 would be equivalent.)

I don't really see the advantage of making this less strict.
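
To make the difference between the two comparisons concrete, here is a 
minimal JavaScript sketch. The helper names are hypothetical, and the use 
of a URL parser to drop the default port is an assumption for 
illustration, not something the handshake specifies:

```javascript
// Hypothetical helpers, not taken from any spec text.

// Strict: the header value must match the expected string exactly.
function literalMatch(expected, received) {
  return expected === received;
}

// Looser: compare scheme, host and port after URL parsing, so that an
// explicitly written default port compares equal to an omitted one.
function originMatch(expected, received) {
  return new URL(expected).origin === new URL(received).origin;
}

literalMatch("http://example.org", "http://example.org:80"); // false
originMatch("http://example.org", "http://example.org:80");  // true
```

The strict form needs no parsing at all on either side, which is part of 
its appeal.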


On Tue, 30 Sep 2008, Shannon wrote:
>
> I think the temptation to standardise features like access control 
> defeats the point of websockets. Since things like access control and 
> sessions can be readily implemented via CGI interfaces it seems implied 
> that the whole point of websockets is to provide "lightweight" services. 
> If the service actually needs something like this then the author can 
> perform the check post-handshake using any method they feel like. I 
> don't really feel strongly one way or the other about this particular 
> header but I'm concerned about the slippery-slope of complicating the 
> HTTP handshake to the point where you might as well be using CGI. Maybe 
> the standard should simply recommend sending the header but make no 
> requirement about how it is parsed. That way the service itself can 
> decide whether the check is even necessary and if so whether it should 
> be strict or loose or regex-based without the client automatically 
> hanging up the connection.

I don't really know what you're asking for here.


On Tue, 30 Sep 2008, Shannon wrote:
>
> It occurred to me the other day when musing on WebSockets that the 
> handshake is more complicated than required to achieve its purpose and 
> still allows potential exploits. I'm going to assume for now the purpose 
> of the handshake is to:
> 
> * Prevent unsafe communication with a non-websocket service.
>
> * Provide just enough HTTP compatibility to allow proxying and virtual 
> hosting.
> 
> I think the case has been successfully put that DDOS or command 
> injection are possible using IMG tags or existing javascript methods - 
> however the counter-argument has been made that the presence of legacy 
> issues is not an argument for creating new ones. So with that in mind we 
> should implement WebSockets as robustly as we can.
> 
> Since we don't at first know what the service is we really need to 
> assume that:
> 
> * Long strings or certain characters may crash the service.
>
> * The service may not be line orientated.
>
> * The service may use binary data for communications, rather than text.
>
> * Characters outside the ASCII printable range may have special meaning 
>   (e.g., 'bell' or control characters).
>
> * No string is safe, since the service may use string commands and 
>   non-whitespace separators.
> 
> For the sake of argument we'll assume the existence of a service that 
> accepts commands as follows (we'll also assume the service ignores bad 
> commands and continues processing):
> 
> AUTHENTICATE(user,password);GRANT(user,ALL);DELETE(/some/record);LOGOUT;
> 
> To feed this command set to the service via WebSockets we use:
> 
> var ws = new WebSocket("http://server:1024/?;AUTHENTICATE(user,password);GRANT(user,ALL);DELETE(/some/record);LOGOUT;")
> 
> I have already verified that none of these characters require escaping 
> in URLs. The current proposal is fairly strict about allowed URIs but in 
> my opinion it is not strict enough. We really need to verify we are 
> talking to a WebSocket service before we start sending anything under 
> the control of a malicious author.

I agree that this is a plausible attack.

However, such a service is already vulnerable; one need only do:

   <img src="http://server:1024/?;AUTHENTICATE(user,password);GRANT(user,ALL);DELETE(/some/record);LOGOUT;">

So I'm not sure it's really a problem.


> Now given the huge variety of non-HTTP sub-systems we'll be talking to I 
> don't think a full URL or path is actually a useful part of the 
> handshake. What does path mean to a mail server for instance?

URLs are a useful part of the Web architecture; I don't see a reason to 
_not_ allow services to be based on paths. It seems useful to be able to 
have multiple services on one port.


> Here is my proposal:
> 
> C = client
> S = service
> 
> # First we talk to our proxy, if configured. We know we're talking to a proxy
> # because it's set on the client.
> 
> C> CONNECT server.example.com:1024 HTTP/1.1
> C> Host: server.example.com:1024
> C> Proxy-Connection: Keep-Alive
> C> Upgrade: WebSocket/1.0
> 
> # Without a proxy we send
> 
> C> HEAD server.example.com:1024 HTTP/1.1
> C> Host: server.example.com:1024
> C> Connection: Keep-Alive
> C> Upgrade: WebSocket/1.0

The above appears to be a misuse of HTTP (shouldn't the string after the 
HEAD be the path rather than the server hostname and port?). But that 
notwithstanding, let's consider the above proposal anyway:

Imagine a server whose commands are delimited by full stops, such that the 
following is a bad command to send the server:

   reboot.logout.

Now, say that evil.example.com sets up a CNAME DNS entry:

   x.reboot.logout.evil.example.com

...pointing to victim.example.org. Now you set up a WebSocket connection 
with x.reboot.logout.evil.example.com:1024, and the client sends:

   HEAD x.reboot.logout.evil.example.com:1024 HTTP/1.1

...and the server reboots.

How is this different to what we have now?
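
As a rough sketch of why the hostname is enough, assume a hypothetical 
victim service that splits its input on full stops and silently ignores 
tokens it doesn't recognise (the function and command set below are 
invented for illustration):

```javascript
// Hypothetical victim: splits input on full stops and acts on any token
// it recognises, ignoring everything else.
const KNOWN_COMMANDS = new Set(["reboot", "logout"]);

function commandsSeen(input) {
  return input.split(".").filter((token) => KNOWN_COMMANDS.has(token));
}

const requestLine = "HEAD x.reboot.logout.evil.example.com:1024 HTTP/1.1";
commandsSeen(requestLine); // ["reboot", "logout"]
```

The leading "x" label only serves to keep "HEAD " out of the first 
recognised token.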


> The client and server can now exchange any authentication tokens, access 
> conditions, cookies, etc according to service requirements. eg:
> 
> ws.Send( 'referrer=' + window.location + '\r\n' );

The whole point is that the origin must be sent in a manner that the page 
cannot spoof, so that the server knows that if it is talking to a web 
browser, it can trust the Origin header. So having the page send the 
origin is not very useful. (If it's not talking to a web browser, it 
doesn't matter, since the user is the attacker.)
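
On the server side, the check this enables could look roughly like the 
following sketch. The allow-list, the function name, and the shape of the 
parsed headers object are all assumptions made for illustration:

```javascript
// Hypothetical allow-list; a real service would choose its own.
const ALLOWED_ORIGINS = new Set(["http://example.org"]);

// headers: plain object mapping lowercased handshake header names to
// their values, as produced by whatever parser the service uses.
function acceptHandshake(headers) {
  const origin = headers["origin"];
  // The value is supplied by the browser, not by page script, which is
  // what makes it worth checking at all.
  return typeof origin === "string" && ALLOWED_ORIGINS.has(origin);
}

acceptHandshake({ origin: "http://example.org" });      // true
acceptHandshake({ origin: "http://evil.example.com" }); // false
```

A value sent by the page itself after the handshake, as in the ws.Send() 
example quoted above, could not be used this way.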


> The key advantages of this method are: [...]
> 
> * Security (No page author control over initial handshake beyond the server
> name or IP. Removes the risk of command injection via URI.)

Server name seems like just as much of a problem. Command injection via 
URI is already possible, so seems equally moot.


On Tue, 14 Oct 2008, Shannon wrote:
>
> In the process of testing my WebSocket proposal I discovered the CONNECT 
> method has a major restriction. Most proxies disable CONNECT to anything 
> but port 443.

Indeed. Tunnelling WebSocket over 443 is the expected implementation in 
most cases.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 17 November 2008 03:45:25 UTC