- From: Mark D. Anderson <mda@discerning.com>
- Date: Thu, 27 May 1999 18:30:36 -0700
- To: <www-lib@w3.org>
I've been digging into how various user agents behave in terms of their use of threads, their use of connections, and their use of socket IO modes (non-blocking and async). This has been somewhat cursory, and is quite likely to contain substantial errors; I welcome clarifications and corrections, as well as pointers to more information.

Theory
------

The choice of threading in the UA is obviously invisible to the server; the server only cares about tcp/ip connections.

The more sophisticated http client libraries (including w3-libwww) use non-blocking sockets. Any thread can (in principle) use any of the outstanding tcp/ip connections. However, in principle more than one thread should *not* actually be needed for tcp/ip operations, since they are all non-blocking. All that should be necessary is a pool of worker threads to handle the callbacks when an event occurs (such as a response coming in, or a connection being lost).

There is some terse (and possibly outdated; it is 2.5 years old) documentation concerning threads and events in w3-libwww at:
http://www.w3.org/Library/User/Architecture/Events.html

Essentially, w3-libwww uses continuations (closures) for all potentially blocking operations, as a particular request goes through a state machine (in its simplest form: connect, send, receive, disconnect). It is up to the caller to provide the threading/blocking model; the library itself never creates or changes OS threads.

Typically, a UA will have a maximum number of outstanding tcp/ip connections that it will maintain (this number is independent of the number of event-handling worker threads, and of any network library threads). If a UA is talking to just a single server port that implements http 1.1 pipelining, it can keep using that one connection for all http requests and needn't open any others (for example, to fetch img or css objects). If there are links (or other UA windows) to other servers, then other connections will be needed.
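The connection-limit policy just described, and the high-level dispatch algorithm given next (reuse a suitable connection, else open one up to the maximum, else wait), can be sketched in Python. This is a minimal single-threaded model, not w3-libwww's or any browser's code: Connection, ConnectionPool, and the on_response_complete/on_connection_closed handler names are hypothetical, the "continuation" is simply a callable that receives the connection once the request may be sent, and whether a server pipelines is assumed to be known in advance (the supports_pipelining parameter).

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


# Hypothetical sketch; not the API of w3-libwww or any real UA.
@dataclass(eq=False)  # compare connections by identity, not by field values
class Connection:
    host: str
    pipelining: bool  # assumption: known up front, e.g. from an earlier response
    in_flight: int = 0  # requests sent on this connection but not yet answered

    def can_take_request(self) -> bool:
        # An idle connection can always be reused; a pipelining one can queue more.
        return self.in_flight == 0 or self.pipelining


# A continuation: called with the connection once the request may be sent on it.
Continuation = Callable[[Connection], None]


class ConnectionPool:
    def __init__(self, max_connections: int, supports_pipelining: Callable[[str], bool]):
        self.max_connections = max_connections
        self.supports_pipelining = supports_pipelining
        self.connections: List[Connection] = []
        self.waiting: Deque[Tuple[str, Continuation]] = deque()

    def submit(self, host: str, send: Continuation) -> None:
        # 1. Reuse an idle or pipelining connection to the same server.
        for conn in self.connections:
            if conn.host == host and conn.can_take_request():
                self._start(conn, send)
                return
        # 2. Otherwise open a new connection if we are below the limit.
        if len(self.connections) < self.max_connections:
            conn = Connection(host, self.supports_pipelining(host))
            self.connections.append(conn)
            self._start(conn, send)
            return
        # 3. Otherwise park the continuation until a socket frees up.
        self.waiting.append((host, send))

    def _start(self, conn: Connection, send: Continuation) -> None:
        conn.in_flight += 1
        send(conn)

    # Event handlers, invoked by whatever drives the network IO.
    def on_response_complete(self, conn: Connection) -> None:
        conn.in_flight -= 1
        self._retry_waiting()

    def on_connection_closed(self, conn: Connection) -> None:
        self.connections.remove(conn)
        self._retry_waiting()

    def _retry_waiting(self) -> None:
        # Give every parked continuation another chance, in FIFO order;
        # submit() re-parks any that still cannot proceed.
        pending = list(self.waiting)
        self.waiting.clear()
        for host, send in pending:
            self.submit(host, send)
```

With max_connections=2 and only a.example pipelining, two requests to a.example share one connection, a b.example request opens the second, and a further b.example request is parked until on_response_complete fires for that connection. Re-trying every parked continuation on each event is the simplest form of the "rearrangement" of continuations described in this section; the sketch does not close idle connections to other servers to make room, nor does it recover requests lost when a connection closes mid-response.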
Also, if the server does not implement pipelining (or worse yet, does not implement keep-alive), then additional connections will be needed even for a single server.

In terms of a high-level algorithm, any of the worker threads (or a GUI thread) can create a new "Request" object:

- if there is an existing connection to the server that is idle or implements pipelining, then the request can be sent on it.
- else if another socket can be created (up to the maximum), then a new connection is established, and a continuation is created for sending the request.
- else a continuation is registered to run when the next socket becomes available.

There are various handlers registered for events such as connection completion, responses coming back, and connection closure. Any time one of those events happens, the existing continuations might be rearranged (for example, a request waiting for a new connection might give up and use an existing one if that became available first).

How async IO (as opposed to merely non-blocking sockets) affects the above, I'm not sure. I believe it would just make things more efficient, without changing the design pattern.

Reality: IE
-----------

I used a threaded debug utility to trace IE.

IE uses WinInet (HttpOpenRequest, HttpSendRequest, InternetQueryDataAvailable, InternetReadFile) at the highest level, from its "worker threads". That sequence is performed for every url that is required (embedded img or css, etc.).

The low-level networking (send/recv) operations are actually carried out in separate threads (which also do cache manipulation). IE uses more than one such networking thread; I don't understand why, since it is all non-blocking. Only one of that small pool of networking threads calls select().

It appears to use async IO. I saw no evidence of overlapped IO (apparently that is just an optimization used in servers).

Does anyone know the heritage of Microsoft's WinInet code, or why it seems to use multiple threads?
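For contrast with IE's several networking threads, here is a minimal sketch of the single-networking-thread design argued for in the Theory section, written in Python with the standard selectors module (a portable wrapper over select() and its relatives). It is not derived from WinInet or libwww: the function name run_network_loop is hypothetical, socket pairs stand in for server connections in the example, and responses are assumed to be delimited by the server closing the connection (http 1.0 without keep-alive).

```python
import selectors
import socket
from typing import Callable, List, Tuple


def run_network_loop(
    conns: List[Tuple[socket.socket, bytes]],
    on_response: Callable[[bytes], None],
    timeout: float = 1.0,
) -> None:
    """Drive many non-blocking sockets from one thread (hypothetical helper).

    Each socket gets its request written, then is read until the peer closes;
    the complete response is passed to on_response.
    """
    sel = selectors.DefaultSelector()
    try:
        for sock, request in conns:
            sock.setblocking(False)
            # Per-connection state: bytes still to send, bytes received so far.
            sel.register(sock, selectors.EVENT_WRITE, {"out": request, "in": bytearray()})
        while sel.get_map():  # loop until every connection has finished
            events = sel.select(timeout)
            if not events:
                raise TimeoutError("no socket became ready in time")
            for key, mask in events:
                sock, state = key.fileobj, key.data
                if mask & selectors.EVENT_WRITE:
                    sent = sock.send(state["out"])
                    state["out"] = state["out"][sent:]
                    if not state["out"]:
                        # Request fully written: now wait for the response.
                        sel.modify(sock, selectors.EVENT_READ, state)
                elif mask & selectors.EVENT_READ:
                    chunk = sock.recv(4096)
                    if chunk:
                        state["in"] += chunk
                    else:
                        # Peer closed: under close-delimited framing the response is complete.
                        sel.unregister(sock)
                        sock.close()
                        on_response(bytes(state["in"]))
    finally:
        sel.close()
```

In the design described above, on_response would not do the application work itself but hand it to a pool of worker threads (for example via concurrent.futures.ThreadPoolExecutor.submit), so the single networking thread never blocks on handler code.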
I don't know the maximum number of sockets IE will hold open.

Reality: Netscape
-----------------

I used a threaded debug utility to trace Netscape on Windows, and also did a cursory examination of the Mozilla M5 source code (the file /mozilla/network/protocol/http/mkhttp.c).

Netscape seems to use only one thread for networking operations, which are all non-blocking (as I claim above should be possible). I don't know Netscape's limit on outstanding connections.

It appears that Netscape wrote their own networking layer, with the same sort of continuation/state machine approach described above.

Reality: JDK 1.2 ("2.0")
------------------------

I downloaded the "community license" source code to Sun JDK 1.2 ("2.0").

Somewhat surprisingly, the files in jdk1.2-src/src/share/classes/sun/net/www/protocol/http/ indicate that the JDK implements only http 1.0, not 1.1. Furthermore, I see evidence only of blocking threads and blocking sockets. I assume this means that a new OS thread has to be created for every concurrent http request, and that every such request also requires a new tcp/ip connection (unless one is idle and 1.0 keep-alive applies). Not terribly advanced, it seems to me.

The W3 Jigsaw project provides an http 1.1 client and server in Java. It appears to use an entirely separate http stack from the one in the JDK.

Reality: Perl LWP
-----------------

I examined LWP/Parallel/Protocol/http.pm. This implements only http 1.0.

Perl is not multi-threaded (or rather, I don't know of anyone who has tried using the MT perl build with LWP::Parallel). Furthermore, the sockets are still blocking (with a timeout). The caller just piles up a list of requests, and each gets a separate socket which is put in a select fd_set. When a handler for a socket event is called, the whole process blocks while that handler does its work. The same is true for any read or connect operation. It relies on Unix alarms to work (i.e. it implements the timeout on every network request with an OS alarm).

Reality: w3-libwww
------------------

As described above in the "Theory" section. I'm not sure about its use of async sockets or of overlapped IO.

The W3 Amaya project uses w3-libwww; I don't know what design pattern it uses.

Reality: others
---------------

What others are there that might be of interest?

-mda
Received on Thursday, 27 May 1999 21:33:32 UTC