- From: Richard Atterer <richard@list03.atterer.net>
- Date: Wed, 12 Feb 2003 18:34:29 +0100
- To: www-lib@w3.org
Hello,
sometimes this mailing list is quite depressing to read, because there are
always so many people with problems, and so few people with solutions. So I
thought that since ATM I'm making some progress with my own use of libwww,
I should share it to save others some work.
So: If an HTTP or FTP download is running, how do you pause/stop/resume it?
The bad news first: AFAICT libwww doesn't support FTP resumes (REST
command) at all. :-( Maybe I'll come up with a patch.
Pausing a download while leaving the connection open
libwww doesn't actually make provisions for this, but it's relatively
easy to do (read: took me all of two days ;-/ ). You prevent the socket
of the download from being polled by select(), which means that the
connection stays open, but no data is transmitted over it. This is nice
because continuing is faster and because some servers (and some proxies
like WWWOFFLE) don't support HTTP ranges, so resuming the download in a
new connection isn't possible.
For me, the following works with the glibwww event loop:
    /* The HTNet object whose socket we'll unregister from the event loop.
       This will prevent more data from being delivered to it, effectively
       pausing the request. */
    HTNet* net = HTRequest_net(request);
    unsigned protocol = HTProtocol_id(HTNet_protocol(net));
    if (protocol == 21) {
        /* Protocol is FTP, which uses a control connection (which
           corresponds to the main HTNet object) and a data connection. We
           need the HTNet object for the latter. */
        ftp_ctrl* ctrl = static_cast<ftp_ctrl*>(HTNet_context(net));
        net = ctrl->dnet;
    }
    // Unregister socket
    HTEvent_setTimeout(HTNet_event(net), -1); // No timeout for the socket
    SOCKET socket = HTNet_socket(net);
    HTEvent_unregister(socket, HTEvent_READ);
Unfortunately, this is a bit ugly: To make the FTP stuff work, you need
to copy the definitions of enum _HTFTPState and struct _ftp_ctrl from
HTFTP.c to your application's code.
Continuing a download whose connection is still open
Analogous to the above, except we register the socket:
    // Register socket again
    /* For some weird reason the timeout gets reset to 0 somewhere, which
       causes *immediate* timeouts with glibwww - fix that. */
    HTEvent* event = HTNet_event(net);
    HTEvent_setTimeout(event, HTHost_eventTimeout());
    HTEvent_register(HTNet_socket(net), HTEvent_READ, event);
Obviously, all requests in an HTTP pipeline get paused with this, not just
the current one. In particular, the paused request may only be pending;
in this case, the transmission of an earlier request in the same pipeline
may actually get paused - not what we want. I partially solve this by
delaying the pause until the moment my request actually receives data via
the write() method of the HTStream object I registered as the request's
output stream.
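The "pause only once my own request receives data" trick can be sketched
without libwww as a flag that is checked inside the stream's write
callback. All names here (PausableSink, requestPause, the pauseSocket
callback) are made up for illustration; in a real program the callback
would be whatever code unregisters the socket, and write() would be the
write method of your own HTStream.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the output stream of one request. It is not
// a libwww HTStream; it only models the "pause on next write" logic.
class PausableSink {
public:
    // pauseSocket is whatever actually unregisters the socket from the
    // event loop (assumption: supplied by the application).
    explicit PausableSink(std::function<void()> pauseSocket)
        : pauseSocket_(std::move(pauseSocket)) {}

    // The user asked for a pause. Do nothing yet: the request may still
    // be pending behind an earlier request in the same pipeline.
    void requestPause() { pauseWanted_ = true; }

    // Called when data for *this* request arrives. Only now is it certain
    // that the socket is delivering our data, so the pause is applied.
    void write(const char* data, std::size_t len) {
        received_.append(data, len);
        if (pauseWanted_ && !paused_) {
            paused_ = true;
            pauseSocket_();
        }
    }

    bool paused() const { return paused_; }
    const std::string& received() const { return received_; }

private:
    std::function<void()> pauseSocket_;
    bool pauseWanted_ = false;
    bool paused_ = false;
    std::string received_;
};
```

The data passed to the write that triggers the pause is still stored, so
nothing is lost; the pause only stops further reads from the socket.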
Immediately after continuing, the connection may be dropped, e.g. if the
user paused, disconnected his modem, dialed in again, and then told the
app to continue. Consequently, when an app uses this "soft pause", it
should also be able to do a full resume with a new connection.
Aborting a download
There are a few minor pitfalls here. HTHost_killPipe closes all sockets
for that host instead of just the socket of this one connection, so use
    HTNet_killPipe(HTRequest_net(request))
If your app crashes when you call this, it is because you're calling it
from within the write() method of your stream - libwww doesn't like it if
you delete the request object etc. from "right underneath its feet" while
it's still processing the data that has just arrived on the request's
socket.
Instead, wait until the main event loop is reached again, and *then* kill
the pipe. With glibwww, this is easy to do because you can use
g(tk)_idle_add() to register a function which will be called back once
the main loop becomes idle.
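The "don't kill from inside write(), defer it to the main loop" pattern
can be sketched independently of libwww and GLib. IdleQueue below is a
hypothetical miniature standing in for g_idle_add(), and the `killed` flag
stands in for the effect of the real kill call; neither is part of any
library.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical miniature of an event loop's idle queue, standing in for
// g_idle_add(). Not part of libwww or GLib.
class IdleQueue {
public:
    void add(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // Called by the main loop once no stream callback is on the stack.
    void runPending() {
        std::vector<std::function<void()>> batch;
        batch.swap(tasks_);  // tasks added while running wait for the next pass
        for (auto& task : batch) task();
    }

    bool empty() const { return tasks_.empty(); }

private:
    std::vector<std::function<void()>> tasks_;
};

// Hypothetical download object; `killed` models the result of killing
// the pipe for this request.
struct Download {
    bool killed = false;
    bool abortQueued = false;

    // Safe to call from inside the stream's write() method: it only
    // schedules the kill instead of performing it right away.
    void abortLater(IdleQueue& idle) {
        if (abortQueued) return;  // schedule at most once
        abortQueued = true;
        idle.add([this] { killed = true; });
    }
};
```

The Download object must of course outlive the queued task, since the
task captures a pointer to it.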
Obviously, this kills all requests in the pipeline, so you should
re-schedule all the ones which you do want, or resume them if they were
already downloaded in part. (AFAICT, the HTTP 1.1 standard doesn't allow
you to selectively cancel just some pending or active requests, so the
only thing libwww can do is to close the connection.)
How do I tell whether the download has succeeded/failed?
Register an alert callback with
    HTAlert_setInteractive(YES);
    HTAlert_add(myAlertCallback, HT_A_PROGRESS);
and pay attention to the HTAlertOpcode passed to myAlertCallback().
How do I distinguish between the connection being dropped due to an error
and the end of the transmission?
With FTP, AFAIK you can't tell, unless you scan the directory listing
first to find the file size. (Haven't explored how difficult this would
be.)
With HTTP downloads, the server will /usually/ have sent a Content-Length
header, so you can check whether the promised number of bytes has already
been received. Do *not* use HTAnchor_length(HTRequest_anchor(request)) to
read the number of bytes, because for some reason this is not set up
correctly for "206 Partial Content" responses. Instead, use
    HTResponse_length(HTRequest_response(request))
which, for a 206, returns the number of bytes in the partial request,
i.e. total length - requested start offset.
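The Content-Length check boils down to a comparison like the one below.
This is only a sketch: classifyEnd and TransferState are invented names,
the byte count is assumed to be kept by the application in its stream's
write() method, and treating a negative length as "server sent no
Content-Length" is an assumption of this sketch, not something I checked
against libwww.

```cpp
enum class TransferState { Complete, Truncated, Unknown };

// expectedLength: value the application obtained for the response's
// Content-Length (assumption: negative means "not sent").
// receivedBytes: bytes counted by the application for this response.
TransferState classifyEnd(long expectedLength, long receivedBytes) {
    if (expectedLength < 0) return TransferState::Unknown;
    return receivedBytes >= expectedLength ? TransferState::Complete
                                           : TransferState::Truncated;
}
```

For a resumed download, receivedBytes has to count only the bytes
received since the resume, because the length of a 206 response covers
just the partial range.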
Resuming a download starting with a certain byte offset.
Requires HTTP ranges (i.e. HTTP 1.1). Before starting the download with
HTLoad(request, NO), use
    HTRequest_addRange(request, "bytes", "333-999")
to fetch bytes 333-999 (both inclusive, counting from 0). You can also
use just "333-" to fetch from byte 333 onwards.
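In practice the range string is built from the number of bytes already on
disk. A small sketch (the helper names openEndedRange and closedRange are
made up):

```cpp
#include <string>

// Builds an open-ended byte range: "333-" means "from byte 333
// (counting from 0) to the end of the file".
std::string openEndedRange(unsigned long firstByte) {
    return std::to_string(firstByte) + "-";
}

// Builds a closed byte range; both ends are inclusive.
std::string closedRange(unsigned long firstByte, unsigned long lastByte) {
    return std::to_string(firstByte) + "-" + std::to_string(lastByte);
}
```

If N bytes have been saved so far, those are bytes 0 to N-1, so the
resume request should ask for openEndedRange(N). The resulting string
goes where "333-999" stands in the call above.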
Beware that a non-HTTP 1.1 aware server may just ignore the range request
and send the data starting with offset 0 - to detect this case, you need
to check whether the server sent a Content-Range header like
"Content-Range: bytes 303104-17242732/17242733". The header is present if
    HTResponse_range(HTRequest_response(request))
returns non-null. This only works for HTTP, i.e. if
HTProtocol_id(HTNet_protocol(HTRequest_net(request))) == 80.
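If the application has the header value as a string, it can go one step
further and check that the server really started at the requested offset.
A sketch, assuming the value has the usual "bytes FIRST-LAST/TOTAL" form
shown above; ContentRange, parseContentRange and resumedAt are invented
names, and how you get hold of the string is left to the application.

```cpp
#include <cstdio>
#include <string>

struct ContentRange {
    unsigned long long first = 0;
    unsigned long long last = 0;
    unsigned long long total = 0;
};

// Parses a value of the form "bytes FIRST-LAST/TOTAL". Returns false for
// anything else, including forms that use "*" instead of numbers.
bool parseContentRange(const std::string& value, ContentRange& out) {
    unsigned long long first = 0, last = 0, total = 0;
    int consumed = 0;
    if (std::sscanf(value.c_str(), "bytes %llu-%llu/%llu%n",
                    &first, &last, &total, &consumed) != 3)
        return false;
    // Reject trailing garbage after the total.
    if (consumed != static_cast<int>(value.size())) return false;
    // Basic sanity: the range must lie inside the file.
    if (first > last || last >= total) return false;
    out.first = first;
    out.last = last;
    out.total = total;
    return true;
}

// True if the response starts exactly at the offset we asked for.
bool resumedAt(const std::string& value, unsigned long long wantedOffset) {
    ContentRange r;
    return parseContentRange(value, r) && r.first == wantedOffset;
}
```

If resumedAt() is false, the safe reaction is to treat the data as
starting from offset 0 (or to abort), rather than appending it to the
partial file.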
Further problem with FTP: "REIN"
libwww doesn't behave correctly with my test FTP server (running
OpenBSD's ftpd): When libwww wants to reuse an existing control
connection, it first issues REIN, which the server doesn't understand
("502 REIN command not implemented."). Next, it thinks it has to send
"USER anonymous" again, which doesn't work either ("530 Can't change user
from guest login."). At this point libwww gives up, when it could actually
just proceed with the RETR. Grr... patch forthcoming.
All of the above is based on work on my program "jigdo" - see the
"download.cc" file in its sources. (But wait until the next release, the
current 0.6.9 code doesn't yet have all the pause/resume code.)
Cheers,
Richard
--
__ _
|_) /| Richard Atterer | CS student at the Technische | GnuPG key:
| \/¯| http://atterer.net | Universität München, Germany | 0x888354F7
¯ '` ¯
Received on Wednesday, 12 February 2003 18:25:28 UTC