Pausing, stopping and resuming downloads

Hello,

sometimes this mailing list is quite depressing to read, because there are
always so many people with problems, and so few people with solutions. So I
thought that since ATM I'm making some progress with my own use of libwww, 
I should share it to save others some work.

So: If an HTTP or FTP download is running, how do you pause/stop/resume it?

The bad news first: AFAICT libwww doesn't support FTP resumes (REST
command) at all. :-( Maybe I'll come up with a patch.


Pausing a download while leaving the connection open

  libwww doesn't actually make provisions for this, but it's relatively 
  easy to do (read: took me all of two days ;-/ ). You prevent the socket 
  of the download from being polled by select(), which means that the 
  connection stays open, but no data is transmitted over it. This is nice 
  because continuing is faster and because some servers (and some proxies 
  like WWWOFFLE) don't support HTTP ranges, so resuming the download in a 
  new connection isn't possible.
  
  For me, the following works with the glibwww event loop:

    /* The HTNet object whose socket we'll unregister from the event loop. This
       will prevent more data from being delivered to it, effectively 
       pausing the request. */
    HTNet* net = HTRequest_net(request);

    unsigned protocol = HTProtocol_id(HTNet_protocol(net));
    if (protocol == 21) {
      /* Protocol is FTP, which uses a control connection (which 
         corresponds to the main HTNet object) and a data connection. We 
         need the HTNet object for the latter. */
      ftp_ctrl* ctrl = static_cast<ftp_ctrl*>(HTNet_context(net));
      net = ctrl->dnet;
    }
  
    // Unregister socket
    HTEvent_setTimeout(HTNet_event(net), -1); // No timeout for the socket
    SOCKET socket = HTNet_socket(net);
    HTEvent_unregister(socket, HTEvent_READ);

  Unfortunately, this is a bit ugly: To make the FTP stuff work, you need 
  to copy the definitions of enum _HTFTPState and struct _ftp_ctrl from 
  HTFTP.c to your application's code.


Continuing a download whose connection is still open

  Analogous to the above, except we register the socket:

    // Register socket again
    /* For some weird reason the timeout gets reset to 0 somewhere, which
       causes *immediate* timeouts with glibwww - fix that. */
    HTEvent* event = HTNet_event(net);
    HTEvent_setTimeout(event, HTHost_eventTimeout());
    HTEvent_register(HTNet_socket(net), HTEvent_READ, event);

  Obviously, this pauses all requests in an HTTP pipeline, not just the
  current one. In particular, the paused request may still be pending; in
  that case, the transmission of an earlier request in the same pipeline
  actually gets paused, which is not what we want. I partially solve this
  by pausing only at the moment my request actually receives data via the
  write() method of the HTStream object I registered as the request's
  output stream.
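
  The "pause only once data arrives" idea can be sketched without any
  libwww calls. This is only an illustration: PauseGate and its method
  names are made up for this example and are not part of libwww. The user
  clicking "pause" merely records the wish, and write() asks the gate
  whether now is the moment to unregister the socket.

```cpp
#include <cassert>

// Hypothetical helper (not a libwww type): remembers that the user asked
// for a pause and reports the first moment at which data for *our*
// request has arrived, i.e. when the socket can safely be unregistered.
class PauseGate {
public:
    // User pressed "pause"; nothing is unregistered yet.
    void requestPause() { wanted_ = true; }

    // User pressed "continue"; the caller re-registers the socket if
    // paused() was true beforehand.
    void requestContinue() { wanted_ = false; paused_ = false; }

    // Call at the start of the output stream's write(). Returns true
    // exactly once per pause request: when the pause should be applied.
    bool onDataArrived() {
        assert(!paused_ || wanted_);  // never paused without a request
        if (wanted_ && !paused_) {
            paused_ = true;
            return true;
        }
        return false;
    }

    bool paused() const { return paused_; }

private:
    bool wanted_ = false;
    bool paused_ = false;
};
```

  When onDataArrived() returns true, the app would run the "unregister
  socket" code from the pause section above.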
  
  Immediately after continuing, the connection may be dropped, e.g. if the
  user paused, disconnected his modem, dialed in again, and then told the
  app to continue. Consequently, when an app uses this "soft pause", it 
  should also be able to do a full resume with a new connection.


Aborting a download

  There are a few minor pitfalls here. HTHost_killPipe closes all sockets 
  for that host instead of just the one of the connection, so use

    HTNet_killPipe(HTRequest_net(request))

  If your app crashes when you call this, it is because you're calling it
  from within the write() method of your stream - libwww doesn't like it if
  you delete the request object etc. from "right underneath its feet" while
  it's still processing the data that has just arrived on the request's
  socket.
  
  Instead, wait until the main event loop is reached again, and *then* kill 
  the pipe. With glibwww, this is easy to do because you can use 
  g_idle_add() (or gtk_idle_add()) to register a function which will be 
  called back once the main loop becomes idle.

  Obviously, this kills all requests in the pipeline, so you should
  re-schedule all the ones which you do want, or resume them if they were
  already downloaded in part. (AFAICT, the HTTP 1.1 standard doesn't allow
  you to selectively cancel just some pending or active requests, so the
  only thing libwww can do is to close the connection.)
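
  The "don't abort from inside write(), do it from the main loop" rule
  looks roughly like this when written down. A minimal sketch only:
  DeferredCalls is a hypothetical stand-in for whatever idle callback
  facility your event loop offers, and Download is a made-up struct. The
  place where the pipe would really be killed is marked with a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for "call me back once the main loop is idle".
class DeferredCalls {
public:
    void defer(std::function<void()> fn) {
        assert(fn != nullptr);
        pending_.push_back(std::move(fn));
    }

    // Call from the top level of the main loop, never from within a
    // stream callback. Returns the number of callbacks that were run.
    std::size_t runPending() {
        std::vector<std::function<void()>> batch;
        batch.swap(pending_);  // callbacks may safely defer new work
        for (auto& fn : batch) fn();
        return batch.size();
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Hypothetical per-download state.
struct Download {
    bool aborted = false;

    // Safe to call from within write(): it only schedules the abort.
    void requestAbort(DeferredCalls& loop) {
        loop.defer([this] {
            // The real app would kill the pipe of the request here.
            aborted = true;
        });
    }
};
```

  The Download object must of course stay alive until the deferred
  callback has run.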


How do I tell whether the download has succeeded/failed?

  Register an alert callback with

    HTAlert_setInteractive(YES);
    HTAlert_add(myAlertCallback, HT_A_PROGRESS);

  and pay attention to the HTAlertOpcode passed to myAlertCallback().
  

How do I distinguish between the connection being dropped due to an error
and the end of the transmission?

  With FTP, AFAIK you can't tell, unless you scan the directory listing 
  first to find the file size. (Haven't explored how difficult this would 
  be.)
  
  With HTTP downloads, the server will /usually/ have sent a Content-Length
  header, so you can check whether the promised number of bytes has already
  been received. Do *not* use HTAnchor_length(HTRequest_anchor(request)) to
  read the number of bytes, because for some reason this is not set up
  correctly for "206 Partial Content" responses. Instead, use
  
    HTResponse_length(HTRequest_response(request))
    
  which, for a 206, returns the number of bytes in the partial response, 
  i.e. (for an open-ended range) total length - requested start offset.
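
  The check itself is trivial, but for completeness here is a minimal
  sketch. classifyTransfer and TransferState are names invented for this
  example. I am also assuming that a negative length is what you end up
  with when the server sent no Content-Length, so verify that against
  your libwww version. bytesReceived is whatever your own output stream
  has counted for this response.

```cpp
#include <cassert>
#include <cstdint>

// Result of comparing the received byte count with the announced length.
enum class TransferState { Complete, Incomplete, Unknown };

// expectedLength: length announced for this response. For a 206 this is
//   the length of the partial body, not of the whole file. A negative
//   value is taken to mean "no length known" (an assumption).
// bytesReceived: bytes counted by the application's output stream.
TransferState classifyTransfer(std::int64_t expectedLength,
                               std::int64_t bytesReceived) {
    assert(bytesReceived >= 0);
    if (expectedLength < 0) return TransferState::Unknown;
    return bytesReceived >= expectedLength ? TransferState::Complete
                                           : TransferState::Incomplete;
}
```

  If the result is Incomplete when the connection closes, the connection
  was dropped and a resume is in order. With Unknown you cannot tell.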


Resuming a download starting at a certain byte offset

  This requires HTTP ranges (i.e. HTTP 1.1). Before starting the download with
  HTLoad(request, NO), use
  
    HTRequest_addRange(request, "bytes", "333-999")

  to fetch bytes 333-999 (both inclusive, counting from 0). You can also
  use just "333-" to fetch from byte 333 onwards.
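
  In practice the offset comes from the size of the partial file on disk,
  so the range string has to be built at run time. A small sketch follows.
  openRange and closedRange are helper names invented here, not libwww
  functions.

```cpp
#include <cassert>
#include <string>

// Open-ended range: everything from `offset` (counting from 0) onwards.
std::string openRange(long long offset) {
    assert(offset >= 0);
    return std::to_string(offset) + "-";
}

// Closed range: bytes first..last, both inclusive.
std::string closedRange(long long first, long long last) {
    assert(first >= 0 && last >= first);
    return std::to_string(first) + "-" + std::to_string(last);
}
```

  The result of openRange(bytesAlreadyOnDisk) is what would be passed as
  the last argument of HTRequest_addRange(). Depending on how your libwww
  headers declare that parameter, you may need to hand it a non-const
  copy of the string.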
  
  Beware that a non-HTTP 1.1 aware server may just ignore the range request
  and send the data starting with offset 0 - to detect this case, you need
  to check whether the server sent a Content-Range header like
  "Content-Range: bytes 303104-17242732/17242733". The header is present if
  
    HTResponse_range(HTRequest_response(request))
    
  returns non-null. This only works for HTTP, i.e. if
  HTProtocol_id(HTNet_protocol(HTRequest_net(request))) == 80.
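
  If you want to go one step further and verify that the server resumed
  at the offset you asked for, the header value can be parsed as below.
  This is a sketch under assumptions: it expects the raw header value as
  a C string (how you obtain it from libwww is not covered here), and
  ContentRange, parseContentRange and resumesAt are names made up for
  this example.

```cpp
#include <cassert>
#include <cstdio>

// Parsed form of a value like "bytes 303104-17242732/17242733".
struct ContentRange {
    long long first;  // offset of the first byte in this response
    long long last;   // offset of the last byte, inclusive
    long long total;  // total size of the file
};

// Returns true and fills `out` only for the "bytes first-last/total"
// form. Anything else is rejected, including "bytes */total", an
// unknown total ("*") and trailing characters.
bool parseContentRange(const char* value, ContentRange& out) {
    ContentRange r;
    char trailing;
    int n = std::sscanf(value, "bytes %lld-%lld/%lld%c",
                        &r.first, &r.last, &r.total, &trailing);
    if (n != 3) return false;
    if (r.first < 0 || r.last < r.first || r.total <= r.last) return false;
    out = r;
    return true;
}

// True if the response starts at the offset that was requested, so the
// incoming data can be appended to the partial file.
bool resumesAt(const ContentRange& r, long long requestedOffset) {
    return r.first == requestedOffset;
}
```

  If parsing fails or resumesAt() is false, the safe choice is to discard
  the partial file and write the incoming data from offset 0.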


Further problem with FTP: "REIN"

  libwww doesn't behave correctly with my test FTP server (running
  OpenBSD's ftpd): When libwww wants to reuse an existing control
  connection, it first issues REIN, which the server doesn't understand
  ("502 REIN command not implemented."). Next, it thinks it has to send
  "USER anonymous" again, which doesn't work either ("530 Can't change user
  from guest login.") At this point libwww gives up, when it could actually
  just proceed with the RETR. Grr... patch forthcoming.
  
  
All of the above is based on work on my program "jigdo" - see the
"download.cc" file in its sources. (But wait until the next release, the
current 0.6.9 code doesn't yet have all the pause/resume code.)

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  CS student at the Technische  |  GnuPG key:
  | \/¯|  http://atterer.net  |  Universität München, Germany  |  0x888354F7
  ¯ '` ¯

Received on Wednesday, 12 February 2003 18:25:28 UTC