HTTP 1.1, PROXY, close connection

 Dear libwww experts,
 I suspect that libwww 5.1b does not handle the close directive from a
 server during asynchronous requests (right? wrong?), or that some
 functions are missing (see below).

 Here is what happens:

      Linux, Xt driven event loop, Proxy configured

   initial request:
       http://www.linuxhq.com/ sent to proxy

        HTTP/1.1 200 OK
        Date: Fri, 02 May 1997 23:22:37 GMT
        Server: Apache/1.2b10
        Last-Modified: Fri, 02 May 1997 03:15:36 GMT
        ETag: "33853-18d0-33695c58"
        Content-Length: 6352
        Accept-Ranges: bytes
        Connection: close
        Content-Type: text/html

 In short, the server on www.linuxhq.com is an HTTP/1.1 server, the
 proxy cannot handle persistent connections and inserts "Connection:
 close" into the header. The HTTP 1.1 spec states that this is the
 correct way to signal HTTP 1.0 style single requests.
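
 To make the header check concrete, here is a minimal sketch of how a
 client could test a Connection header value for the "close" token. The
 function name connection_has_close is my own invention for this
 illustration and is not part of libwww; the sketch only assumes that the
 value is a comma-separated token list compared case-insensitively.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not libwww API): return 1 if the comma-separated
 * Connection header value contains the token "close", 0 otherwise. */
static int connection_has_close(const char *value)
{
    static const char tok[] = "close";
    const char *p = value;

    assert(value != NULL);
    while (*p) {
        const char *start, *end;
        size_t i;

        /* Skip separators and leading whitespace. */
        while (*p == ',' || *p == ' ' || *p == '\t') p++;
        start = p;
        while (*p && *p != ',') p++;
        end = p;
        /* Trim trailing whitespace from the token. */
        while (end > start && (end[-1] == ' ' || end[-1] == '\t')) end--;

        if ((size_t)(end - start) == 5) {
            for (i = 0; i < 5; i++)
                if (tolower((unsigned char)start[i]) != tok[i]) break;
            if (i == 5) return 1;
        }
    }
    return 0;
}
```

 For the response shown above, connection_has_close("close") would report
 that the connection must not be reused after this response.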

 The libwww code in 5.1b handles the "Connection: close" by setting
 the HTHost_closeNotification flag in HTHost to true and leaving
 everything else (persistent, protocol version etc.) the same for the
 time being.
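
 As an illustration of what consulting such a flag could look like, here
 is a toy model of the decision whether a cached connection may take
 further requests. ToyHost, ToyDecision and toy_channel_decision are
 hypothetical names for this sketch; this is not libwww's HTHost and not
 its actual logic.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Toy stand-in for a host record; NOT libwww's HTHost. */
typedef struct {
    void  *channel;            /* non-NULL while a connection is cached */
    time_t expires;            /* 0 means no expiry time recorded */
    int    close_notification; /* set once "Connection: close" was seen */
} ToyHost;

typedef enum {
    TOY_NO_CHANNEL,  /* nothing cached: open a new connection */
    TOY_REUSE,       /* cached connection may take another request */
    TOY_COLD,        /* cached connection has expired */
    TOY_CLOSING      /* peer announced it will close: do not reuse */
} ToyDecision;

/* Decide what to do with the cached connection at time 'now'.
 * The function only decides; it does not delete anything. */
static ToyDecision toy_channel_decision(const ToyHost *h, time_t now)
{
    assert(h != NULL);
    if (!h->channel) return TOY_NO_CHANNEL;
    if (h->expires && h->expires < now) return TOY_COLD;
    if (h->close_notification) return TOY_CLOSING;
    return TOY_REUSE;
}
```

 The sketch deliberately separates the decision from any cleanup: in the
 TOY_CLOSING case the first response may still be arriving on the
 connection, so a new request would need its own connection while the old
 one is left alone until that response is complete.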

 Our code (the cineast browser, see 


 from our poster presentation at WWW6) issues libwww requests
 asynchronously. This means an image request is issued before the
 loading of the HTML document is completed. In the figure below the x
 axis is time; the horizontal position of the s in "starting"
 indicates the time when the request is started, and the dots indicate
 the loading time.

   starting http://www.linuxhq.com/  ................................
        starting img1 .....................................
             starting img2....................................

 With the original 5.1b code linuxhq is loaded, the img1 and img2
 requests are sent on the same socket, and the proxy closes the
 connection after linuxhq is finished. Depending on the timing, the
 code might either

   - catch a SIGPIPE signal when a further image request
     is submitted to the closed pipe, or
   - sit in a (non-blocking) read loop hoping in vain that the proxy
     will send the images
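
 Independently of how libwww should react, the SIGPIPE case can be turned
 into an ordinary error return at the POSIX level. The following is a
 minimal sketch, not libwww code: ignore_sigpipe and
 write_or_detect_close are hypothetical helper names, and the sketch
 assumes a POSIX system where a write to a connection closed by the peer
 fails with EPIPE (or ECONNRESET) once SIGPIPE is ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide so that writing to a closed connection
 * returns an error instead of terminating the process. */
static int ignore_sigpipe(void)
{
    struct sigaction sa;

    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical helper: returns 1 if write() accepted data (possibly only
 * part of it), 0 if the peer has closed the connection, -1 on any other
 * error. */
static int write_or_detect_close(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    n = write(fd, buf, len);
    if (n >= 0) return 1;
    if (errno == EPIPE || errno == ECONNRESET) return 0;
    return -1;
}
```

 A return value of 0 is the point at which a client could requeue the
 request on a fresh connection rather than waiting for a reply that will
 never arrive.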

 Here are my questions:
  1) Why does libwww only set a flag when the "Connection: close"
     directive in the header of the first request (linuxhq) is handled?
     Why does it not do something more drastic (e.g. turning off
     persistent, setting HT_TP_SINGLE)? If I implemented this,
     what problems would I face?

  2) Another place, where the close notification could be handled
     is in HTHost_new(), when the first image request is handled.

     PUBLIC HTHost * HTHost_new (char * host, u_short u_port)
       ... lookup host structure ...
       if (pres) {  /* which means there is a host */
        if (pres->channel) {  /* the host has a channel */
            if (pres->expires && pres->expires < time(NULL)) { 
                    /* Cached channel is cold */
                if (CORE_TRACE)
                    HTTrace("Host info... Persistent channel %p gotten cold\n",
                            pres->channel);
                HTChannel_delete(pres->channel, HT_OK);
                pres->channel = NULL;
            } else {   /* the channel can be used */
                if (CORE_TRACE)
                    HTTrace("Host info... REUSING CHANNEL %p\n",pres->channel);

     When this code is executed the close_notification flag from
     the top request has already been noted. I tried to handle
     the close_notification situation like the "cold channel",
     but it ends up in a SEGV in HTTee_write

        HTTee_write <- HTTPStatus_put_block <- HTReader_read  <-
        HTHost_read <- HTTPEvent(HTEvent_READ)

     It looks like an HTStream is invalid. Maybe there is another
     pointer pointing to the channel which is deleted above and
     cleared in the host structure.... Maybe something is
     missing here in the situation of a close_notification,
     maybe there is a separate problem.

  3) I found a third approach to handle this problem: The HTTP 1.1
     spec says that the client should handle a close from the server
     at any time. In HTTP.c the HTTP_CONNECTED case looks promising:

       case HTTP_CONNECTED:
              if (type == HTEvent_WRITE) {
                /*
                **  Should we use the input stream directly or call the post
                **  callback function to send data down to the network?
                */
                    HTStream * input = HTRequest_inputStream(request);
                    HTPostCallback * pcbf = HTRequest_postCallback(request);
                    if (pcbf) {

     but this needs a request post callback which has to return a 
     HT_CLOSED state in order to trigger HTTP_RECOVER_PIPE 

                            status = (*pcbf)(request, input);
                            if (status == HT_PAUSE || status == HT_LOADED) {
                            } else if (status == HT_CLOSED) {
                                http->state = HTTP_RECOVER_PIPE;
                            } else if (status == HT_ERROR) {

     which does the hard work (flushing the request, recovering the
     pipe, setting the begin state and launching pending requests).
     Recover pipe will "Move all entries in the pipeline and move the
     rest to the pending queue. They will get launched at a later point
     in time.".

     I find it strange that this functionality needs the post
     callback, which is nowhere registered (or did I miss it?).  It is
     simple to register a post callback returning HT_CLOSED in
     HTTP_BEGIN after HTHost_connect() in a close_notification
     situation. I did so and got the same SEGV in HTTee_write
     as in (2).
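
 To make the recover-pipe idea in (3) concrete, here is a toy model of
 moving the requests still in the pipeline back to the pending queue so
 that they can be launched again later. ToyQueue and toy_recover_pipe
 are hypothetical names, requests are reduced to integer ids, and
 placing the recovered requests ahead of those already pending is my
 own assumption (it keeps the order in which the requests were issued);
 this is not the libwww implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16

/* Toy request queue: request ids stored in issue order. */
typedef struct {
    int    ids[TOY_MAX];
    size_t count;
} ToyQueue;

/* Move every request from 'pipeline' to the front of 'pending'.
 * Returns the number of requests moved, or -1 if they do not fit
 * (in which case neither queue is modified). */
static int toy_recover_pipe(ToyQueue *pipeline, ToyQueue *pending)
{
    size_t n, i;

    assert(pipeline != NULL && pending != NULL);
    n = pipeline->count;
    if (n + pending->count > TOY_MAX) return -1;

    /* Shift the already pending ids right by n, last entry first. */
    for (i = pending->count; i > 0; i--)
        pending->ids[i - 1 + n] = pending->ids[i - 1];
    /* Copy the pipelined ids into the freed slots at the front. */
    for (i = 0; i < n; i++)
        pending->ids[i] = pipeline->ids[i];

    pending->count += n;
    pipeline->count = 0;
    return (int)n;
}
```

 In the scenario above, img1 and img2 would sit in the pipeline when the
 proxy closes the connection; after the move they would be the first
 entries of the pending queue and could be sent on a new connection.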

 So, which is the way to go: 1, 2, or 3?

 Is the SEGV in HTTee_write a separate problem?

 Are there updated state machines or code specifications available for
 mere humans outside the W3C?  For example the state diagram for HTTP
 in Library/User/Architecture/HTTP.gif is from Jun 30 1995. Henrik,
 how did you keep track of the states of the various objects?

 In my opinion there are many improvements between 5.0a and 5.1b aside
 from the speed, but the new objects/features/changes are not
 sufficiently covered in the documentation.

 Any hints are welcome. 

 Best regards

-gustaf neumann

PS: Yesterday I sent a mail with a fix to libwww@w3.org (as indicated in
   README.html) under the impression that this address belongs to the
   public mailing list. My current understanding of the mailing lists
   is the following:
      libwww@w3.org:         mail to the libwww developers at w3c
      www-lib@w3.org:        public mailing list for discussion
      www-lib-bugs@w3.org:   public mailing list for bug reports
   Is this correct?
   Given the low traffic, and judging from people sending to both lists,
   would it not be a good idea to merge the last two?

   I assume that libwww@w3.org is subscribed to the public lists, and we
   do not have to send submissions that should reach the w3c people
   to libwww@w3.org as well. Correct?
Wirtschaftsinformatik und Softwaretechnik        
Universitaet GH Essen, FB5
Altendorfer Strasse 97-101, Eingang B, D-45143 Essen
Tel.: +49 (0201) 81003-74, Fax:  +49 (0201) 81003-73