Lost bytes in an http server reply - very strange

** The Problem
Sometimes large, cgi-generated html files lose blocks of data.

** Background
o NCSA httpd 1.3 server running on SunOS
o Perl cgi-script producing output
o Output in the 10-20K range

** Other details
o The browser type doesn't seem to matter. It's happened in Netscape and
Mosaic on Unix, Netscape on the Mac, my Perl htget script, and a Mosaic
derivative on Windows.

o It doesn't happen all the time.  I can replicate it about 2/3rds of the time.

o The loss, the times I've measured it, occurs about 11K into the file. The
beginning is there, the end is there. The amount lost is always an even multiple
of 1K bytes.

o The log file on the server shows that the correct number of bytes were sent.

o The server loops doing freads and fwrites on a pipe. It reads in 8K
blocks and loops the fwrite until all the bytes are written.  The only way
out of the loop is if fread returns 0 or the alarm() goes off. Both would
result in cancelling the entire stream, not just blocks in the middle.

o I saw this happen similarly a while ago using 1.3 on a Sony NEWS machine.
At the time I thought it was a Netscape bug, and I was able to alleviate,
but not entirely prevent, it by buffering output in the cgi script and then
spurting out things in 16K or so chunks.

o I strongly suspect a timing problem somewhere.  Buffering things seems to
help, and the data loss currently always seems to occur at about the place
where the script does additional database accesses.
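
One way to pin down the observations above (loss near 11K, always a multiple
of 1K) is to compare a known-good copy of the output, for example from running
the script by hand, against what a client actually received. The following is
only a sketch under that assumption: find_gap and struct gap are hypothetical
names, not part of the server or the script, and it assumes the damage is a
single missing block.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the server source): describes where the
   received data `got` diverges from the expected data `want`. */
struct gap {
    long offset;   /* first byte at which the buffers differ, -1 if none */
    long missing;  /* want_len - got_len, i.e. how many bytes were lost  */
    int  tail_ok;  /* 1 if `got` matches `want` again after the gap      */
};

static struct gap find_gap(const char *want, long want_len,
                           const char *got, long got_len)
{
    struct gap g = { -1, want_len - got_len, 0 };
    long limit = want_len < got_len ? want_len : got_len;
    long i = 0;

    /* Walk the common prefix. */
    while (i < limit && want[i] == got[i])
        i++;
    if (i == want_len && i == got_len)
        return g;                       /* buffers are identical */

    g.offset = i;
    /* If bytes are missing, check that skipping exactly that many bytes
       in `want` makes the remainder of `got` line up again. */
    if (g.missing > 0 &&
        memcmp(want + i + g.missing, got + i, (size_t)(got_len - i)) == 0)
        g.tail_ok = 1;
    return g;
}
```

If tail_ok is set, then missing % 1024 and offset say whether a given failure
fits the pattern described above. With repetitive data the reported offset can
be somewhat later than the true start of the gap.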

If you want to try and see the problem, try accessing

http://www.swcouncil.org/search.html?Sales=100M%20to%201%20Billion&_Exact=LOTUS%20DEVELOPMENT%20CORP02142&NAME=EXACT#EXACT

You'll know when it fails because the output will often be screwed up and
it probably won't take you to the right entry (the <a name=EXACT> often
gets lost). That claims to be sending (and sometimes does) 18962 bytes
(assuming you're using Netscape 1.1; other browsers get things in different
formats).

Does anyone have any ideas?
At first I thought it was my scripts, but they generate the right data when
run by hand, and the server reports the right number of bytes being read.
The server counts the bytes as it reads them, not as it writes them, but I
can't see any way for the code to fail.  (For completeness' sake I've included
it below; NCSA httpd 1.3 copyrights apply.)  The problem is client independent,
and I've seen it before on my local server, so it's not our PPP software.  I've
also seen it on two different server platforms.  About the only variable that
is constant is the version of the server.


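/* Copy everything readable from f (the CGI pipe) to fd (the client). */
void send_fd(FILE *f, FILE *fd, void (*onexit)())
{
    char buf[IOBUFSIZE];
    register int n,o,w;

    exit_callback = onexit;
    /* Both a timeout and a closed connection abort the whole transfer. */
    signal(SIGALRM,send_fd_timed_out);
    signal(SIGPIPE,send_fd_timed_out);

    while (1) {
        alarm(timeout);   /* re-arm the timeout before each read */
        if((n=fread(buf,sizeof(char),IOBUFSIZE,f)) < 1) {
            break;        /* EOF or read error */
        }
        o=0;
        /* Bytes are counted here, as read, not as written. */
        if(bytes_sent != -1)
            bytes_sent += n;
        /* Retry until the whole block has been handed to fwrite. */
        while(n) {
            w=fwrite(&buf[o],sizeof(char),n,fd);
            n-=w;
            o+=w;
        }
    }
    fflush(fd);
}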
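
Since the server counts bytes as it reads them rather than as it writes them,
one possible diagnostic is a variant of the loop that keeps both counts and
treats a zero return from fwrite as an error. This is a sketch only, not the
NCSA code: copy_counted and COPY_BUFSIZE are made-up names, the 8K buffer size
is taken from the description above, and the alarm()/signal handling is left
out to keep it self-contained.

```c
#include <stdio.h>

#define COPY_BUFSIZE 8192   /* assumed: the post says reads are 8K blocks */

/* Hypothetical diagnostic variant of the copy loop.
   Returns 0 on success, -1 on a read or write error.
   *nread and *nwritten receive the byte counts for each side. */
static int copy_counted(FILE *in, FILE *out, long *nread, long *nwritten)
{
    char buf[COPY_BUFSIZE];
    size_t n, o, w;

    *nread = 0;
    *nwritten = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        *nread += (long)n;
        o = 0;
        while (n > 0) {
            w = fwrite(buf + o, 1, n, out);
            if (w == 0)
                return -1;          /* write failed; do not spin forever */
            *nwritten += (long)w;   /* count what fwrite actually accepted */
            n -= w;
            o += w;
        }
    }
    /* fread returned 0: distinguish EOF from error, then flush. */
    if (ferror(in) || fflush(out) == EOF)
        return -1;
    return 0;
}
```

Logging both counts for each request would show whether the read-side and
write-side totals ever disagree. If they always match while clients still see
gaps, the loss would have to be happening somewhere other than this loop.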

Kee Hinckley      Utopia Inc. - Cyberspace Architects    617/721-6100
nazgul@utopia.com                               http://www.utopia.com/

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.

Received on Sunday, 18 June 1995 20:57:37 UTC