- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Fri, 16 Dec 94 12:13:50 PST
- To: Albert Lunde <Albert-Lunde@nwu.edu>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Larry Masinter <masinter@parc.xerox.com> writes:
>You don't need content-length to search for the boundary. You can go
>ahead and scan the binary data for <CR><LF>--boundary--<CR><LF>. Doing
>so either using a simple scan or using a more elaborate boyer-moore
>algorithm should be computationally not significantly more expensive
>than merely counting bytes.
Yes, you can do this, but to make it work for arbitrary binary
files, you have to choose a boundary that is not in the file,
either using a random string and hoping, or scanning the whole file
(which generally costs more than just determining the size.)
I used to think that boundary-scanning was expensive, but given my
experience that CPUs always get faster (I started my programming life
using a time-shared PDP-11/40, after all), I think I can see Larry's
point.
A little experiment should help shed some light on this. On a reasonably
fast machine (a DEC3000/600), I did (after first arranging for /vmunix
to be in the filesystem cache):
% time wc /vmunix
15102 97927 6351808 /vmunix
0.93u 0.14s 0:01 94% 0+1k 0+0io 15pf+0w
a% time ngrep masinter@parc.xerox.com /vmunix
0.27u 0.19s 0:00 74% 0+1k 4+3io 47pf+0w
%
ngrep uses a modified Boyer-Moore algorithm. Larry is right;
Boyer-Moore with a properly chosen search string is actually three
times faster than simply counting bytes (although the byte count
might actually be available as a side-effect of some other operation,
6351808 bytes in 0.27 seconds is 188 Mbits/sec., so it's not going
to be the rate-limiting step.)
I did a similar experiment on the slowest machine I could find,
an old DECstation-3100 (about 11 SPECmark, I think). This time,
/vmunix didn't fit into the buffer cache. Anyway, I got:
% time wc /vmunix
8821 50904 2120392 /vmunix
3.8u 1.4s 0:09 56% 40+71k 263+2io 0pf+0w
% time ngrep masinter@parc.xerox.com /vmunix
0.3u 1.0s 0:02 67% 78+112k 16+0io 0pf+0w
%
That's "only" 56 Mbits/sec for Boyer-Moore. But there is no way
that this little CPU is going to drive any network interface
at that speed, anyway.
-Jeff
Received on Friday, 16 December 1994 12:31:06 UTC