
Re: HTTP: T-T-T-Talking about MIME Generation

From: John Ludeman <johnl@microsoft.com>
Date: Fri Dec 16 13:19:01 1994
Message-Id: <9412162117.AA24428@netmail2.microsoft.com>
To: http-wg-request%cuckoo.hpl.hp.com@hplb.hpl.hp.com, jim@spyglass.com
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com

----------
| From: Larry Masinter  <masinter@parc.xerox.com>
| Date: Friday, December 16, 1994 10:54AM
|
| You don't need content-length to search for the boundary. You can go
| ahead and scan the binary data for <CR><LF>--boundary--<CR><LF>. Doing
| so either using a simple scan or using a more elaborate Boyer-Moore
| algorithm should be computationally not significantly more expensive
| than merely counting bytes.
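
For concreteness, the "simple scan" being described comes down to
something like the following C sketch (the function name, buffer size,
and boundary argument are made up for illustration, not taken from any
server):

    #include <stdio.h>
    #include <string.h>

    /* Look for the closing MIME delimiter CRLF "--" boundary "--" CRLF
     * in a buffer of received bytes.  Returns a pointer to it, or NULL. */
    static const char *find_closing_boundary(const char *buf, size_t len,
                                             const char *boundary)
    {
        char delim[80];
        int dlen;

        /* build the closing delimiter string */
        dlen = snprintf(delim, sizeof delim, "\r\n--%s--\r\n", boundary);
        if (dlen < 0 || (size_t)dlen >= sizeof delim || (size_t)dlen > len)
            return NULL;

        for (size_t i = 0; i + (size_t)dlen <= len; i++) {
            if (buf[i] == '\r' && memcmp(buf + i, delim, (size_t)dlen) == 0)
                return buf + i;          /* end of the body found */
        }
        return NULL;                     /* delimiter not in this buffer */
    }

Every byte of the body has to pass through that comparison loop.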

I disagree.  In the case of HTTP, byte counting does *not* mean 
touching the bytes (there is no strncpy-style pass over the data).  
When receiving a chunk of data, you generally hand the buffer directly 
to the network layer, which fills it in.  Having to then scan that 
buffer *does* add a significant performance hit to the server.
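
By way of contrast, here is a rough sketch (not from any particular
server) of receiving a body with a known Content-Length: the network
layer writes straight into the caller's buffer, and the only per-read
work is a subtraction.

    #include <sys/types.h>
    #include <sys/socket.h>      /* recv() */

    /* Read exactly content_length bytes from the socket into buf. */
    static long recv_body(int sock, char *buf, long content_length)
    {
        long remaining = content_length;

        while (remaining > 0) {
            /* the network layer fills our buffer directly */
            long n = (long)recv(sock, buf + (content_length - remaining),
                                (size_t)remaining, 0);
            if (n <= 0)
                return -1;       /* error or premature close */
            remaining -= n;      /* bookkeeping only; data is never scanned */
        }
        return content_length;
    }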

Somebody on the www-talk list posted a comparison of a server doing a 
simple document scan before sending a file versus just sending the file 
(analogous to the receive case we are talking about).  The server was 
two to three times slower when scanning the data.

A server should always provide a Content-Length if possible (a wrong 
content length is a bug in the server).  The primary case where 
multipart becomes interesting is when dealing with a streaming gateway 
(e.g., CGI), where buffering the data can be too expensive.  I think 
that is the only case in which a boundary scan should be used 
(personally I'd prefer a packet approach with a byte count, but 
whatever).
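
To make the distinction concrete, here is a sketch of the "send 
Content-Length when you know the size" case (the function name and
header choices are illustrative only): for a static file the size comes
straight from stat(), so the header costs nothing and the client never
has to hunt for an end marker.

    #include <stdio.h>
    #include <sys/stat.h>

    /* Emit a minimal response header for a file whose size is known. */
    static int send_headers(FILE *out, const char *path)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return -1;           /* can't size it; other framing is needed */

        fprintf(out, "HTTP/1.0 200 OK\r\n");
        fprintf(out, "Content-Type: text/html\r\n");
        fprintf(out, "Content-Length: %ld\r\n\r\n", (long)st.st_size);
        return 0;
    }

A CGI-style gateway is exactly the case where the size isn't known
until the script has finished, which is why delimiter-based framing
only earns its keep there.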

Byte counts are good.  Real protocols use byte counts.  Let's make sure 
we move in that direction.

John
Received on Friday, 16 December 1994 13:19:01 EST
