Re: Server-side roles in the HTTP

I was just paging through previous e-mail to prepare draft-01, and came
across this, which I don't think I responded to.

[me]
> > If you have a PDF file that happens to be generated by a CGI script,
> > it's a huge pain to get it to support ranges, validation and other
> > facilities in the script itself. I've done it, and it was not
> > pleasant.

[JP]
> What you suggest when requesting the server implementor to do this job
> is that the server has to regenerate the whole dynamic page just to
> get a few bytes. This may involve rerunning the application. Even if
> your CGI sends exactly the same content with the same last-modified
> date, so that the client still thinks it's the same content between
> HTTP requests, the PDF plug-in might make 10 different HTTP requests
> in a row with various ranges. The server executes the script 10 times,
> gets the full PDF content from the script, and then only sends the
> requested bytes. Technically, it works, but it's extremely
> inefficient.
> On the other hand, if you let the CGI itself handle the range request,
> then it's not so bad: the CGI will try to generate those requested
> bytes itself and won't waste memory or CPU trying to generate and send
> the entire thing.

I agree to some extent. I need to make it clearer that the draft is about
defaults; in this case, if you have a more efficient way of handling range
requests in the CGI, have it generate an 'Accept-Ranges: bytes' header, and
the server will know that the application is capable of dealing with
partial content. Best of both worlds.
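
As a sketch of what I mean (a hypothetical Python CGI of my own, not
anything from the draft; the generation step is a stand-in), the script
handles the simple byte-range form itself and advertises that with
Accept-Ranges, so the server can pass the 206 through untouched:

    #!/usr/bin/env python
    # Hypothetical CGI that serves a byte range itself rather than
    # leaving it to the server.
    import os
    import sys

    def generate_body():
        # Stand-in for the expensive dynamic-generation step.
        return b'%PDF-1.1 ...pretend this is a generated PDF...'

    body = generate_body()
    out = sys.stdout.buffer
    rng = os.environ.get('HTTP_RANGE', '')
    first_s, last_s = '', ''
    if rng.startswith('bytes='):
        first_s, _, last_s = rng[len('bytes='):].partition('-')

    if first_s.isdigit():
        # Handle only the simple "bytes=first-last" form here.
        first = int(first_s)
        last = int(last_s) if last_s.isdigit() else len(body) - 1
        payload = body[first:last + 1]
        out.write(b'Status: 206 Partial Content\r\n')
        out.write(b'Content-Range: bytes %d-%d/%d\r\n'
                  % (first, last, len(body)))
    else:
        # Anything else (including suffix ranges) falls back to a
        # full 200 response, which is always legal.
        payload = body
        out.write(b'Status: 200 OK\r\n')

    out.write(b'Accept-Ranges: bytes\r\n')
    out.write(b'Content-Type: application/pdf\r\n')
    out.write(b'Content-Length: %d\r\n\r\n' % len(payload))
    out.write(payload)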

Even if this isn't taken advantage of, IMHO it's much easier to scale a
server (hardware) than the complete network between any possible user and
the server. Regenerating the entire object to send a few bytes is
inefficient on the server, but it's better than leaving it in the
publisher's hands, where in most cases it won't be done at all.

I'd very much like a survey to be done of Webmasters, CGI scripters, etc.,
to ask them where they think the responsibility for handling these features
currently lies.


> Ranges work best with direct access content (eg: static files) or with
> cacheable content; with dynamic content, the server typically only has
> sequential access to the content and does not cache it.

I know what you're getting at, but I really want to redefine what people
think of as dynamic content, as well as cacheability.

Dynamic content to me is defined by dependence on either the identity
(however derived) of the current user, or some other external source of
content entropy that causes two simultaneous hits on the same resource to
generate different content. That's it; it has nothing to do with the
presence of the string 'cgi-bin', a query string, or anything else.
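
To make that concrete (hypothetical examples of my own): the first
function below produces dynamic content under this definition; the
second does not, even though it lives behind cgi-bin.

    import os
    import time

    # Dynamic: the output depends on who is asking and on when.
    def per_user_page():
        user = os.environ.get('REMOTE_USER', 'anonymous')
        return 'Hello %s, it is now %s\n' % (user, time.ctime())

    # Not dynamic under this definition: two hits at the same instant
    # always yield identical bytes, script or no script.
    def generated_report():
        rows = ['widgets,42', 'gadgets,17']
        return 'item,count\n' + '\n'.join(sorted(rows)) + '\n'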

In my mind, cacheability comprises two very separate properties: an
assigned TTL (through Expires or Cache-Control, or assumed heuristically
from Last-Modified) and the ability to validate through a conditional
request. In the future, it may expand to include such mechanisms as the
ability to delta-encode.
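
In CGI terms, the two properties look something like this (a hypothetical
sketch of my own; the timestamp and max-age are placeholders):

    import os
    import sys
    from email.utils import formatdate, parsedate_to_datetime

    LAST_MODIFIED = 937650000.0  # placeholder: when the content changed

    def respond():
        ims = os.environ.get('HTTP_IF_MODIFIED_SINCE')
        since = None
        if ims:
            try:
                since = parsedate_to_datetime(ims).timestamp()
            except (TypeError, ValueError):
                since = None
        # Property 2: validation through a conditional request.
        if since is not None and LAST_MODIFIED <= since:
            sys.stdout.write('Status: 304 Not Modified\r\n\r\n')
            return
        sys.stdout.write('Status: 200 OK\r\n')
        # Property 1: an assigned TTL (freshness lifetime).
        sys.stdout.write('Cache-Control: max-age=600\r\n')
        sys.stdout.write('Last-Modified: %s\r\n'
                         % formatdate(LAST_MODIFIED, usegmt=True))
        sys.stdout.write('Content-Type: text/plain\r\n\r\n')
        sys.stdout.write('the content itself\n')

    respond()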


> That's not to say the HTTP server shouldn't do anything. I think it
> makes sense for it to transparently do chunking on CGI and plug-in
> output, and the way we are doing it now works very nicely (though I
> wish we could use our own browser to test this feature :)).

*grin* know that feeling...
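
For anyone following along, transparent chunking just means the server
frames the script's output as it arrives, so it needn't buffer the whole
body to compute a Content-Length. A rough sketch of the framing (my own
illustration, not a description of JP's implementation):

    def write_chunked(out, stream, bufsize=8192):
        # Frame an arbitrary byte stream as HTTP/1.1 chunked transfer
        # coding: each chunk is "<hex length> CRLF <data> CRLF", and a
        # zero-length chunk marks the end of the body.
        while True:
            data = stream.read(bufsize)
            if not data:
                break
            out.write(b'%x\r\n' % len(data))
            out.write(data)
            out.write(b'\r\n')
        out.write(b'0\r\n\r\n')  # last-chunk; no trailers in this sketch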

--
Mark Nottingham

Received on Saturday, 18 September 1999 11:31:27 UTC