Re: Best place to hack into libWWW ? from Henrik Frystyk Nielsen on 1999-12-21 (www-lib@w3.org from October to December 1999)

From: Henrik Frystyk Nielsen <frystyk@microsoft.com>
Date: Tue, 21 Dec 1999 08:13:23 -0800
To: "Amir Khassaia" <AKhassaia@vet.com.au>
Cc: <www-lib@w3.org>
Message-ID: <000f01bf4bce$51257a20$c2bb1eac@redmond.corp.microsoft.com>

> Hi, I am trying to customize the web-robot app that comes with LibWWW to
> add some features to it.
> One of them is downloading files only up to a certain size.
> I have tried putting some code together for detecting and handling it,
> but the functions that return the size of the file, like
> HTAnchor_length(), HTResponse_length() or HTAnchor_header(), only work
> correctly in some places (like terminate_handler).
> What is the best place to start putting that sort of code in LibWWW?

The reason the size is not known until the after filters run is that we
first need to get the response headers from the server, and even then it
is not guaranteed that we know the size. The response may be using chunked
transfer encoding, or it may not have a Content-Length header and
instead close the connection in order to delimit the message.
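
To make the three cases concrete, here is a minimal standalone sketch of
how a client could classify the response framing from the header lines.
It is not libwww code: BodyFraming, classify_body and the helpers are
names made up for this illustration, and the header parsing is
deliberately simplistic.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of how the response body is delimited. */
typedef enum {
    BODY_LENGTH_KNOWN,  /* Content-Length present: size known up front */
    BODY_CHUNKED,       /* chunked transfer encoding: size not known   */
    BODY_UNTIL_CLOSE    /* neither: body ends when connection closes   */
} BodyFraming;

/* Return the value of "name: value" if the line carries that header
   (name compared case-insensitively), otherwise NULL. */
static const char *header_value(const char *line, const char *name)
{
    size_t n = strlen(name);
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return NULL;
    }
    if (line[n] != ':')
        return NULL;
    const char *v = line + n + 1;
    while (*v == ' ' || *v == '\t')
        v++;
    return v;
}

/* Case-insensitive substring search. */
static int contains_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay; hay++) {
        size_t i = 0;
        while (i < n && hay[i] &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Classify the framing; *length_out is the body size, or -1 if unknown. */
BodyFraming classify_body(const char *const *headers, size_t count,
                          long *length_out)
{
    assert(headers != NULL && length_out != NULL);
    int chunked = 0;
    long length = -1;

    for (size_t i = 0; i < count; i++) {
        const char *v;
        if ((v = header_value(headers[i], "Transfer-Encoding")) != NULL) {
            if (contains_ci(v, "chunked"))
                chunked = 1;
        } else if ((v = header_value(headers[i], "Content-Length")) != NULL) {
            char *end;
            long n = strtol(v, &end, 10);
            if (end != v && n >= 0)
                length = n;
        }
    }

    *length_out = -1;
    if (chunked)                 /* chunked wins over any Content-Length */
        return BODY_CHUNKED;
    if (length >= 0) {
        *length_out = length;
        return BODY_LENGTH_KNOWN;
    }
    return BODY_UNTIL_CLOSE;
}
```

Only the BODY_LENGTH_KNOWN case gives you a size to compare against a
limit before the body arrives.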

The client can of course just abort the connection, but that is often a
very heavy-handed mechanism - especially if using HTTP pipelining (which
libwww does under the covers).

The only way to get around this is to first issue a HEAD request which,
if the response contains a Content-Length header field, gives you the
size of the response body. That is, you first do a HEAD and then a GET
if you want to get the whole thing.

The way you do this is to register an after filter that looks at the
result of the HEAD and if the response is less than a certain size then
it changes the method to GET and reissues the request.
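
The decision such a filter makes can be sketched roughly as follows.
This is a standalone model, not the libwww API: Request, FilterAction
and size_limit_after_filter are hypothetical names, and skipping a
resource whose size is unknown is an assumption on my part (you may
prefer to fetch it anyway).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the request state a real filter would see. */
typedef enum { METHOD_HEAD, METHOD_GET } Method;

typedef struct {
    Method method;          /* method of the request that just completed  */
    long   content_length;  /* Content-Length from the response, -1 if none */
} Request;

typedef enum {
    ACTION_NONE,            /* not a HEAD response: nothing to decide     */
    ACTION_REISSUE_AS_GET,  /* small enough: fetch the body with a GET    */
    ACTION_SKIP             /* too large or size unknown: do not fetch    */
} FilterAction;

/* Look at the result of the HEAD and decide whether to follow up with GET. */
FilterAction size_limit_after_filter(Request *req, long max_bytes)
{
    assert(req != NULL);

    if (req->method != METHOD_HEAD)
        return ACTION_NONE;

    /* Assumption: without a Content-Length we cannot check, so we skip. */
    if (req->content_length < 0)
        return ACTION_SKIP;

    if (req->content_length > max_bytes)
        return ACTION_SKIP;

    /* Within the limit: change the method; the caller reissues the request. */
    req->method = METHOD_GET;
    return ACTION_REISSUE_AS_GET;
}
```

The ACTION_NONE case matters because the same filter also runs when the
follow-up GET completes, and it must not reissue the request again.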

Workflow wise, this is similar to what you have in the HTRedirectFilter
AFTER filter which you can find in

    http://www.w3.org/Library/src/HTFilter.c

and which is registered in HTAfterInit in

    http://www.w3.org/Library/src/HTInit.c

Instead of registering it for a redirection status code, you register it
for an HT_OK (200) code, which is the case you are interested in.
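
The idea of registering a filter for one particular status code can be
modelled like this. Again this is only a standalone illustration with
made-up names (FilterList, filter_list_add, filter_list_run,
count_calls); the real registration call and its arguments are the ones
you will find in HTAfterInit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILTERS 8

/* A filter receives its own context and the status of the finished
   request; a non-zero return stops the remaining filters. */
typedef int (*AfterFilter)(void *context, int status);

typedef struct {
    AfterFilter fn;
    void       *context;
    int         status;   /* status code this filter is registered for */
} FilterEntry;

typedef struct {
    FilterEntry entries[MAX_FILTERS];
    size_t      count;
} FilterList;

/* Register fn for one status code; returns 0 on success, -1 if full. */
int filter_list_add(FilterList *list, AfterFilter fn, void *context, int status)
{
    assert(list != NULL && fn != NULL);
    if (list->count >= MAX_FILTERS)
        return -1;
    list->entries[list->count].fn = fn;
    list->entries[list->count].context = context;
    list->entries[list->count].status = status;
    list->count++;
    return 0;
}

/* Call every filter registered for this status, in registration order.
   Returns the number of filters that were called. */
size_t filter_list_run(const FilterList *list, int status)
{
    assert(list != NULL);
    size_t called = 0;
    for (size_t i = 0; i < list->count; i++) {
        const FilterEntry *e = &list->entries[i];
        if (e->status != status)
            continue;
        called++;
        if (e->fn(e->context, status) != 0)
            break;
    }
    return called;
}

/* Example filter: counts how often it was invoked. */
int count_calls(void *context, int status)
{
    (void)status;
    (*(int *)context)++;
    return 0;
}
```

A filter registered for 301 is never called for a 200 response, and the
other way round, which is why the size check has to be registered for
the success code rather than copied as-is from the redirection setup.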

Henrik

Received on Tuesday, 21 December 1999 11:13:59 UTC