Re: [computing the] size of a downloaded resource => Lessons learned

 Lessons learned:

We've found that there is a lot to determining the size of a downloaded resource
when Jigsaw acts as a proxy. First, if the resource is cached in the proxy or
on the client, then no resource is downloaded and there is no input stream (assuming
that the resource has not been modified and therefore the HTTP response is
"304 Not Modified").

Second, if there is a valid reply, the input stream associated with the reply may or
may not support marking. If it does, then simply mark it, read it (and count the
length), and then reset it.  If it doesn't support marking, one can read it only
once. Therefore (as far as I know) you must read it into a (string)buffer while
counting the bytes, then do a reply.setContent(new String(buffer)), then close
the stream. (Is there a better way?)
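
For anyone who wants to try this, here is a minimal sketch of the two cases using
only java.io. StreamSizer, Result and measure are hypothetical names, not Jigsaw
classes, and the readLimit argument is an assumption about how large a body you
are willing to mark. It keeps the bytes as bytes rather than going through a
String, and it copies while counting so that an invalidated mark does not lose data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; it is not part of Jigsaw.
final class StreamSizer {

    /** The byte count, plus a stream that can still be read from the start. */
    static final class Result {
        final long size;
        final InputStream stream;

        Result(long size, InputStream stream) {
            this.size = size;
            this.stream = stream;
        }
    }

    /**
     * Counts the bytes in a reply body without losing the body.
     * readLimit is passed to mark() and should be at least the expected size.
     */
    static Result measure(InputStream in, int readLimit) throws IOException {
        boolean marked = in.markSupported();
        if (marked) {
            in.mark(readLimit);
        }

        // Copy while counting, so the body survives even if reset() fails.
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        long size = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            copy.write(chunk, 0, n);
            size += n;
        }

        if (marked) {
            try {
                in.reset();                 // rewind to the marked position
                return new Result(size, in);
            } catch (IOException markInvalidated) {
                // More than readLimit bytes were read; use the copy instead.
            }
        }

        // The original stream is used up: close it and replay the copied bytes.
        in.close();
        return new Result(size, new ByteArrayInputStream(copy.toByteArray()));
    }
}
```

How the returned stream is handed back to the reply depends on the Jigsaw Reply API
and is left out here. The reason for avoiding new String(buffer) is that binary
bodies such as GIFs may not survive a byte-to-character conversion unchanged.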

This should apply as well to any outgoing filter that wants to modify a reply
(i.e., the content will need to be extracted from the input stream before it can
be modified).

Paul.

p.s. a code snip is available if anyone is interested.



Yves Lafon wrote:

> On Fri, 9 Jan 1998, Gil Hansen wrote:
>
> > At 06:53 PM 1/8/98 +0100, Yves Lafon wrote:
> > >On Sun, 21 Dec 1997, Gil Hansen wrote:
> > >> How does one obtain the size of a downloaded resource? The following always
> > >> yields zero: int size = reply.getContentLength();
> > >
> > >The code is indeed reply.getContentLength(), but it has to be called in the
> > >outgoingFilter(Request request, Reply reply);
> > >Which is after the resource has been downloaded (I assume that you are
> > >talking about a client-side filter...)
> >
> > I have a client-side filter that invokes a service's methods beforeGET and
> > afterGET. The sizes of the w3c.www.protocol.http.Reply do not agree with
> > the sizes of the w3c.jigsaw.http.Reply (obtained by modifying
> > w3c.jigsaw.Client). This occurs when downloading www.consumerworld.org
> > (which is interesting because it downloads many offsite URLs). Actually,
> > the sizes are mostly -1. [It should be noted that the sizes match when
> > downloading, for example,  www.dec.com and www.objs.com] The following
> > trace excerpt illustrates this:
> >
> > ==>Capture a new host: www.consumerworld.org
> > request URL: http://www.consumerworld.org/
> >   start:   Thu Jan 08 21:45:52 CST 1998
> > ...
> > ***Service.beforeEvent() called...
> > ***Service.beforeGET() called: http://www.consumerworld.org/gifs/conglox2.gif
> > Processing Server-Level Outgoing Filter...
> > ***Service.afterEvent() called...
> > ***Service.afterGET() called: http://www.consumerworld.org/gifs/conwbak2.gif
> > @@@end before start
> >   end:   Thu Jan 08 21:37:58 CST 1998
> >   request size: -1
> >   reply size: 1564
> > Processing Server-Level Outgoing Filter...
> > ***Service.afterEvent() called...
> > ***Service.afterGET() called: http://www.consumerworld.org/gifs/marqpric.gif
> > @@@end before start
> >   end:   Thu Jan 08 21:37:58 CST 1998
> >   request size: -1
> >   reply size: 2993
> > Processing Server-Level Outgoing Filter...
> > ***Service.afterEvent() called...
> > ***Service.afterGET() called: http://www.consumerworld.org/gifs/conglox2.gif
> > @@@end before start
> >   end:   Thu Jan 08 21:37:58 CST 1998
> >   request size: -1
> >   reply size: -1
> > ======= w3c.jigsaw.http.Client: client-1(socket-clients:20),
> > contentLength=13166, size sent=13166: request
> > http://www.consumerworld.org/gifs/conglox2.gif, duration = 781
>
> This one is really strange, I did the same test, accessing the same page
> through a proxy and displaying the size of the reply in the
> outgoingFilter (your afterGET seems to be the outgoingFilter).
> and I have the right value (Reply:13166). Of course I have -1 for their
> home page as they don't send the Content-Length header:
>
> 11:32 tarantula ~ 78 >telnet www.consumerworld.org 80
> Trying 199.45.33.106...
> Connected to consumer.baweb.com.
> Escape character is '^]'.
> HEAD / HTTP/1.1
> Host: www.consumerworld.org
> Connection: close
>
> HTTP/1.0 200 OK
> Server: Netscape-Enterprise/2.0a
> Date: Mon, 12 Jan 1998 10:24:39 GMT
> Content-type: text/html
>
> Connection closed by foreign host.
>
> The only size you must use is the one taken from the Reply of the
> outgoingFilter, as it doesn't have more headers than the ones the
> remote server has sent. You may want to calculate the real number of
> bytes transferred, but then you need to add all the headers of the
> reply/request. (If your goal is to calculate the real bandwidth with
> remote servers).
>
> > ...
> > ======= w3c.jigsaw.http.Client: client-3(socket-clients:18),
> > contentLength=2993, size sent=2993: request
> > http://www.consumerworld.org/gifs/marqpric.gif, duration = 621
> >
> > Note that there is a second problem with the duration of the downloads. The
> > end times of both replies are after the start of the download of
> > www.consumerworld.org. The start and end times were obtained from
> > request.getDate() and reply.getDate(), respectively, for the request in the
> > beforeGET() method and the reply in the afterGET() method.
>
> The date on the remote host seems to be incorrect, my telnet request was
> done at 10:32 GMT and the server answered 10:24:39 GMT.
> So the calculation of the time delta during the transaction is wrong. The
> best way is to use a start time when you go through the ingoingFilter, an
> end time in the outgoing and do the difference with the same time reference.
> Otherwise, as the times are taken from the headers, it can be wrong.
>
>       /\          - Yves Lafon - World Wide Web Consortium -
>   /\ /  \                Architecture Domain - Jigsaw
>  /  \    \/\
> /    \   /  \   http://www.w3.org/People/Lafon - ylafon@w3.org

   --

********************************************************************
Paul Pazandak                                      pazandak@objs.com
Object Services and Consulting, Inc.             http://www.objs.com
Minneapolis, Minnesota 55420-5409                       612-881-6498
********************************************************************

Received on Wednesday, 25 February 1998 13:07:20 UTC