Re: prefetching attribute (WAS Re: two ideas...) from Ted Hardie on 1995-12-21 (www-talk@w3.org from November to December 1995)

From: Ted Hardie <hardie@merlot.arc.nasa.gov>
Date: Wed, 20 Dec 1995 17:07:51 -0800 (PST)
To: brian@organic.com (Brian Behlendorf)
Cc: hardie@internic.nasa.gov, hammond@csc.albany.edu, www-talk@w3.org
Message-Id: <199512210107.RAA28534@merlot.arc.nasa.gov>

First off, let me agree whole-heartedly that this is academic
until someone implements a test.  There is some benefit, however,
to discussing what we might want to test.

The primary reason Brian seems to want to send the "Top" of the
pre-fetched document (or other object) rather than the whole thing is
to generate something like reasonable statistics for who actually saw
a page rather than who fetched it.  There are other, minor advantages
like being able to store bits of more pages in the user agent's cache
without incurring lots of transport overhead, but this would apply
only for objects whose MIME type made it reasonable to start
displaying with only partial information (it seems like this
could be a problem for partial markup even with some pseudo-HTML, by
which I mean Netscapisms like "Javascript").

I dig the motivation, but I suspect it's not the right reason to do
it this way.  First, the statistics on who saw a document are affected
by caches in fairly unpredictable ways, so that fixing this bit does
not really solve the bigger problem.  Second, many other kinds of
advertiser-supported media have the same problem, both for under-reported
and over-reported statistics.  Eventually, the advertisers get a feel
for it, without having to engineer concrete statistics (imagine a TV that
stored the ads until a proximity-based radiant heat sensor indicated
that a human was nearby....).  The Web is young, so it doesn't have that,
but it will get it without our writing support for it into the basic
methods of HTTP.

What would be needed to fix the basic problem is some confirmation
mechanism, so that a web server can see in a concrete way how many times
the page was rendered.  An optional header to do this might work this
way:

Server sends:
Confirm: First

with a document; it is ignored by any intermediate caches and noted by
the user agent.  If the user agent is set to respond to confirms, it 
notes the desire for the confirm and sends the confirmation when it
renders the document.  If Confirm: is set to Each (or every, or some such),
the browser sends a confirmation every time it renders the page.  Make
CONFIRM a method that takes the same arguments as HEAD and expects either
no reply or some specified "Confirmation received" response code.  This solves
the basic problem of mega-caches *and* prefetched documents.  It also means
that the change in behavior occurs only when a server admin wants the
confirmation.
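
A minimal sketch of the user-agent side of this idea, in Python.  Nothing
here is an existing HTTP feature: the Confirm header, its First/Each
values, and the CONFIRM method are the proposal above, while the class
names, the respond_to_confirms setting, and the "sent" list (standing in
for requests actually put on the wire) are assumptions made only for
illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CachedPage:
    url: str
    confirm_mode: Optional[str]   # "first", "each", or None (no Confirm header)
    confirmed_once: bool = False


@dataclass
class UserAgent:
    respond_to_confirms: bool = True            # hypothetical user setting
    cache: dict = field(default_factory=dict)   # url -> CachedPage
    sent: list = field(default_factory=list)    # stand-in for the network

    def store(self, url, headers):
        """Fetch or prefetch: note the server's wish, send nothing yet."""
        mode = headers.get("Confirm", "").strip().lower() or None
        if mode not in ("first", "each"):
            mode = None
        self.cache[url] = CachedPage(url, mode)

    def render(self, url):
        """Confirm only when the page is actually shown to the user."""
        page = self.cache[url]
        if not self.respond_to_confirms or page.confirm_mode is None:
            return
        if page.confirm_mode == "first" and page.confirmed_once:
            return
        page.confirmed_once = True
        self.sent.append(f"CONFIRM {url}")


ua = UserAgent()
ua.store("/quilts.html", {"Confirm": "First"})   # prefetched, not yet seen
ua.render("/quilts.html")                        # first render: confirmation sent
ua.render("/quilts.html")                        # later renders: nothing sent
```

A page that is stored but never rendered produces no confirmation, which
is the point: the count reflects renders rather than fetches.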

You might need to protect privacy by having a user dialog come up when a
confirmation is requested, or have a global setting for those users who
don't care about confirmation.  Still, the thing is pretty doable.

Why aren't we doing it then? My guess is first, because it requires
network resources that don't add anything to the user experience, and
users are the ones with the thinnest pipes; if turning this off gives
any improvement in local performance, it will be turned off.
Secondly, it looks big-brotherish even if the confirmation is for a
site that is selling sweet little old lady products like quilts and
doilies; people start to wonder what else is going on in the way of
tracking them (not that they shouldn't wonder).  Thirdly, it means the
client has to keep track of another type of attribute for pages in
user caches, orthogonal to all the current ones.  Maybe it will
come down the pike in the future anyway, depending on how much
content providers want it.  I have not noticed, however, that content
providers have much of a voice in influencing the big dogs (maybe
that's just my experience as someone working on publicly funded
content, however).

I guess that's more than two cents on this topic, so I'll shut up now.
				Ted Hardie

Note: I do not speak for NASA.
		



Brian Behlendorf writes:
> The second disadvantage you list, I don't have an answer to - yeah, there
> will be no prefetching there simply because it's a (potentially) different
> administrative domain.  However, the first problem isn't a problem when you
> compare it to *no* prefetching - sure, a second document request is needed (I
> won't say new tcp connection or round trip because we could be talking
> persistent connections here), but at least you have the first screenful of
> the document to read while the rest is loading, so the *perception* is that
> there was no delay between the "click" and the beginning of the document
> rendering.  Furthermore, you don't have the bandwidth hit of having all pages
> prefetched, only the very beginnings of those pages.  I say make that
> "beginning" mark arbitrary, so server/site authors can configure that on an
> object-by-object basis. 
> 
> If we want to push this "smarts" back to the client, we could have a new
> method, say "TOP", which means "give me the headers and however many bytes of
> content you think I should be able to see before the full request goes
> through".  In a typical persistent HTTP request, it means a GET is placed on
> a document, the document is parsed for IMG and EMBED-ed objects, those are
> fetched using GET, finally the document is parsed for HREF-linked objects,
> and those objects are sent a TOP method.  When an HREF is selected, another
> full request happens just like nowadays, but the browser can render the TOP
> info it got immediately.  Just how many bytes a TOP request returns is left
> up to the server/site author.  Some servers may configure it to be to the
> first <HR> in an HTML doc - others may say the first 1500 bytes.  The server
> should also have some way of saying "look, the object you wanted was so
> small, I gave you the whole thing anyways". 
> 
> This is academic theory until it's implemented as a test somewhere, so 
> I won't press too much more on it.
> 
> 	Brian
> 
> --=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
> brian@organic.com  brian@hyperreal.com  http://www.[hyperreal,organic].com/
> 
>
Received on Wednesday, 20 December 1995 20:03:00 UTC