W3C home > Mailing lists > Public > public-html-bugzilla@w3.org > October 2009

[Bug 7918] prefetching: allow site to deny

From: <bugzilla@wiggum.w3.org>
Date: Fri, 23 Oct 2009 08:18:14 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1N1FLi-00007P-Vy@wiggum.w3.org>

Nick Levinson <Nick_Levinson@yahoo.com> changed:

           What    |Removed                     |Added
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |

--- Comment #3 from Nick Levinson <Nick_Levinson@yahoo.com>  2009-10-23 08:18:14 ---
I'm reopening.

Use case additions:

--- Host bandwidth demand may nearly double, as with Bugzilla, says Mozilla in
"[w]e found that some existing sites utilize the &lt;link
rel=&quot;next&quot;&gt; tag with URLs containing query strings to reference
the next document in a series of documents. Bugzilla is an example of such a
site that does this, and it turns out that the Bugzilla bug reports are not
cachable, so prefetching these URLs would nearly double the load on poor
Bugzilla! It's easy to imagine other sites being designed like Bugzilla . . .
." https://developer.mozilla.org/en/Link_prefetching_FAQ (characters replaced
by entities by me). Google agrees on the concept in saying, "Your users
probably have other websites open in different tabs or windows, so don't hog
all of their bandwidth. A modest amount of prefetching will make your site feel
fast and make your users happy; too much will bog down the network and make
your users sad. Prefetching only works when the extra data is actually used, so
don't use the bandwidth if it's likely to get wasted."
<http://code.google.com/speed/articles/prefetching.html>. See also "[s]ince
Fasterfox constantly requests new files, it can cause many servers to overload
much faster" via the Skatter Tech link, below.
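For concreteness, the Bugzilla pattern Mozilla describes looks like this (the bug number in the URL is made up):

```html
<!-- A rel="next" link whose target carries a query string; Firefox
     may prefetch it even though the response is not cachable. -->
<link rel="next" href="show_bug.cgi?id=12345">
```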

--- URLs with query strings may yield uncachable pages, making prefetching them
useless. Mozilla link, above.

--- Caching is increased, which is demanding of hardware. This is inferred from
a 2001 paper from the University of Texas at Austin.
I'm unclear on what happens if a user's low-capacity computer tries to prefetch
a large file it can't hold, especially when it already has important content in
memory or on disk.

--- Visitors' bandwidth: Some visitors apparently use ISPs who charge users for
bandwidth (see megaleecher.net link below), and erroneous prefetches cost them
more for visiting. That's separate from site owners' bandwidth costs.

--- Slowing: If a user has several downloads working at once, prefetching adds
an unannounced burden that can noticeably slow everything. See the Google link,
above.

--- Benchmarking is available for a limited and possibly non-Web case. IEEE
says, "[u]nfortunately, many LDS ["[l]inked data structure"] prefetching
techniques 1) generate a large number of useless prefetches, thereby degrading
performance and bandwidth efficiency, 2) require significant hardware or
storage cost, or 3) when employed together with stream-based prefetchers, cause
significant resource contention in the memory system." IEEE benchmarks a
proposed hardware-based alternative, reporting that "[e]valuations show that the
proposed solution improves average performance by 22.5% while decreasing memory
bandwidth consumption by 25% over a baseline system that employs an effective
stream prefetcher on a set of memory- and pointer-intensive applications."
<http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4798232>, abstract.

--- Staleness, if prefetching is too early, is cited in the Google article
linked to above.

--- If the current page is still downloading, prefetching the next can slow the
current page's download. "It's also important to be careful not to prefetch
files too soon, or you can slow down the page the user is already looking at."
(per the Google article linked to above).

--- When a site owner restricts the number of times a file may be downloaded,
prefetching causes the threshold to be exceeded too soon.

--- Security re HTTPS: Mozilla says, re Firefox, "https:// . . . URLs are never
prefetched for security reasons[.]" Mozilla link, above.

--- Security has been criticized, re cookies from unexpected sites, but that
can be solved by turning cookies off either generally or from third-party
sites, so I don't know if that's critical. It's discussed in the Mozilla page
linked to above and the MegaLeecher comment linked to below.

--- Security in retrieving from a dangerous site via a safe site might result
in caching a page that has a dangerous script with the user not knowing.
Mozilla disagrees ("[l]imiting prefetching to only URLs from the same
server would not offer any increased browser security", via the above link),
but I'm unclear why.


--- Firefox already allows turning prefetching off
(<http://www.megaleecher.net/Firefox_Prefetch_Trick>, the article and the sole
comment; see also the Mozilla link above).

--- Gnome's Epiphany browser reportedly does the same.

--- IE7 reportedly does the same, with difficulty (see the MegaLeecher link,
above).

--- Blocking at the site is implemented using robots.txt against Fasterfox, a
Firefox extension, and the extension checks the robots file
(http://skattertech.com/2006/02/how-to-block-fasterfox-requests/). (I prefer
not using a robots-sensitive grammar, since that conflates different problems
into the same solution.)
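As a sketch of that robots.txt approach (assuming, per the Skatter Tech article, that Fasterfox identifies itself by name when it checks the robots file), a site would add a record like:

```
User-agent: Fasterfox
Disallow: /
```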

--- Advice to return a 404 error when an X-moz: prefetch is found in the
request headers is attributed to Google, but that's based on a Mozilla
instruction, which may not be stable (see the Mozilla page linked to above).
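A minimal server-side sketch of that 404 advice, assuming only that Firefox marks prefetch requests with an "X-moz: prefetch" header, as the Mozilla FAQ cited above describes; the function and class names here are hypothetical, not from any of the linked pages:

```python
from http.server import BaseHTTPRequestHandler


def is_prefetch_request(headers):
    """Return True if the headers mark a Firefox prefetch request.

    Firefox sends "X-moz: prefetch" on prefetch requests; the lookup
    here is case-insensitive to be safe.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-moz", "").strip().lower() == "prefetch"


class DenyPrefetchHandler(BaseHTTPRequestHandler):
    """Answers ordinary GETs normally but returns 404 for prefetches."""

    def do_GET(self):
        if is_prefetch_request(self.headers):
            # Deny the prefetch, per the advice quoted above.
            self.send_error(404, "Prefetching denied")
            return
        body = b"regular response"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

A real deployment would more likely do the same header check in the web server's rewrite rules than in application code, but the test is the same either way.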


--- More UAs would support prefetch denial if it's in the HTML5 spec. HTML5
should have it for the use cases above.

--- As the vast majority of sites would not want prefetch denial, the spec
including it would not alter page authoring or burden UA prefetching,
especially since, presumably, not prefetching is easier for a UA than
prefetching.
All URLs in this post were accessed today.

Received on Friday, 23 October 2009 08:18:19 GMT
