Re: Web growth
From: Johan Hjelm (hjelm@w3.org)
Date: Wed, Mar 17 1999
Message-Id: <4.1.19990317214105.00c1bca0@127.0.0.1>
Date: Wed, 17 Mar 1999 21:47:51 +0100
To: Jim Pitkow <pitkow@parc.xerox.com>
From: Johan Hjelm <hjelm@w3.org>
Cc: "Lavoie,Brian" <lavoie@oclc.org>, "'www-wca@w3.org'" <www-wca@w3.org>
Subject: Re: Web growth
Two factors may affect this calculation:
1: The definition of pages. How do we characterise a page that is generated
in response to a cookie set at an earlier visit when I come back ("Welcome
back Johan Hjelm")? Is it one of the 700 million, even if it didn't exist
before and never will again? How do we account for frames? (well, these are
my common gripes)
2: The domain investigated. Do you take intranets into account? If they
count, the crawl-based figures may underestimate the total. Does the Alexa
robot respect robots.txt and robots meta tags (I think I remember it does)?
If so, is it reasonable to expect that 50% of all publicly accessible pages
are on servers that restrict robot access? That may be high, but it is not
unreasonable.
It seems to me we need to investigate these aspects before we can say
anything definite.
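
To make the second point concrete, here is a minimal sketch of how a
well-behaved robot would drop pages that a site's robots.txt excludes, which
is one way a crawl count can fall below the number of publicly reachable
pages. The robots.txt content, the URLs, and the agent name "ExampleBot" are
all made up for illustration; nothing here describes how the Alexa robot is
actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption, not real data).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawlable(urls, robots_txt, agent="ExampleBot"):
    """Return the subset of urls that the given agent is allowed to fetch."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]


# Hypothetical pages: both are publicly accessible, but one is excluded.
urls = [
    "http://example.org/index.html",
    "http://example.org/private/report.html",
]
allowed = crawlable(urls, ROBOTS_TXT)
print(len(allowed), "of", len(urls), "pages would be counted by the robot")
```

A robot that honours the exclusion rules counts only the first page here, so
any total derived from its crawl leaves out the excluded one.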
Johan
At 12:36 1999-03-17 -0800, Jim Pitkow wrote:
>
>Yeah, that's the trouble with fitting three data points. The latest I heard
>from Alexa was that they got around 200-300 million pages during their
>last crawl, so 700 million seems a bit high.
>
>At 10:48 AM 3/17/99 , Lavoie,Brian wrote:
>>Ed and I did some back-of-the-envelope calculations in regard to the growth
>>numbers Jim posted:
>>
>>We fitted three different trendlines (power, linear, and exponential)
>>through the three data points from Compaq SRC for the number of Web pages.
>>Interestingly, the R-squared for each was about the same, although the
>>exponential had the best fit (use 120 as the scalar, 0.0829 as the growth
>>rate, in terms of months). Using the exponential trend and extrapolating to
>>Mar. 99 suggests there are about 743 million Web pages currently. Is this
>>figure plausible? Well, in July 1998, Vinton Cerf estimated there were about
>>350 million pages, so given the above extrapolation, in 8 months the number
>>of Web pages would have doubled, which is pretty close to the doubling rate
>>Jim estimated. So there may in fact be about three-quarters of a billion Web
>>pages out there now.
>>
>>Brian Lavoie
>>OCLC
>>
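
The exponential trend quoted above can be checked with a few lines of
arithmetic. This is only a sketch under stated assumptions: the scalar 120 is
taken to be in millions of pages, and March 1999 is taken to fall about 22
months after the origin of the fit. Neither is stated in the message; the
22-month figure is simply the value that reproduces the quoted 743 million.

```python
import math

SCALAR = 120.0   # from the quoted message; assumed to be millions of pages
RATE = 0.0829    # growth rate per month, from the quoted message


def pages(months):
    """Estimated pages (millions) at `months` after the fit origin."""
    return SCALAR * math.exp(RATE * months)


def months_to_reach(target):
    """Months after the fit origin at which the trend reaches `target`."""
    return math.log(target / SCALAR) / RATE


# Doubling time implied by the growth rate: ln(2) / rate.
doubling_months = math.log(2) / RATE

# Assumption: March 1999 is roughly 22 months after the fit origin.
MARCH_1999 = 22

print(f"doubling time: {doubling_months:.1f} months")
print(f"estimate at month {MARCH_1999}: {pages(MARCH_1999):.0f} million")
print(f"month at which 743 million is reached: {months_to_reach(743):.1f}")
```

The doubling time comes out a little over eight months, which is consistent
with the "doubled in 8 months" comparison against the July 1998 estimate.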
************************************************************
Johan HJELM
Ericsson Research, User Applications Group
Currently visiting engineer at the W3C
The World Wide Web Consortium
hjelm@w3.org
http://www.w3.org/People/W3Cpeople.html#Hjelm
Fax +1-617-258 5999, Phone +1-617-263-9630
MIT/LCS, 545 Tech. Sq. Cambridge MA 02139 USA
opinions are personal, always my own,
and not necessarily those of Ericsson or the W3C.
============================================================