Re: Web growth
From: Jim Pitkow (pitkow@parc.xerox.com)
Date: Thu, Mar 18 1999
Date: Thu, 18 Mar 1999 18:48:48 PST
To: Joe Meadows <meadowsj@nobs.ca.boeing.com>, www-wca@w3.org
Message-Id: <99Mar18.184954pst."147486"@mailback.parc.xerox.com>
Points well taken. Maybe we should recast the term "size" in this context as
"robot reachable".
At 02:44 PM 3/18/99, Joe Meadows wrote:
>>The presence of robots.txt is quite low actually, though I don't have an
>>exact number (but will try to get one). Either way, the notion of size for
>>me centers around what is publicly accessible, so robots.txt should not
>>influence this much.
>
>Just because something is "blocked" by robots.txt doesn't mean it isn't
>publicly accessible. In fact, it probably is; otherwise the content provider
>probably wouldn't bother putting it in. They simply don't think the material
>should be indexed by a crawler. I would imagine, though, that such pages make
>up a pretty small percentage of pages on the internet at large.
>
>On our intranet, however, we have some sites that use a robots.txt file to
>prevent the whole site from being indexed (preferring to ... coerce/guide
>users through their entry way, i.e. trying to be a portal). 50% might not
>actually be unrealistic internally...
>
>Cheers,
>Joe Meadows
>
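One way to make "robot reachable" operational, offered only as a sketch under stated assumptions: a URL counts as reachable if the site's robots.txt allows a crawler to fetch it, regardless of whether a person with a browser can open it. The sketch assumes Python 3's standard urllib.robotparser module; the helper names robot_reachable and reachable_share, the host intranet.example, and the sample paths are hypothetical, chosen to mirror the whole-site block described above.

```python
from typing import Sequence
from urllib.robotparser import RobotFileParser


def _parser(robots_txt: str) -> RobotFileParser:
    # Parse robots.txt text supplied directly (no network fetch).
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def robot_reachable(robots_txt: str, url: str, agent: str = "*") -> bool:
    # Hypothetical helper: True if robots.txt permits `agent` to fetch `url`.
    return _parser(robots_txt).can_fetch(agent, url)


def reachable_share(robots_txt: str, urls: Sequence[str], agent: str = "*") -> float:
    # Hypothetical helper: fraction of `urls` a compliant crawler may fetch.
    # Returns 0.0 for an empty list rather than dividing by zero.
    if not urls:
        return 0.0
    parser = _parser(robots_txt)
    allowed = sum(1 for u in urls if parser.can_fetch(agent, u))
    return allowed / len(urls)


# A site that blocks every crawler, as in the intranet "portal" case.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# A site that blocks only one subtree.
PARTIAL = "User-agent: *\nDisallow: /private/\n"
```

Under this measure a site-wide "Disallow: /" makes every page unreachable even though users can still browse it, which is the gap between "publicly accessible" and "robot reachable".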