Re: Web site definition

From: Jim Pitkow (pitkow@parc.xerox.com)
Date: Fri, Jan 08 1999


Message-Id: <4.1.19990108000005.0094e830@mailback.parc.xerox.com>
Date: Fri, 8 Jan 1999 00:10:39 PST
To: "Lavoie,Brian" <lavoie@oclc.org>, "'www-wca@w3.org'" <www-wca@w3.org>
From: Jim Pitkow <pitkow@parc.xerox.com>
Subject: Re: Web site definition

At 02:12 PM 1/4/99 , Lavoie,Brian wrote:
>In my opinion, the only unambiguous way to define a Web site is as the set
>of all Web pages (and I think your definition of this concept is a very good
>one) that share a common host, defined by either the IP address, or
>preferably, the domain name, if it exists. In this sense, any Web page can
>be assigned to a unique Web site based only on information contained in the
>URL. Perhaps this is what you meant in your definition.
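For concreteness, here is a minimal sketch of that rule in Python. It is my own illustration, not an agreed WCA method, and it assumes the site key is simply the host part of the URL. Because a URL carries either a domain name or an IP literal, taking the host covers the "domain name if it exists, otherwise IP" preference. Ignoring the port is also an assumption.

```python
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Return the Web site a page URL belongs to under the host-based rule.

    The site is the URL's host: the domain name if the URL uses one,
    otherwise the literal IP address. Scheme, port, path, query and
    fragment are ignored (ignoring the port is an assumption).
    """
    host = urlsplit(url).hostname  # urllib lowercases the host name
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    return host


for u in [
    "http://www.oclc.org/oclc/research/projects/webstats/taxonomy.htm",
    "http://WWW.OCLC.ORG/index.html",
    "http://192.0.2.10/~user/page.html",  # hypothetical IP-only URL
]:
    print(site_key(u))
# www.oclc.org
# www.oclc.org
# 192.0.2.10
```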

I agree that the definition was too loose, and I like yours better. Still,
there are tough cases, such as domains with many sub-domains that also
load-balance servers: e.g., automatically determining whether w5.ibm.com
and wo1.ibm.com are different web sites (or the same one). And if the
domain names differ but the IP address is the same (e.g., virtual hosting),
do you consider it the same site?

It may be worth enumerating various cases and providing our default
counting method for each.  Care to take a stab at it?
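As a rough starting point for that list, here is a Python sketch of three candidate default counting policies. The policy names, the "last two labels" rule for collapsing sub-domains, and the caller-supplied host-to-IP table are all my own placeholders and assumptions. No DNS lookups are done, and the "domain" rule is knowingly wrong for names like example.co.uk.

```python
import ipaddress
from urllib.parse import urlsplit

POLICIES = ("host", "domain", "ip")  # placeholder names, not agreed terms


def _host(url):
    host = urlsplit(url).hostname  # lowercased by urllib
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    return host


def _is_ip(host):
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def site_key(url, policy, ip_of=None):
    """Map a URL to the site it is counted under.

      "host"   - every distinct host name is its own site
      "domain" - collapse sub-domains to the last two labels (naive)
      "ip"     - merge virtually hosted names sharing an address, using a
                 caller-supplied {host: ip} table (hypothetical data)
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "ip" and ip_of is None:
        raise ValueError('policy "ip" needs an ip_of table')
    host = _host(url)
    if policy == "host" or _is_ip(host):
        return host  # IP-literal URLs are already an address
    if policy == "domain":
        return ".".join(host.split(".")[-2:])
    return ip_of.get(host, host)  # policy "ip"; unknown hosts keep their name


def count_sites(urls, policy, ip_of=None):
    """Number of distinct sites among the URLs under the given policy."""
    return len({site_key(u, policy, ip_of) for u in urls})


urls = [
    "http://w5.ibm.com/a.html",
    "http://w5.ibm.com/c.html",
    "http://wo1.ibm.com/b.html",
    "http://www.alpha.example/",
    "http://www.beta.example/",
]
# Hypothetical: two names virtually hosted on one address.
ip_of = {"www.alpha.example": "192.0.2.10", "www.beta.example": "192.0.2.10"}

for policy in POLICIES:
    print(policy, count_sites(urls, policy, ip_of))
# host 4
# domain 3
# ip 3
```

Under these assumptions, w5.ibm.com and wo1.ibm.com count as two sites under "host" but one under "domain", while the two virtually hosted names merge only under "ip".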

>Our research group has done some thinking about appropriate terminology for
>Web entities. Some of this was presented at the workshop in November. If
>interested, please visit:
>http://www.oclc.org/oclc/research/projects/webstats/taxonomy.htm

Thanks for the pointer - I'll try to work those definitions in.

Jim.