Re: New terminology draft

From: Henrik Frystyk Nielsen (frystyk@w3.org)
Date: Tue, Mar 23 1999


Message-Id: <3.0.5.32.19990323165934.00a87800@localhost>
Date: Tue, 23 Mar 1999 16:59:34 -0500
To: "Lavoie,Brian" <lavoie@oclc.org>, "'www-wca@w3.org'" <www-wca@w3.org>

At 16:12 3/22/99 -0500, Lavoie,Brian wrote:

Hi Brian,

Thanks for taking the lead on this - here are my comments on your March 17,
1999 version:

>   Primitive Elements
>   
>   File
>   A collection of bytes, stored in a static medium, identified by a name
>   and an extension (format).

I don't like using "file" as a unit at all. "Resource" is more flexible,
and we never really care how a resource is stored internally. There is no
reason why an object stored in a database can't be just as static as a
file (or more so).
   
>   Resource
>   A network-accessible information unit, consisting of one or more
>   files, which are collectively referenced by a Uniform Resource
>   Identifier (URI).

A resource doesn't really consist of one or more files. Have you seen the
definition at

	http://www.w3.org/WCA/1999/01/Terms.html#Resource

I think that is more generic. 

>   Resource Instantiation
>   The state of a resource at a specific point in time and/or from a
>   specific viewpoint.
>   A conceptual mapping exists between a resource and a resource
>   instantiation (or set of instantiations). A resource remains static
>   even when its content i.e., the set of resource instantiations
>   currently prevailing changes over time, provided that the conceptual
>   mapping does not change.
>   Example:
>   A text file containing the previous day's closing price for Microsoft
>   stock is a resource. The version of that file listing the closing
>   price of Microsoft stock on March 15, 1999, is an instantiation of
>   that resource.

For better or worse, HTTP calls an instantiation an "entity", which
doesn't really say anything. However, I think we are better off sticking
to what people are already familiar with. I have a definition which is a
bit more general than the one HTTP uses (it doesn't split "metadata" from
"data" the way HTTP splits the entity header from the entity body):

	http://www.w3.org/WCA/1999/01/Terms.html#Entity
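To make the resource/entity distinction concrete, here is a minimal
Python sketch, assuming the "conceptual mapping" can be modelled as a
function from a point in time to an entity. The class names, the
example.org URI and the prices are hypothetical, chosen only to mirror the
stock-price example above; metadata and data are deliberately kept in one
object rather than split into header and body.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict


@dataclass(frozen=True)
class Entity:
    """One instantiation of a resource: metadata and data kept together."""
    metadata: Dict[str, str]
    data: bytes


@dataclass(frozen=True)
class Resource:
    """A resource is its URI plus a conceptual mapping to entities.

    The resource stays the same as long as the mapping stays the same,
    even though the entity it yields changes over time.
    """
    uri: str
    mapping: Callable[[date], Entity]

    def entity_at(self, when: date) -> Entity:
        return self.mapping(when)


# Hypothetical closing prices, purely illustrative.
CLOSING_PRICES = {date(1999, 3, 15): "100.00", date(1999, 3, 16): "101.50"}


def previous_close(when: date) -> Entity:
    """Mapping for "the previous day's closing price" (weekends ignored)."""
    day = when - timedelta(days=1)
    return Entity({"content-type": "text/plain", "as-of": day.isoformat()},
                  CLOSING_PRICES[day].encode())


stock = Resource("http://www.example.org/msft-close.txt", previous_close)
a = stock.entity_at(date(1999, 3, 16))  # entity for March 15
b = stock.entity_at(date(1999, 3, 17))  # entity for March 16
print(a.data, b.data)
```

One resource, two different entities: the URI and the mapping never
change, only what the mapping yields.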

>   Client
>   A software application that initiates network communication.

I think we can be a little more specific here; see for example:

	http://www.w3.org/WCA/1999/01/Terms.html#Client1

>   Server
>   A software application that waits for network communication to be
>   initiated.

and here as well:

	http://www.w3.org/WCA/1999/01/Terms.html#Server1

The important thing is that "server" also covers proxies and other
intermediaries.
   
>   Message
>   A unit of communication exchanged between two peers (i.e., two units
>   residing at the same network layer).
>   
>   Request
>   A message containing an atomic operation to be carried out in the
>   context of a specified resource.
>   
>   Response
>   Zero, one or more messages containing the result of an executed
>   request.

I propose using these definitions instead for message, request, and response:

	http://www.w3.org/WCA/1999/01/Terms.html#Message
	http://www.w3.org/WCA/1999/01/Terms.html#Request
	http://www.w3.org/WCA/1999/01/Terms.html#Response

as they add a little more text on how requests and responses interact.

>   User
>   A human using a client to manually (interactively) retrieve
>   network-accessible resources.

It is inherent in the Web model that a user agent always issues requests
on behalf of some human, although not necessarily directly. For example,
a robot still acts on behalf of the human who started it. I wouldn't make
"user" a primitive. Instead, I would like to define certain access
patterns: there is no reason why a browser can't act as a robot while
filling a cache, or why a robot can't act as a browser when downloading
inlined images, etc.

I also think we need to define the term "Web page":

	http://www.w3.org/WCA/1999/01/Terms.html#page

and "Web site" as well:

	http://www.w3.org/WCA/1999/01/Terms.html#site

Both of my definitions are fairly close to yours further down in your list.

>   The Scope of the Web
>   
>   Web Resource
>   A resource that is accessible from the Internet, via the HTTP
>   protocol.

The Web is not limited to HTTP (nor to HTML/XML, for that matter). Those
are just popular ways of implementing the Web; the Web is really the
complete information space that can be referenced by URIs. That is,
anything that is a resource is on the Web. When you think about it, this
is not even limited to networked resources, although that is how we
normally think about them. Phone numbers are one example of resources
with non-networked URIs:

	http://www.w3.org/Addressing/schemes.html#phone

As we have already defined a "resource", we don't have to change that
definition.
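A small sketch of the difference between the two readings, assuming "on
the Web" simply means "identified by an absolute URI". The two predicate
names are hypothetical, the ftp host and phone number are made up, and
`tel:` is used as an assumed spelling of a phone URI (the scheme name on
the page above may differ).

```python
from urllib.parse import urlsplit


def is_web_resource_http_only(uri: str) -> bool:
    """The draft's narrower reading: only resources reachable via HTTP."""
    return urlsplit(uri).scheme == "http"


def is_web_resource(uri: str) -> bool:
    """The broader reading: anything identified by an absolute URI is on the Web."""
    return urlsplit(uri).scheme != ""


# Illustrative URIs; the ftp host and the phone number are hypothetical.
examples = [
    "http://www.w3.org/WCA/1999/01/Terms.html",
    "ftp://ftp.example.org/pub/readme.txt",
    "mailto:www-wca@w3.org",
    "tel:+1-617-555-0100",
]

for uri in examples:
    print(uri, is_web_resource_http_only(uri), is_web_resource(uri))
```

Only the first URI passes the HTTP-only test, while all four are resources
under the URI-based definition, which is why the existing "resource"
definition is enough.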
   
>   Web-accessible Internet Resource
>   A resource, accessible from the Internet through a non-HTTP network
>   protocol, that is referenced by a hyperlink embedded in a Web
>   resource.
>   
>   Note that the definitions of Web resources and Web-accessible Internet
>   resources both stipulate that the resource is available on the
>   Internet. This is intended to exclude networks not connected to the
>   Internet, such as non-TCP/IP networks, corporate intranets, and other
>   private networks.
>   
>   The Web-accessible Internet resource definition addresses the fact
>   that HTML, a key standard for Web resources, permits the direct
>   linkage of non-HTTP-accessible resources from HTTP-accessible
>   resources.

There are plenty of other formats that contain URIs (PDF, PowerPoint,
etc.); this is not limited to HTML. Again, I don't think we have to say
anything more than what we already have on resources.
  
>   Web Clients
>   
>   Web Client
>   A client that can be used to access Web resources.

We have already defined this as a "client"; it doesn't matter what
protocol it actually speaks, nor whether a human is clicking the mouse.
   
>   Click
>   A request by a user for the contents of a Web resource, identified by
>   a URL. A click can take one of two forms:
>   Explicit click: A click that is initiated manually by the user.
>   Implicit click: A click that is initiated transparently by the client,
>   without manual intervention on the part of the user, as an ancillary
>   event corresponding to an explicit click.

Instead of defining "click" (which is also not very general, as there are
many other ways of initiating a request), I think we are in fact already
covered by the "Web page" definition, which leaves it to user preferences
and/or application capabilities to decide which links are dereferenced
and which are not.
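A sketch of what leaving it to user preferences could look like, assuming
a page's links can be tagged either as embedded (such as inlined images)
or as ordinary hyperlinks. `Link`, `Preferences` and
`links_to_dereference` are hypothetical names; the point is only that the
explicit/implicit distinction collapses into a policy, and the same code
covers both a browser and a robot.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    uri: str
    embedded: bool  # True for inlined content such as images, False for hyperlinks


@dataclass(frozen=True)
class Preferences:
    load_embedded: bool = True       # typical interactive browser behaviour
    follow_hyperlinks: bool = False  # True when the agent behaves like a robot


def links_to_dereference(links: List[Link], prefs: Preferences) -> List[str]:
    """Return the URIs the user agent should fetch without further input."""
    chosen = []
    for link in links:
        if link.embedded and prefs.load_embedded:
            chosen.append(link.uri)
        elif not link.embedded and prefs.follow_hyperlinks:
            chosen.append(link.uri)
    return chosen


# Hypothetical links found on one Web page.
page_links = [
    Link("http://a.example.org/logo.gif", embedded=True),
    Link("http://b.example.com/", embedded=False),
]
browser = Preferences()
robot = Preferences(follow_hyperlinks=True)
print(links_to_dereference(page_links, browser))
print(links_to_dereference(page_links, robot))
```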
   
>   Click-through Rate
>   Frequency with which a Web resource, identified by a URL, is clicked.

Do you mean "Web page access rate", that is, the time between changes of
Web page (as a mean, or as a distribution)? Again, I think we should
avoid the term "click".
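If "Web page access rate" is the intended measure, both the mean and the
distribution fall out of the same list of inter-access times. A minimal
sketch, assuming we know the time (in seconds) at which each Web page was
accessed; the timestamps are hypothetical, and at least two accesses are
needed for the mean to exist.

```python
from statistics import mean
from typing import List


def page_change_intervals(timestamps: List[float]) -> List[float]:
    """Times between successive Web page accesses (the empirical distribution)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical access times, in seconds from the start of a session.
accesses = [0, 12, 40, 45]
intervals = page_change_intervals(accesses)
mean_interval = mean(intervals)   # mean time between page changes
access_rate = 1 / mean_interval   # pages per second
print(intervals, mean_interval, access_rate)
```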
   
>   User Session
>   A cohesive set of user clicks across one or more Web servers.

What about "A set of Web pages accessed by continuous dereferencing of
links contained within these web pages. A session is not limited to a
single Web site"?
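A sketch of how that session definition could be applied to an access
log, assuming each log record carries the page accessed and the page whose
link led there (a referrer). The record layout and the
example.org/.com/.net URIs are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set


class Access(NamedTuple):
    time: float              # seconds; hypothetical log field
    uri: str                 # the Web page that was accessed
    referrer: Optional[str]  # page whose link was dereferenced, if any


def split_sessions(accesses: List[Access]) -> List[List[Access]]:
    """Group accesses into sessions of continuously dereferenced links.

    An access continues the current session if it was reached through a
    link on a page already in that session; otherwise it starts a new one.
    Host names are never compared, so a session may span several Web sites.
    """
    sessions: List[List[Access]] = []
    pages: Set[str] = set()
    for acc in sorted(accesses, key=lambda a: a.time):
        if sessions and acc.referrer in pages:
            sessions[-1].append(acc)
        else:
            sessions.append([acc])
            pages = set()
        pages.add(acc.uri)
    return sessions


log = [
    Access(0, "http://a.example.org/", None),
    Access(5, "http://a.example.org/x.html", "http://a.example.org/"),
    Access(9, "http://b.example.com/", "http://a.example.org/x.html"),  # other site, same session
    Access(60, "http://c.example.net/", None),  # typed in directly: a new session
]
sessions = split_sessions(log)
print([len(s) for s in sessions])
```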
   
>   Episode
>   A subset of related user clicks that occur within a user session.

How is that distinguished from a session?
   
>   Temporal Session Length
>   The amount of time that elapses during the course of a user session.
>   
>   Session Path Length
>   The number of clicks that occur during the course of a user session.
>   
>   Server Session
>   A collection of user clicks to a Web server during a user session.
>   Also called a visit.

What about:

	The part of a user session limited to a single Web site.
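A sketch of this server-session definition, together with the path-length
and temporal-length measures from the draft, assuming a Web site can be
approximated by the host name in the URI (which is not true in general)
and counting page accesses instead of clicks. All names and data are
hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

# One page access: (time in seconds, URI). A hypothetical log record.
Access = Tuple[float, str]


def server_sessions(session: List[Access]) -> Dict[str, List[Access]]:
    """Split a user session into the parts limited to a single Web site."""
    parts: Dict[str, List[Access]] = defaultdict(list)
    for time, uri in session:
        # Assumption: the URI's host name stands in for the Web site.
        parts[urlsplit(uri).hostname or ""].append((time, uri))
    return dict(parts)


def session_metrics(session: List[Access]) -> dict:
    """Path and temporal lengths for a non-empty user session."""
    times = [time for time, _ in session]
    return {
        "session_path_length": len(session),                 # page accesses
        "temporal_session_length": max(times) - min(times),  # seconds
        "server_path_lengths": {site: len(part)
                                for site, part in server_sessions(session).items()},
    }


session = [
    (0, "http://a.example.org/"),
    (5, "http://a.example.org/x.html"),
    (9, "http://b.example.com/"),
    (20, "http://a.example.org/y.html"),
]
print(session_metrics(session))
```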
   
>   Server Path Length
>   The number of user clicks to a Web server during a user session.

The following definitions are rather specific to HTTP, so maybe we should
introduce a separate section for HTTP "stuff", including the equivalent
sizes for responses?
   
>   Client Request Header Size
>   The number of bytes in the HTTP header sent by a client requesting a
>   Web resource.
>   
>   Client Request Content Size
>   The number of bytes sent by a client delivering content to a Web
>   server (e.g., the content of a "PUT" request).
>   
>   Total Client Request Size
>   Client Request Header Size + Client Request Content Size
>   
>   
>   
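For the client request sizes above, a minimal sketch, assuming the "HTTP
header" means everything up to and including the empty line that ends it
(request line, header fields and the final CRLF), and that the message
uses CRLF line endings. `message_sizes` is a hypothetical helper; the same
split applies unchanged to the server response sizes defined further down.
The content size here counts the bytes actually on the wire, so any
transfer coding (such as chunking) is included.

```python
from typing import Tuple

END_OF_HEADER = b"\r\n\r\n"


def message_sizes(raw: bytes) -> Tuple[int, int, int]:
    """Return (header size, content size, total size) in bytes for one HTTP message."""
    end = raw.find(END_OF_HEADER)
    if end < 0:
        raise ValueError("no end-of-header marker (CRLF CRLF) found")
    header_size = end + len(END_OF_HEADER)  # start line + header fields + blank line
    content_size = len(raw) - header_size   # everything after the blank line
    return header_size, content_size, header_size + content_size


# A hypothetical PUT request delivering five bytes of content.
request = (b"PUT /doc.txt HTTP/1.1\r\n"
           b"Host: www.example.org\r\n"
           b"Content-Length: 5\r\n"
           b"\r\n"
           b"hello")
print(message_sizes(request))  # (67, 5, 72)
```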
>   Web Servers

I think we should put this under an "HTTP section" as well. We should
really keep the definition of a Web site separate from HTTP, especially
as HTTP will change over time.
    
>   Web Server
>   A server that provides access to Web resources.
>
>   Server Response Header Size
>   The number of bytes transferred by a server in delivering an HTTP
>   header, in response to a client request for a Web resource.
>   
>   Server Response Content Size
>   The number of bytes transferred by a server in delivering the content
>   of a requested Web resource.
>   
>   Total Server Response Size
>   Server Response Content Size + Server Response Header Size
>   
>   Cookie
>   Data sent by a Web server to a Web client, to be stored locally by the
>   client and sent back to the server on subsequent requests.
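A deliberately minimal sketch of the cookie round trip, using Python's
`http.cookies.SimpleCookie` as a stand-in for a client's cookie store. It
ignores domain, path and expiry matching, which a real client must do; the
helper names and cookie values are hypothetical.

```python
from http.cookies import SimpleCookie


def store_set_cookie(jar: SimpleCookie, set_cookie_value: str) -> None:
    """Store the cookie(s) from one Set-Cookie header value sent by the server."""
    jar.load(set_cookie_value)


def cookie_header(jar: SimpleCookie) -> str:
    """Build the Cookie header value the client sends back on later requests."""
    return "; ".join(f"{name}={morsel.coded_value}" for name, morsel in jar.items())


jar = SimpleCookie()
store_set_cookie(jar, "session=abc123; Path=/")  # from a first response
store_set_cookie(jar, "lang=en")                 # from a later response
print("Cookie:", cookie_header(jar))             # Cookie: session=abc123; lang=en
```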

I think I'll stop here for the first round of comments - maybe we can start
off discussing these first?

Henrik
--
Henrik Frystyk Nielsen,
World Wide Web Consortium
http://www.w3.org/People/Frystyk