What prefixes are appropriate with what IDs from Daniel DuBois on 1996-04-25 (www-logging@w3.org from April 1996)

From: Daniel DuBois <ddubois@spyglass.com>
Date: Thu, 25 Apr 1996 14:23:06 -0500
To: www-logging@w3.org
Message-Id: <2.2.32.19960425192306.009bfd04@rafiki>
I'm in the process of starting to work again on the XLF portion of our next
generation server, and I believe I told this group I would analyze the
numerous prefix identifier combos.  Since I have to do it anyway to
implement the log configuration and the logging functionality of the server,
here it goes.

I'm only concerned with the configuration of an origin server at this point.
As such, no prefix that contains an "r" makes any sense (for an origin server
to generate in a log file, that is).  Even if my origin server were talking
to a proxy, it wouldn't know it; it would just perceive the requestor as any
other client, so I won't include any combos below that are prefixed by r-,
sr- or rs-.  (See Dan pass the buck.  Pass, Dan, pass!)

c-ip
   This would be the IP address of the client that initiates the connection.
   This is easily available from whatever your Net_Accept code does.
s-ip
   This I presume is the IP address that this particular connection came in on.
   For HTTP/1.0 servers, this would probably be used to distinguish virtual
   hosts in the log.  (For the Spyglass server it wouldn't be useful since each
   virtual host gets its own logfile, but to each his own.  I see a #IP
   field directive being more useful for our purposes, and I believe that
   has been discussed?)
cs-ip
sc-ip
   These probably no longer make any sense given the addition of c- and s-.  It
   seems likely to me that the groupings of "prefixable ids" and "nonprefixable
   ids" should be broken up into "ids with direction prefixes", "ids with
   position prefixes", and "nonprefixable ids".

Obviously "c-" and "s-" are positional prefixes, with the others being
"directional".  The terms "positional" and "directional" are almost
certainly the worst anyone could think of, and different ones should be chosen.

c-dns
   This is what we get when the server does the DNS lookup of the c-ip.
   Given the current DNS logging options available in the Spyglass server, this
   could be '-' all the time, '-' on non-CGI scripts, or never '-'.  I expect
   post-processing analysis tools would generate the values more often than
   origin servers.
s-dns
   This one I'm not so sure about.  It could be just the virtual host name
   the origin server internally associates with the s-ip from above, or it
   could be the contents of the Host: header when 1.1 becomes prevalent.  There
   is the issue of what we log in the case where those two things are not
   the same.  (There will be a transitional state where people will keep
   multiple IP addresses per machine, using Host: when available, and
   checking the IP address the request came in on when it's not.  In that case
   the request may have come in over the 'main' IP address, but have a Host:
   that would have indicated the 7th IP address.)  Does it have to be a FQDN
   or can it be a plain hostname like "www" which obviously means something
   different if looked at from outside the scope of the log-generating machine?
cs-dns
sc-dns
   These are some more that probably no longer make any sense given the
   addition of c- and s-.
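
The transitional lookup described under s-dns could be sketched roughly as
below.  This is only an illustration in Python, not how the Spyglass server or
any other server does it: the IP_TO_VHOST table, the addresses, and both
function names are made up, and preferring Host: over the incoming IP address
is an assumption.  Which of the two names belongs in the log when they
disagree is exactly the open question.

```python
# Hypothetical table: local IP address -> virtual host name.
IP_TO_VHOST = {
    "192.0.2.1": "www.main.example",
    "192.0.2.7": "www.seventh.example",
}


def s_dns_candidates(local_ip, host_header=None):
    """Return (name_from_ip, name_from_host_header) for one request.

    Either value may be None: the IP may not be configured, and an
    HTTP/1.0 client may not send a Host: header at all.
    """
    from_ip = IP_TO_VHOST.get(local_ip)
    from_host = None
    if host_header:
        # Drop an optional ":port" suffix and normalize case.
        # (Assumes a plain host name, not an IPv6 literal.)
        from_host = host_header.strip().split(":", 1)[0].lower() or None
    return from_ip, from_host


def pick_vhost(local_ip, host_header=None):
    """Prefer Host: when present, else fall back to the incoming IP."""
    from_ip, from_host = s_dns_candidates(local_ip, host_header)
    return from_host or from_ip or "-"
```

A request that arrives on 192.0.2.1 with "Host: www.seventh.example" yields two
different candidate names from s_dns_candidates, which is the mismatch case.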

sc-status
   The return code of the HTTP response, like 200 OK.  Simple enough.
c-status
   I don't know what this is supposed to mean.
s-status
   I don't know what this is supposed to mean, unless it's redundant with
   sc-status, in which case it should be an id that takes "directional prefixes".
cs-status
   I don't know what this is supposed to mean.

c-comment
   No clue.  How does a client make a comment that a server can log?
s-comment
   I guess this would be something a server specific application would use, or
   maybe request-based errors, like "This request failed authentication" or
   "This request had a network read failure before the full request entity
   body was read."  I probably wouldn't generate it, but that's me.
cs-comment
   No clue.  How does a client make a comment that a server can log?
sc-comment
   I don't know what this is supposed to mean.

cs-method
   Obviously POST, GET, etc.
c-method
s-method
sc-method
   I don't think any of these make any sense, except maybe c-method, but even
   that is probably inappropriate because of its redundancy.

cs-uri
   The request URI.  Could be an absolute URI if your server accepts those.
   Basically anything that occurs after the first LWS after the method, until
   the next LWS.
c-uri
   Redundant?
s-uri
   I could conceive of this being meaningful if used to indicate some internal
   URI-translating done by the server.  For instance, if to handle virtual
   hosting, a server translates www.joe.com/products/phones.txt to
   /joe/products/phones.txt.  But even that's a stretch, and that's more of a
   URI to internal file system mapping issue than anything.  So this is
   probably not sensical.
sc-uri
   I don't know what this would be used for, unless perhaps you use this
   field to record Location redirection responses, or maybe Content-Location:
   content-negotiated responses.

These last two provoke the question:  If we have prefix-id combos that could
conceivably mean something to someone, and be useful to them, is it a
requirement that the appropriate use of these IDs be described in the draft
standard so that log files are understandable by all, not just those
individuals/groups who generated a particular file with the questionable
prefix-id combo?  If we are trying to standardize on IDs, I would think that
requires those ids to have standard meanings.

cs-uri-stem
   Same as cs-uri minus the first '?' and everything after it (if there is a
   '?').  [I could be wrong about minus the '?', the draft doesn't really say
   that].
c-uri-stem
   Redundant?
s-uri-stem
   Same issues as s-uri.  Same dropping the '?' + trailer as cs-uri-stem.
sc-uri-stem
   Same issues as sc-uri.  Same dropping the '?' + trailer as cs-uri-stem.

cs-uri-query
   Same as cs-uri minus everything up to and including the first '?'.  If there
   is no '?' in the URL, or if there is nothing after the '?', I believe this
   entry would be '-'.
c-uri-query
   Redundant?
s-uri-query
   Same issues as s-uri.  Same dropping the stem + '?' as cs-uri-query.
sc-uri-query
   Same issues as sc-uri.  Same dropping the stem + '?' as cs-uri-query.
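
As a rough illustration of the cs-uri, cs-uri-stem and cs-uri-query
descriptions above, here is a minimal Python sketch.  The function names are
hypothetical, the stem is assumed to exclude the '?' (which the draft may or
may not intend), and the query is assumed to be logged as '-' when it is
missing or empty.

```python
def request_uri(request_line):
    """Return the token after the method in a request line.

    This is whatever sits between the first run of whitespace and the
    next one.  Returns None if the line has no second token.
    """
    parts = request_line.split()
    return parts[1] if len(parts) >= 2 else None


def split_uri(uri):
    """Split a request URI into (stem, query) at the first '?'.

    The stem is everything before the first '?'.  The query is
    everything after it, or '-' when there is no '?' or nothing
    follows it.
    """
    stem, _sep, query = uri.partition("?")
    return stem, (query if query else "-")
```

For a request line such as "GET /a/b.cgi?x=1 HTTP/1.0", request_uri gives the
cs-uri value and split_uri gives the stem and query values derived from it.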


So to sum up:  If I, as an origin server configuration file parser, see prefix

1) "c"
       I expect to see next: ip or dns.  Anything else I would flag as an
       error or an unsupported feature.
2) "cs"
       I expect to see next: (requestheader), method, uri, uri-stem,
       uri-query.  Anything else I would flag as an error or an unsupported
       feature.
3) "s"
       I expect to see next: ip, dns, or comment.  Anything else I
       would flag as an error or an unsupported feature.
4) "sc"
       I expect to see next: (responseheader), status, or possibly uri,
       uri-stem, or uri-query.  Anything else I would flag as an error or an
       unsupported feature.
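
The check summarized above might look something like the following Python
sketch.  The names ALLOWED_IDS, HEADER_PREFIXES, FIELD_RE and check_field are
hypothetical, the "prefix-id" and "prefix(header)" spellings are assumed from
the draft's field syntax, and unprefixed identifiers are outside its scope and
are simply reported as unparseable.

```python
import re

# Identifiers accepted after each prefix, per the summary above.
# The uri* entries under "sc" are the "possibly" cases.
ALLOWED_IDS = {
    "c": {"ip", "dns"},
    "cs": {"method", "uri", "uri-stem", "uri-query"},
    "s": {"ip", "dns", "comment"},
    "sc": {"status", "uri", "uri-stem", "uri-query"},
}
# Prefixes that may also take a parenthesized header name.
HEADER_PREFIXES = {"cs", "sc"}

# Matches either "prefix-id" or "prefix(Header-Name)".
FIELD_RE = re.compile(
    r"^(?P<prefix>[a-z]+)"
    r"(?:-(?P<ident>[a-z][a-z-]*)|\((?P<header>[^()]+)\))$"
)


def check_field(field):
    """Return None if the field is accepted, else an error string."""
    m = FIELD_RE.match(field)
    if m is None:
        return "cannot parse %r" % field
    prefix = m.group("prefix")
    if prefix not in ALLOWED_IDS:
        return "unsupported prefix %r" % prefix
    if m.group("header") is not None:
        if prefix in HEADER_PREFIXES:
            return None
        return "prefix %r does not take a header" % prefix
    ident = m.group("ident")
    if ident in ALLOWED_IDS[prefix]:
        return None
    return "unsupported identifier %r for prefix %r" % (ident, prefix)
```

With this table, "c-ip", "cs-uri-stem" and "cs(User-Agent)" pass, while
"c-status", "r-ip" and "s(Server)" come back with an error string.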

Does anyone think I've shortchanged any of the combos above?  Does anyone
disagree that the prefix-id combos should have standardized meanings to be
useful and present in the draft?  (IMO, they should be x-foo if they mean
different things to different organizations.)


I've neglected the possible addition of "authname" which would be the
username in the basic or digest authentication, similar to the entry in the
CLFF.  I would expect only one of "c-authname" or "cs-authname" to be used.

-----
the Programmer formerly known as Dan          
                                     http://www.spyglass.com/~ddubois/
Received on Thursday, 25 April 1996 15:24:49 UTC