Re: Common Log format

> I have a serious problem with the Common Logfile format, as presented at 
> <URL:http://w3.org/hypertext/WWW/Daemon/User/Config/Logging.html#
> common-logfile-format>.  It indicates that the "request" portion of the 
> log entry should be:
> 
>   The request line exactly as it came from the client.

Yes -- that is what a log is for.

> Unfortunately with directory indexing, this means that three different 
> requests all have the same semantic meaning:
> 
>   GET /dirname
>   GET /dirname/
>   GET /dirname/index.html
> 
> (Assuming that index.html is the dir index file, this too can vary.) Are 
> the current logfile processing programs taking this vagary into 
> account?

Yes, it is a trivial thing to do -- wwwstat has done it since v0.1.
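
As a rough illustration of why this is easy to handle at analysis time, here
is a minimal Python sketch. It is not how wwwstat does it; the function name
canonical_path, the known_dirs set, and the default index file name are all
assumptions made up for this example.

```python
def canonical_path(path, known_dirs, index_name="index.html"):
    """Map /dirname, /dirname/ and /dirname/index.html to one key.

    known_dirs is a hypothetical set of directory URLs, each ending in "/",
    that the analyst supplies (for example, built from the document tree).
    """
    if path.endswith("/" + index_name):
        # /dirname/index.html -> /dirname/
        path = path[: -len(index_name)]
    elif not path.endswith("/") and path + "/" in known_dirs:
        # /dirname -> /dirname/ (only when we know it is a directory)
        path = path + "/"
    return path
```

The log itself stays untouched; the three forms are merged only when the
statistics are computed, so the raw request is still there when it matters.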

> I intend to log
> 
>   GET /dirname/index.html
> 
> in all cases where index.html exists, and
>   
>   GET /dirname/
> 
> in all cases where it doesn't, unless somebody can provide me with a 
> really good reason not to. 

Reason: it would be lying -- that is not the request it got, so it shouldn't
be logging it as if it were.  For instance, I am usually interested in cases
where there are a large number of requests for

    GET /dirname

since that usually means somebody has advertised (or included as a link)
the wrong URL for that dirname.  Your scheme would prevent me from finding
those cases in the logfile.
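
The check described above can be sketched in Python roughly as follows. This
is only a sketch under assumptions: the regular expression is a simplified
reading of the common logfile format (host, ident, authuser, bracketed date,
quoted request, status, bytes), and slashless_dir_requests and known_dirs are
hypothetical names, not part of any existing tool.

```python
import re
from collections import Counter

# Simplified common logfile format: the quoted field is the raw request line.
CLF = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(.*)" \S+ \S+$')

def slashless_dir_requests(lines, known_dirs):
    """Count GET requests that name a known directory without its slash.

    known_dirs is a hypothetical set of directory URLs ending in "/".
    """
    counts = Counter()
    for line in lines:
        match = CLF.match(line.strip())
        if not match:
            continue  # skip lines that do not look like log entries
        parts = match.group(1).split()
        if len(parts) >= 2 and parts[0] == "GET" and parts[1] + "/" in known_dirs:
            counts[parts[1]] += 1
    return counts
```

A high count for /dirname in the result points at a bad link somewhere. This
only works if the server logged the request line as it arrived; if it had
been rewritten to /dirname/index.html, the count would always be zero.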

> One of the features of the server I am 
> writing will be reliable logging, so this is a little more important than 
> it might sound.

In that case, don't do it -- you just introduced an unreliability.
If the server mucks with the request, I can't rely on it for maintenance
and security checks.
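
For the server side, a minimal Python sketch of writing an entry with the
request line passed through verbatim might look like this. The function name
format_clf and its parameters are assumptions for illustration; the field
order follows the common logfile format (host, ident, authuser, date,
request, status, bytes).

```python
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def format_clf(host, request_line, status, nbytes, when, ident="-", user="-"):
    """Build one log entry; request_line is written exactly as received.

    when must be a timezone-aware datetime so that %z yields an offset.
    """
    stamp = "%02d/%s/%04d:%02d:%02d:%02d %s" % (
        when.day, MONTHS[when.month - 1], when.year,
        when.hour, when.minute, when.second, when.strftime("%z"))
    # No normalization of the request here: /dirname stays /dirname.
    return '%s %s %s [%s] "%s" %d %d' % (
        host, ident, user, stamp, request_line, status, nbytes)
```

The point of the sketch is what it leaves out: there is no step that maps
the request onto the file that was eventually served.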

 ....Roy T. Fielding  Department of ICS, University of California, Irvine USA
                                       <fielding@ics.uci.edu>
                      <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>

Received on Friday, 14 April 1995 04:41:25 UTC