Re: URL specification: referring to the current directory. from Andrew Daviel on 1997-09-10 (www-talk@w3.org from September to October 1997)

From: Andrew Daviel <andrew@andrew.triumf.ca>
Date: Wed, 10 Sep 1997 12:17:40 -0700 (PDT)
To: www-talk@w3.org
Message-ID: <Pine.LNX.3.95.970910120425.13950I-100000@andrew.triumf.ca>

On Wed, 10 Sep 1997, Arnoud Galactus Engelfriet wrote:

> In article <34158E2B.41C6@opentext.com>,
> George Phillips <phillips@opentext.com> wrote:
> > Arnoud Galactus Engelfriet wrote:
> > > Webcrawlers most definitely DO NOT assume a filename if a link leads
> > > to a 'directory' URL. If the URL is "/foo/bar/" then the client *must*
> > > ask for "/foo/bar" and see what it gets back. It doesn't matter at all
..
> > Seems to be a rather critical typo here.  What you must have meant
> > was that if the URL is "/foo/bar/" then the client *must* ask for
> > "/foo/bar/" and see what it gets back.  Sorry to pick, but that
> > missing slash is really important.
> 
> True, of course. 

I thought this was wrong, but of course it's right ...  Typically one asks
for "/foo/bar" and gets a redirect to "/foo/bar/".  Then (with Apache,
anyway) one is sent the content of the DirectoryIndex file, such as
"/foo/bar/index.html". In this case the browser location box displays
"/foo/bar/".
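
To make the sequence above concrete, here is a rough Python sketch of the
decision the server makes. It is not Apache's code: the in-memory DOCROOT
table, the resolve() function and the choice of status 301 for the redirect
are all assumptions made for illustration.

```python
# Hypothetical in-memory document tree: each directory maps to its files.
DOCROOT = {
    "/foo/bar": {"index.html": "<h1>bar</h1>"},
}
DIRECTORY_INDEX = "index.html"  # stands in for the DirectoryIndex setting


def resolve(path):
    """Return (status, location_or_body) for a requested path."""
    if path.endswith("/"):
        # Directory URL: serve the index file if the directory has one.
        directory = DOCROOT.get(path.rstrip("/"))
        if directory is not None and DIRECTORY_INDEX in directory:
            return 200, directory[DIRECTORY_INDEX]
        return 404, ""
    if path in DOCROOT:
        # A directory requested without its trailing slash: redirect
        # (301 is assumed here; the post only says "a redirect").
        return 301, path + "/"
    # Otherwise look for a plain file inside a known directory.
    parent, _, name = path.rpartition("/")
    directory = DOCROOT.get(parent)
    if directory is not None and name in directory:
        return 200, directory[name]
    return 404, ""
```

With this table, resolve("/foo/bar") gives the redirect to "/foo/bar/", and
both "/foo/bar/" and "/foo/bar/index.html" return the same body.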

I would think that a spider may see "/foo/bar/" 
and "/foo/bar/index.html" as distinct URLs, unless some scheme to
eliminate duplicates is implemented (maybe the big guys do..)
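
One possible duplicate-elimination scheme, sketched in Python. This is only
a guess at what a spider might do, not a description of any real crawler.
The INDEX_NAMES list and the canonical() and dedupe() helpers are
hypothetical, and the heuristic is unsafe in general, since a server is free
to return different content for the two URLs.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the spider guesses that these names are directory indexes.
INDEX_NAMES = ("index.html",)


def canonical(url):
    """Map a URL to the key the spider uses to detect duplicates."""
    parts = urlsplit(url)
    path = parts.path or "/"          # treat an empty path as "/"
    head, _, tail = path.rpartition("/")
    if tail in INDEX_NAMES:
        path = head + "/"             # drop the guessed index file name
    # Host names are case-insensitive; the fragment never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path,
                       parts.query, ""))


def dedupe(urls):
    """Return canonical URLs in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Under these assumptions "http://foo.org/foo/bar/" and
"http://foo.org/foo/bar/index.html" collapse to one entry, so the spider
would fetch the page once.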

The server root on Apache is special; "http://foo.org", "http://foo.org/"
and (typically) "http://foo.org/index.html" all get the same content
without redirection.

One can also do silly things like making a directory called "index.html".

Andrew Daviel
TRIUMF & Vancouver-Webpages

Received on Wednesday, 10 September 1997 15:18:07 UTC