- From: Andrew Daviel <andrew@andrew.triumf.ca>
- Date: Wed, 10 Sep 1997 12:17:40 -0700 (PDT)
- To: www-talk@w3.org
On Wed, 10 Sep 1997, Arnoud Galactus Engelfriet wrote:

> In article <34158E2B.41C6@opentext.com>,
> George Phillips <phillips@opentext.com> wrote:
> > Arnoud Galactus Engelfriet wrote:
> > > Webcrawlers most definitely DO NOT assume a filename if a link leads
> > > to a 'directory' URL. If the URL is "/foo/bar/" then the client *must*
> > > ask for "/foo/bar" and see what it gets back. It doesn't matter at all ..
> > Seems to be a rather critical typo here. What you must have meant
> > was that if the URL is "/foo/bar/" then the client *must* ask for
> > "/foo/bar/" and see what it gets back. Sorry to pick, but that
> > missing slash is really important.
>
> True, of course.

I thought this was wrong, but of course it's right ...

Typically one asks for "/foo/bar" then gets a redirect to "/foo/bar/".
Then (with Apache, anyway) one gets sent the content of the
DirectoryIndex, such as "/foo/bar/index.html". In this case the browser
location box displays "/foo/bar/".

I would think that a spider may see "/foo/bar/" and
"/foo/bar/index.html" as distinct URLs, unless some scheme to eliminate
duplicates is implemented (maybe the big guys do..)

The server root on Apache is special; "http://foo.org",
"http://foo.org/" and (typically) "http://foo.org/index.html" all get
the same content without redirection.

One can also do silly things like making a directory called
"index.html".

Andrew Daviel
TRIUMF & Vancouver-Webpages
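For illustration only, here is a minimal sketch of the kind of duplicate-elimination scheme a spider might use for the cases described above. It is not taken from any particular crawler. The function name `canonical_url` and the `INDEX_NAMES` list are hypothetical, and the assumption that the DirectoryIndex is "index.html" is just a guess about the server's configuration: the URL alone cannot confirm it, and a directory that happens to be called "index.html" would be mishandled.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed DirectoryIndex names. The real list is server configuration
# and cannot be known from the URL alone.
INDEX_NAMES = ("index.html",)


def canonical_url(url, index_names=INDEX_NAMES):
    """Heuristically map equivalent-looking URLs to one form for de-duplication."""
    parts = urlsplit(url)
    # "http://foo.org" and "http://foo.org/" name the same resource.
    path = parts.path or "/"
    # Guess that a trailing index file is the directory's DirectoryIndex.
    head, _, tail = path.rpartition("/")
    if tail in index_names:
        path = head + "/"
    # Host names are case-insensitive; the fragment is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


if __name__ == "__main__":
    for u in ("http://foo.org", "http://foo.org/", "http://foo.org/index.html",
              "http://foo.org/foo/bar/index.html", "http://foo.org/foo/bar"):
        print(u, "->", canonical_url(u))
```

Note that "/foo/bar" is deliberately left distinct from "/foo/bar/": only the server's redirect reveals that the former is a directory, so a spider would have to follow the redirect and record the target rather than guess.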
Received on Wednesday, 10 September 1997 15:18:07 UTC