<BASE> processing by browsers

Grzegorz Staniak (GSTANIAK@golem.umcs.lublin.pl)
Sat, 29 Apr 1995 20:58:54 +0100


From: "Grzegorz Staniak" <GSTANIAK@golem.umcs.lublin.pl>
To: www-html@www10.w3.org
Date:          Sat, 29 Apr 1995 20:58:54 +0100
Subject:       <BASE> processing by browsers
Message-Id: <93C3A768A0@golem.umcs.lublin.pl>


Just a couple of thoughts concerning the <BASE> tag.

I use the tag freely, i.e. I do not always set it to the actual URL 
of the document that contains it. Sometimes the tag points to a 
directory a level or two above the document, which lets me refer 
easily to other documents in neighbouring directories, like this:

(for a URL like "http://my.www.server/foo/bar/my_document.html")

         <BASE HREF="http://my.www.server/foo">
         ...
         <A HREF="/bara/another_document.html">
         <A HREF="/barb/yet_another.html">
         <A HREF="/barc/and_one_more.html">

This way, the URL of the retrieved document appears nowhere in its 
content, yet the links still have a context: after being saved and 
opened locally, such a file remains fully functional and every link 
works.

The problem is that a number of browsers know better what the value 
of my <BASE> tag should be, i.e. they assume that it must contain the 
URL of my document. Not surprisingly, when you try to follow any of 
the links from the example above, you get 404 "Not Found" error 
messages for URLs like:
     
      "http://my.www.server/foo/bar//bara/another_document.html", 

which, I agree, do not exist.
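
To make the resolution step concrete, here is a minimal sketch that 
uses Python's urllib.parse.urljoin as a stand-in for a browser's 
relative-URL resolver (an assumption; no browser is claimed to work 
exactly this way). It resolves the example links against the <BASE> 
value and also against a hypothetical variant with a trailing slash 
on the base and no leading slash on the link. The "naive" line is only 
a guess at how a URL such as the failing one above could be produced. 
Note that under these rules a link starting with "/" replaces the 
whole path of the base, so the variant form is the one that lands 
under /foo.

```python
from urllib.parse import urljoin

# Base and links exactly as written in the example above.
base = "http://my.www.server/foo"
links = [
    "/bara/another_document.html",
    "/barb/yet_another.html",
    "/barc/and_one_more.html",
]

# A reference starting with "/" replaces the entire path of the base.
resolved = [urljoin(base, link) for link in links]

# Hypothetical variant: trailing slash on the base, no leading slash
# on the link, so the link is resolved inside /foo/.
variant = urljoin("http://my.www.server/foo/", "bara/another_document.html")

# Guess only: ignoring <BASE> and gluing the link onto the directory
# of the document's own URL reproduces the failing URL quoted above.
document_dir = "http://my.www.server/foo/bar/"
naive = document_dir + links[0]

for url in resolved:
    print(url)
print(variant)
print(naive)
```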

My point is that this is a mistake on the part of browser developers. 
There's nothing in the HTML 3.0 specs (or HTML 2.0 for that matter) 
that would prevent the author from arbitrary use of the tag. If HTML 
3.0 mentions that the default BASE is the URL of the document itself, 
then talking of defaults only makes sense if you're allowed to 
override them, doesn't it?

The Internet Draft on Relative Uniform Resource Locators, 
<draft-ietf-uri-relative-url-06.txt>, proposes another way of doing 
what I do: using "." and ".." in the relative path, like:

       <BASE HREF="http://my.www.server/foo/bar/my_document.html">
       ...
       <A HREF="../bara/another_document.html">

but in section 3, "Establishing a base URL", it stresses that when a 
document is parsed, the base URL embedded in its content should take 
priority over any other way of establishing it. 
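
The priority rule and the ".." form can be sketched together. This is 
an illustration only: effective_base is a hypothetical helper, the 
local file path is made up, and urllib.parse.urljoin again stands in 
for a conforming resolver.

```python
from typing import Optional
from urllib.parse import urljoin


def effective_base(embedded_base: Optional[str], retrieval_url: str) -> str:
    """Hypothetical helper: a base embedded in the document wins over
    the URL the document was actually loaded from."""
    return embedded_base if embedded_base else retrieval_url


link = "../bara/another_document.html"
embedded = "http://my.www.server/foo/bar/my_document.html"

# Made-up location of a copy saved to local disk.
saved_copy = "file:///tmp/my_document.html"

# With the embedded <BASE>, the saved copy still points at the server.
with_base = urljoin(effective_base(embedded, saved_copy), link)

# Without it, the link is resolved against the local file instead.
without_base = urljoin(effective_base(None, saved_copy), link)

print(with_base)
print(without_base)
```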

I have a feeling that at one time multiple <BASE> tags were proposed, 
to serve more or less the same function as my arbitrary <BASE>. It 
seems this has since been dropped.

Perhaps the issue is not that important, but more widespread use of 
the <BASE> tag together with relative URLs would make saved files 
more useful. Very often I save an interesting page only to find, 
after opening it locally, that it's full of relative URLs but has no 
<BASE> tag, and all the links are useless.
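
One way around the saved-file problem is to stamp a <BASE> tag into 
the page at save time. The following is a rough sketch under stated 
assumptions: add_base_tag is a hypothetical helper, it relies on a 
simple regular expression rather than a real HTML parser, and it 
leaves alone any page that already carries a <BASE> tag.

```python
import re


def add_base_tag(html: str, source_url: str) -> str:
    """Hypothetical helper: insert <BASE HREF="..."> right after the
    opening <HEAD> tag, unless the page already has a <BASE> tag."""
    if re.search(r"<base\b", html, re.IGNORECASE):
        return html  # respect the author's own base
    tag = '<BASE HREF="%s">' % source_url
    head = re.search(r"<head[^>]*>", html, re.IGNORECASE)
    if head:
        return html[:head.end()] + tag + html[head.end():]
    # No <HEAD> found: put the tag at the very start of the file.
    return tag + "\n" + html


page = ('<HTML><HEAD><TITLE>t</TITLE></HEAD>'
        '<BODY><A HREF="../bara/another_document.html">x</A></BODY></HTML>')
saved = add_base_tag(page, "http://my.www.server/foo/bar/my_document.html")
print(saved)
```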


-------------------------------
Grzesiek Staniak
<gstaniak@golem.umcs.lublin.pl>
<gstaniak@galen.imw.lublin.pl>