Confusion regarding following a link from Taha Masood on 2001-04-20 (ietf-http-wg@w3.org from April to June 2001)

From: Taha Masood <taha.masood@streaming-networks.com>
Date: Fri, 20 Apr 2001 11:37:17 -0700
To: http-wg@hplb.hpl.hp.com
Message-ID: <003a01c0c9c9$031cc340$2b05a8c0@streamingnetworks.com>
Hi folks,

I am a little confused about HTTP and would appreciate it if someone could help me sort it out.
The problem is as follows:

For example, I give the following URL to a browser:
http://directory.google.com/Top/Computers/Algorithms/

It builds a request that, apart from other things, contains the following lines, which are the ones I am interested in right now:

GET /Top/Computers/Algorithms/ HTTP/1.1
Host: directory.google.com


Fine, the server responds and sends an HTML page back to me.
Now I render the HTML in my GUI.
The HTML contains the following link:

<a href="/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/">Cryptography</a>

Now the confusion is this: if my user "clicks" on the hyperlink given above, what request should I generate?

What I have done until now is to classify the situation into three cases:

  Whenever we are viewing a certain page on the web and we try
  to follow a link to another page, there can be three cases. For all the
  cases, the current page is, say: www.abc.com/help/u1/myHelp.html

  FIRST CASE:
  The link I try to follow is : "/yourHelp.com"
  Effective URL should be:
  www.abc.com/help/u1/yourHelp.com

  SECOND CASE:
  The link I try to follow is : "../../TopLevelHelp.com"
  Effective URL should be:
  www.abc.com/TopLevelHelp.com

  THIRD CASE:
  The link I try to follow is : "www.beta.com/OtherHelp.com"
  Effective URL should be:
  www.beta.com/OtherHelp.com

I had implemented a little parsing in my application. It is given the URL of the resource currently being displayed and the link we are trying to follow, which is the value of the href attribute of the <a> tag in the HTML, and it returns the Effective URL that actually has to be shown. From that URL, I separate the host part and the relative part, build an HTTP request, and pass it on to the server. It worked fine until now, but today I encountered an error that led me to believe I was probably not understanding things properly.
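
The last step described above (separate the host part and the relative part of the Effective URL, then build the request) can be sketched roughly as follows. This is only an illustration in Python using the standard urllib.parse module, not the actual code of my application; the function name build_request is hypothetical, and only the request line and Host header are shown.

```python
from urllib.parse import urlsplit


def build_request(effective_url: str) -> str:
    """Split an absolute URL into host and path, then build a GET request."""
    parts = urlsplit(effective_url)
    host = parts.netloc            # e.g. "directory.google.com"
    path = parts.path or "/"       # an empty path is requested as "/"
    if parts.query:
        path += "?" + parts.query  # keep any query string with the path
    # Request line, Host header, then the blank line that ends the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


print(build_request("http://directory.google.com/Top/Computers/Algorithms/"))
```

For the page above, this produces the same two lines shown earlier: the GET line with /Top/Computers/Algorithms/ and the Host header with directory.google.com.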

The problem occurred when I got to the page:

http://directory.google.com/Top/Computers/Algorithms/

The above contains this line of HTML:
<a href="/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/">Cryptography</a>

Now when my user "clicks" on the hyperlink given above, according to my cases this falls into the FIRST CASE, and the Effective URL I build is:

http://directory.google.com/Top/Computers/Algorithms/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/

Fine, so I separate the host part and the relative part and build the HTTP request:

GET /Top/Computers/Algorithms/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/  HTTP/1.1
Host: directory.google.com

The server replies that this resource does not exist.

When I follow the same link in MS Internet Explorer, the request it generates is:

GET /Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/ HTTP/1.1
Host: directory.google.com
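
For comparison, here is a small sketch of how a general-purpose resolver treats the same link. It uses urljoin from Python's standard urllib.parse module, which is an assumption for illustration only and not the parser described above; the variable names are hypothetical. With this resolver, a link that begins with "/" replaces the whole path of the current page, which gives the same path as the Internet Explorer request.

```python
from urllib.parse import urljoin, urlsplit

# URL of the page currently displayed, and the href found in it.
base = "http://directory.google.com/Top/Computers/Algorithms/"
href = "/Top/Science/Math/Applications/Communication_Theory/Cryptography/Algorithms/"

# Resolve the link against the current page, then split host and path.
resolved = urljoin(base, href)
parts = urlsplit(resolved)

print(parts.netloc)  # host to send in the Host header
print(parts.path)    # path to send in the GET line
```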


I fail to understand what the general rules for following links are. Which portion of the RFC covers this?

I would really appreciate it if someone could explain this to me.

Thanks in advance,

Regards,
Taha
Received on Thursday, 19 April 2001 23:46:40 UTC