Re: partial URLs ? (was <p> ... </p>)

Arjun Ray (aray@pipeline.com)
Wed, 20 Dec 1995 22:41:38 -0500 (EST)


Date: Wed, 20 Dec 1995 22:41:38 -0500 (EST)
From: Arjun Ray <aray@pipeline.com>
Subject: Re: partial URLs ? (was <p> ... </p>)
To: John Franks <john@math.nwu.edu>
Cc: www-html@w3.org
In-Reply-To: <199512210159.TAA18610@hopf.math.nwu.edu>
Message-Id: <Pine.3.89.9512202258.A3574-0100000@alpha>



On Wed, 20 Dec 1995, John Franks wrote:

> According to Arjun Ray:
> > 
> > On Wed, 20 Dec 1995, Daniel W. Connolly wrote:
> > 
> > > 	http://www.foo.com/a/b/../gifs/btnhome3.gifs
> > > 
> > > (which is _not_ a well-formed HTTP url) and send:
> > > 
> > > 	GET /a/b/../gifs/btnhome3.gif HTTP/1.0
> > > 
> > > This is illegal because it is a potential security risk. 

> > I think this is illegal simply because it's not a well-formed URL. The 
> > question, then, is what the server should do about it.
> 
> As I recall the draft RFC for URL's specifies that certain characters
> (like space) are forbidden, certain (like '?') have special meaning
> and otherwise the "path" part of a URL is an opaque string (which, in
> particular, may have nothing to do with a path).  Neither '/' nor '.'
> are forbidden or have special meaning.  They do have special meaning
> *for some implementations* and no special meaning for others.
> Likewise the colon may have special meaning for some implementations
> and not for others.
> 
> The fact that certain strings may represent security risks for
> some implementations does not automatically make them illegal.
> I don't believe that "/../" is forbidden in HTTP URL's.  If
> I am wrong I would be interested in a reference. 

I confess that I've been relying on memory more than I should have. I was 
going on impressions that I had gathered in early '94, when what seemed 
like The Final Word(tm) was the latest draft of TBL's URI spec -- now RFC 
1630. Here are some excerpts from the RFC version that sorta ring bells: 

---8<---
[Page 5]

  PATH 
   
      The rest of the URI follows the colon in a format depending on the
      scheme. The path is interpreted in a manner dependent on the
      protocol being used.  However, when it contains slashes, these 
      must imply a hierarchical structure. 

[Page 6]

   HIERARCHICAL FORMS
      
      The slash ("/", ASCII 2F hex) character is reserved for the
      delimiting of substrings whose relationship is hierarchical.  This
      enables partial forms of the URI.  Substrings consisting of single
      or double dots ("." or "..") are similarly reserved.

[Page 8-9]

Partial (relative) form
      
   Within a object whose URI is well defined, the URI of another object
   may be given in abbreviated form, where parts of the two URIs are the
   same. This allows objects within a group to refer to each other
   without requiring the space for a complete reference [...] It must be 
   emphasized that when a reference is passed in anything other than a
   well controlled context, the full form must always be used.

   In the World-Wide Web applications, the context URI is that of the 
   document or object containing a reference.
   [...]
   The partial form relies on a property of the URI syntax that certain
   characters ("/") and certain path elements ("..", ".") have a 
   significance reserved for representing a hierarchical space, and must
   be recognized as such by both clients and servers.
   [...]
   The rules for the use of a partial name relative to the URI of the
   context are: 
   [ on grafting the partial onto the base and then ]
      Within the result, all occurrences of "xxx/../" or "/." are
      recursively removed, where xxx, ".." and "." are complete path
      elements.
---8<---

I understood this to mean that for HTTP urls, ".." and "." were 
"reserved" for hierarchy-related specifications of *partial* urls, which 
when resolved in context would be removed to form a complete url, which 
in turn was what I've always understood by "absolute path" in the HTTP spec.

As others have pointed out, the BNF doesn't forbid ".." in a url, but I 
read that as a matter of lexical legitimacy. Before it can be a 
"semantically" valid url, however, we have this "recursively removed" bit 
quoted above. 
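
For concreteness, here is a rough sketch of the graft-then-remove step as I 
read it from the RFC 1630 excerpt. It is not taken from any spec or server: 
the function names and the base path "/a/b/page.html" are made up for 
illustration, and it uses a stack of path elements rather than literal 
repeated string removal, which should come to the same thing for ordinary 
cases. Refusing a ".." that would climb above the root is my assumption; 
the excerpt does not say what to do there.

```python
def normalize_path(path):
    """Collapse "." and ".." elements in an absolute path (assumed to start with "/")."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    kept = []
    for element in path.split("/")[1:]:
        if element == ".":
            continue                      # "/." is simply dropped
        if element == "..":
            if not kept:
                # Assumption: refuse rather than silently climb above the root.
                raise ValueError("'..' climbs above the root")
            kept.pop()                    # "xxx/../" cancels the previous element
        else:
            kept.append(element)
    return "/" + "/".join(kept)


def resolve(base_path, partial):
    """Graft a partial path onto the path of its context, then normalize."""
    if partial.startswith("/"):
        grafted = partial                 # already rooted; nothing to graft
    else:
        # Keep everything in the base up to and including its last slash.
        grafted = base_path[: base_path.rfind("/") + 1] + partial
    return normalize_path(grafted)


print(resolve("/a/b/page.html", "../gifs/btnhome3.gif"))
```

On this reading the client would send "GET /a/gifs/btnhome3.gif", and the 
server would never see the ".." at all.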

To add more confusion, the language suggests that both clients and servers
should grok this. Except, how is the server supposed to know the "context
URI" that allowed the partial forms? Back then, I concluded (unwarrantedly
as it now appears) that this normalization was an implied client-side
requirement deducible from the other parts of various specs considered in
their interaction. 

And more: the later documents -- RFC 1738, RFC 1808, revisions of the 
HTTP spec -- are much more circumspect about the (rather unabashed) 
UN*X-isms in RFC 1630. The notion of "Reserved for hierarchical semantics" 
has indeed vanished. Sigh.

> It would, of course, be quite reasonable for the HTTP spec to have
> a UNIX-centric warning to implementors that they should make this
> string illegal for their implementation (or risk the consequences).

Agreed.
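
A server that maps request paths onto a UNIX-style filesystem could apply 
such a warning with a check along these lines. This is only a sketch under 
my own assumptions: the function name is invented, and decoding 
percent-escapes once before testing is my choice, not something any of the 
specs discussed here require.

```python
from urllib.parse import unquote


def has_parent_reference(request_path):
    """Return True if any complete path element is ".." after one round of decoding."""
    decoded = unquote(request_path)       # so "%2e%2e" is treated like ".."
    return any(element == ".." for element in decoded.split("/"))


print(has_parent_reference("/a/b/../gifs/btnhome3.gif"))
print(has_parent_reference("/a/gifs/btnhome3.gif"))
```

A path such as "/a/..b/x" passes the check, since "..b" is not a complete 
".." element.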


Regards,

Arjun