- From: Arjun Ray <aray@pipeline.com>
- Date: Wed, 20 Dec 1995 22:41:38 -0500 (EST)
- To: John Franks <john@math.nwu.edu>
- Cc: www-html@w3.org
On Wed, 20 Dec 1995, John Franks wrote:

> According to Arjun Ray:
> >
> > On Wed, 20 Dec 1995, Daniel W. Connolly wrote:
> >
> > >    http://www.foo.com/a/b/../gifs/btnhome3.gifs
> > >
> > > (which is _not_ a well-formed HTTP url) and send:
> > >
> > >    GET /a/b/../gifs/btnhome3.gif HTTP/1.0
> > >
> > > This is illegal because it is a potential security risk.
> > I think this is illegal simply because it's not a well-formed URL. The
> > question, then, is what the server should do about it.
>
> As I recall the draft RFC for URL's specifies that certain characters
> (like space) are forbidden, certain (like '?') have special meaning
> and otherwise the "path" part of a URL is an opaque string (which, in
> particular, may have nothing to do with a path). Neither '/' nor '.'
> are forbidden or have special meaning. They do have special meaning
> *for some implementations* and no special meaning for others.
> Likewise the colon may have special meaning for some implementations
> and not for others.
>
> The fact that certain strings may represent security risks for
> some implementations does not automatically make them illegal.
> I don't believe that "/../" is forbidden in HTTP URL's. If
> I am wrong I would be interested in a reference.

I confess that I've been relying on memory more than I should have. I
was going on impressions that I had gathered in early '94, when what
seemed like The Final Word(tm) was the latest draft of TBL's URI spec
-- now RFC 1630. Here are some excerpts from the RFC version that
sorta ring bells:

---8<---
[Page 5]
PATH

   The rest of the URI follows the colon in a format depending on the
   scheme. The path is interpreted in a manner dependent on the
   protocol being used. However, when it contains slashes, these must
   imply a hierarchical structure.

[Page 6]
HIERARCHICAL FORMS

   The slash ("/", ASCII 2F hex) character is reserved for the
   delimiting of substrings whose relationship is hierarchical. This
   enables partial forms of the URI.
   Substrings consisting of single or double dots ("." or "..") are
   similarly reserved.

[Page 8-9]
Partial (relative) form

   Within a object whose URI is well defined, the URI of another
   object may be given in abbreviated form, where parts of the two
   URIs are the same. This allows objects within a group to refer to
   each other without requiring the space for a complete reference
   [...] It must be emphasized that when a reference is passed in
   anything other than a well controlled context, the full form must
   always be used.

   In the World-Wide Web applications, the context URI is that of the
   document or object containing a reference. [...]

   The partial form relies on a property of the URI syntax that
   certain characters ("/") and certain path elements ("..", ".") have
   a significance reserved for representing a hierarchical space, and
   must be recognized as such by both clients and servers. [...]

   The rules for the use of a partial name relative to the URI of the
   context are:

   [ on grafting the partial onto the base and then ]

   Within the result, all occurrences of "xxx/../" or "/." are
   recursively removed, where xxx, ".." and "." are complete path
   elements.
---8<---

I understood this to mean that for HTTP urls, ".." and "." were
"reserved" for hierarchy-related specifications of *partial* urls,
which when resolved in context would be removed to form a complete
url, which in turn was what I've always understood by "absolute path"
in the HTTP spec.

As others have pointed out, the BNF doesn't forbid ".." in a url, but
I read that as a matter of lexical legitimacy. Before it can be a
"semantically" valid url, however, we have this "recursively removed"
bit quoted above.

To add more confusion, the language suggests that both clients and
servers should grok this. Except, how is the server supposed to know
the "context URI" that allowed the partial forms?
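
For illustration only (this is neither RFC text nor any server's actual
code), here is a minimal Python sketch of the "recursively removed"
rule quoted from RFC 1630, assuming the partial form has already been
grafted onto its base. The name remove_dot_segments is hypothetical.
Leaving a leading ".." in place when there is no "xxx" to pair it with
is an assumption, since the quoted text does not cover that case.

```python
def remove_dot_segments(path: str) -> str:
    """Collapse "." and "xxx/.." where each is a complete path element.

    Hypothetical helper sketching the RFC 1630 rule; it is not taken
    from any spec or library.
    """
    out = []
    for seg in path.split("/"):
        if seg == ".":
            # A lone "." element adds nothing to the hierarchy.
            continue
        if seg == ".." and out and out[-1] not in ("", ".."):
            # "xxx/.." cancels out: drop the preceding element.
            out.pop()
        else:
            # Ordinary element, or a ".." with nothing to cancel
            # (assumption: keep it so the caller can notice it).
            out.append(seg)
    return "/".join(out)


if __name__ == "__main__":
    print(remove_dot_segments("/a/b/../gifs/btnhome3.gif"))
    print(remove_dot_segments("/../etc/passwd"))
```

Applied to the request path in the example above, this gives
/a/gifs/btnhome3.gif. A path such as /../etc/passwd keeps its "..",
which is exactly the case where a server has to decide what to do.
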
Back then, I concluded (unwarrantedly as it now appears) that this
normalization was an implied client-side requirement deducible from
the other parts of various specs considered in their interaction.

And more: the later documents -- RFC 1738, RFC 1808, revisions of the
HTTP spec -- are much more circumspect about the (rather unabashed)
UN*X-isms in RFC 1630. The notion of "Reserved for hierarchical
semantics" has indeed vanished. Sigh.

> It would, of course, be quite reasonable for the HTTP spec to have
> a UNIX-centric warning to implementors that they should make this
> string illegal for their implementation (or risk the consequences).

Agreed.

Regards,
Arjun
Received on Wednesday, 20 December 1995 22:41:48 UTC