Re: More test cases to be confirmed from Graham Klyne on 2004-02-20 (uri@w3.org from February 2004)

From: Graham Klyne <GK@ninebynine.org>
Date: Fri, 20 Feb 2004 09:04:37 +0000
To: uri@w3.org
Message-Id: <5.1.0.14.2.20040220083230.02bd44d0@127.0.0.1>
[Roy:  if there's any consensus to change anything here, I'd be happy to 
draft a replacement for the algorithm in section 5.2.4 based on my current 
implementation.]

At 03:38 20/02/04 +0000, you wrote:

>Graham Klyne <gk@ninebynine.org> wrote:
>
> > (3)
> > Base:   "foo:a"
> > Ref:    "../b/c"
> > Result: "foo:/b/c"
> > (based on bullet 2 of section 5.2.4)
> >
> > I think it would be more consistent (and it's also what software I wrote
> > in the past does) in this case to return:
> >
> > Result: "foo:../b/c"
>
>For what it's worth, RFC-2396 allows either result (and also allows the
>implementation to balk).  See section 5.2 item 6g:

Yes... my earlier implementation used that latitude.

>Apparently the intention of the new draft is to revoke this
>indeterminacy and settle on a single behavior (discard initial ".."),
>but of course there will be existing implementations that made a
>different choice.  Discarding initial ".." is consistent with typical
>Unix behavior: /../../../etc/passwd is equivalent to /etc/passwd on most
>Unix-like platforms.

I guess I prefer the RFC2396 approach, but I recognize the desire to remove 
indeterminacy.  But now that RFC2396bis explicitly allows results of 
base+relative resolution that look like relative directory paths, I think 
it is more consistent to allow leading '../' segments.

I didn't know about the Unix behaviour.  If we stick with the current 
choice, and that's the reason for it, I'd suggest making it explicit;  e.g. 
somewhere in section 5.2.4:
[[
The effect of this algorithm is to ignore any leading '../' segments in a 
path, similar to the typical behaviour of many Unix systems when dealing 
with such file paths.
]]
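
To illustrate the effect described in that suggested text, here is a minimal 
sketch.  It assumes the path has already been split into segments (the draft 
works on string buffers instead), and the name resolveSegments is made up 
for this example:

```haskell
-- Hypothetical helper: resolve "." and ".." over a list of path segments,
-- discarding any ".." that has nothing left to cancel.
resolveSegments :: [String] -> [String]
resolveSegments = reverse . foldl step []
  where
    step acc       "."  = acc        -- "." never contributes a segment
    step []        ".." = []         -- leading "..": nothing to remove, ignore it
    step (_ : acc) ".." = acc        -- ".." cancels the previous segment
    step acc       seg  = seg : acc  -- ordinary segment, kept in reverse order
```

With this sketch, resolveSegments ["..", "..", "..", "etc", "passwd"] yields 
["etc","passwd"], mirroring the /../../../etc/passwd example quoted above.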

> > (4)
> > Base:   "foo:a"
> > Ref:    "./b/c"
> > Result: "foo:b/c"
> >
> > This is not strictly according to RFC2396bis, which I think would have
> > the value returned be:
> >
> > Result: "foo:/b/c"
> >
> > I think "./b/c" should be treated as equivalent to "b/c"
>
>I agree that the current draft would have the result be "foo:/b/c", and
>I fully agree that "./b/c" should be treated as equivalent to "b/c",
>especially since we need to use this equivalence when a segment contains
>a colon.  Here's a motivating example:
>
>The document foo:a wants to link to the document foo:b:c/d.  It can't
>use the relative reference "b:c/d" because the b: looks like a scheme.
>If it uses the absolute reference "foo:b:c/d" then it won't be easily
>movable to another scheme (like foos, the secure version of foo).  What
>it really wants to use is "./b:c/d" (as suggested in RFC-2396), but
>the current draft will resolve that to "foo:/b:c/d", which is not the
>target.
>
>I think the problem is in remove_dot_segments.
>
>Here's another surprising result of that algorithm:
>
>Base:   "foo:a/b"
>Ref:    "../c"
>Result: "foo:/c"
>
>I would expect the result "foo:c", because "../c" is supposed to
>be a sibling of my parent, and the parent of foo:a/b is foo:a, and
>intuitively foo:c is a sibling of foo:a, but foo:/c is not.  The
>RFC-2396 algorithm would produce "foo:c", but admittedly that algorithm
>was intended only for base paths that begin with slash.
>
>Another surprising result:
>
>Base:   "foo:a"
>Ref:    "foo:."
>Result: "foo:."

I currently have:

Base:   "foo:a"
Ref:    "."
Result: "foo:"

>I would expect the result "foo:" (empty path).
>
>Similarly:
>
>Base:   "foo:a"
>Ref:    "foo:.."
>Result: "foo:.."

and...

Base:   "foo:a"
Ref:    ".."
Result: "foo:.."
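
For context, the path that reaches dot-segment removal in these cases comes 
from the merge step.  A minimal sketch, assuming the usual rule (keep the 
base path up to and including its last '/', then append the reference) and 
ignoring the special case of a base with an authority and an empty path; 
mergePaths is a name made up for this example:

```haskell
-- Hypothetical sketch of the merge step for a base path and a relative path:
-- keep the base path up to and including its last '/', then append the reference.
mergePaths :: String -> String -> String
mergePaths base ref = dirPart ++ ref
  where
    -- everything up to and including the last '/', or "" if there is none
    dirPart = reverse (dropWhile (/= '/') (reverse base))
```

Since the base path "a" contains no slash, mergePaths "a" ".." is just "..", 
so the result depends entirely on what the dot-segment step does with a 
leading "..".  For base path "a/b", mergePaths "a/b" "../c" gives "a/../c".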

>These last two cases are the only cases where remove_dot_segments fails
>to live up to its name.
>
>Not surprisingly, the strange cases all involve base paths that don't
>begin with slash, which was never considered in RFC-2396.
>
>A slash at the beginning of a path is not like the other slashes, because
>usually it is not there by choice, but is obligated to be present to
>separate the first segment from the authority.  The remove_dot_segments
>procedure needs to preserve the separator: it's not safe for the output
>path to begin with a non-slash unless the input path begins with a
>non-slash.

Yes, I noticed that in my code:  I apply the dot-segment-normalization to 
the path *following* the first '/', if it is present, and that seems to 
deal neatly with the awkward corner cases.

>Since the initial slash is so unique, maybe it makes sense to handle it
>separately: remove the initial slash (if there is one), then remove the
>dot segments, then restore the initial slash (if there was one).

(See above!  Guess who doesn't read ahead ;-)

>This
>would be easy to understand and remember (so people wouldn't have to
>always go back and look at the pseudocode), and would obviously preserve
>the separator, and would free the guts of the algorithm from having
>to worry about both kinds of paths (those that begin with a slash and
>those that begin with a segment), it could be designed for just one kind
>(paths that begin with a segment).
>
>Consider this for remove_dot_segments:
>
>  1) initialize the input buffer
>  2) initialize the output buffer to the empty string
>  3) if the input buffer begins with a slash then delete it and set a
>     flag F, otherwise clear F
>  4) while the input buffer is not empty, loop:
>      a) delete the beginning of the input buffer up to and including the
>         first slash (or to the end if there is no slash), and remember
>         the deleted text in a variable S
>      b) if S is neither "." nor "./" nor ".." nor "../" then append S to
>         the end of the output buffer
>      c) if S is ".." or "../" then delete everything after the
>         second-last slash in the output buffer (or everything if
>         there are fewer than two slashes)
>  5) if F is set then insert a slash at the start of the output buffer
>  6) return the output buffer

Here's mine (in Haskell):
[[
--  Remove dot segments, but protect leading '/' character
removeDotSegments :: String -> String
removeDotSegments ('/':ps) = '/':elimDots ps []
removeDotSegments ps       = elimDots ps []

--  Second arg accumulates segments processed so far in reverse order
elimDots :: String -> [String] -> String
elimDots [] [] = ""
elimDots [] rs = concat (reverse rs)
elimDots (    '.':'/':ps)        rs  = elimDots ps rs
elimDots (    '.':[]    )        rs  = elimDots [] rs
elimDots (    '.':'.':'/':ps) (r:rs) | notSpecial r = elimDots ps rs
elimDots (    '.':'.':[]    ) (r:rs) | notSpecial r = elimDots [] rs
elimDots ps rs = elimDots ps1 (r:rs)
     where
         (r,ps1) = nextSegment ps

--  Returns the next segment and the rest of the path from a path string.
--  Each segment ends with the next '/' or the end of string.
--
nextSegment :: String -> (String,String)
nextSegment ps =
     case break (=='/') ps of
         (r,'/':ps1) -> (r++"/",ps1)
         (r,_)       -> (r,[])

--  Test that segment can be cancelled by following '..'
notSpecial :: String -> Bool
notSpecial ".."  = False
notSpecial "../" = False
notSpecial _     = True
]]
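
For comparison, the quoted six-step procedure can be sketched in the same 
language.  This is only one reading of those steps, with names made up for 
the example; the output buffer is held as a reversed list of chunks rather 
than as a string, which should be equivalent as long as every chunk before 
the last one ends in '/':

```haskell
-- Sketch of the quoted six-step procedure; all names here are hypothetical.
removeDotSegmentsAMC :: String -> String
removeDotSegmentsAMC ('/' : rest) = '/' : go rest []  -- steps 3 and 5: set aside the leading slash
removeDotSegmentsAMC path         = go path []

-- Step 4: consume one chunk S at a time.  The output buffer is a reversed
-- list of chunks, each ending in '/' except possibly the final one.
go :: String -> [String] -> String
go [] out = concat (reverse out)                      -- step 6: return the output buffer
go inp out
  | s `elem` [".", "./"]   = go rest out              -- 4b: "." is not appended
  | s `elem` ["..", "../"] = go rest (drop 1 out)     -- 4c: remove the last chunk, if any
  | otherwise              = go rest (s : out)        -- 4b: ordinary chunk is appended
  where
    (s, rest) = chunk inp

-- Step 4a: split off the text up to and including the first '/'.
chunk :: String -> (String, String)
chunk inp = case break (== '/') inp of
  (s, '/' : rest) -> (s ++ "/", rest)
  (s, _)          -> (s, [])
```

Under this reading, "/foo/../../../bar" becomes "/bar" and 
"foo/../../../bar" becomes "bar".  The visible difference from my version 
above is that this one drops a leading ".." (so ".." becomes the empty 
path), whereas mine keeps it.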


>I think this agrees with the current draft (and RFC-2396) for inputs
>that begin with a slash (someone please check that), and behaves more
>like we expect for inputs that don't begin with a slash.  It drops
>leading ".." segments, but doesn't insert an initial slash if the
>input didn't have one, so that /foo/../../../bar becomes /bar (same as
>/foo/../bar), but foo/../../../bar becomes bar (same as foo/../bar).
>
>By the way, there is an ambiguity in the draft:
>
>    1. If the input buffer begins with a prefix of "/./" or "/.", where
>       "." is a complete path segment, then replace that prefix with "/"
>
>Suppose the input buffer is "/./foo".  It begins with a prefix of "/./"
>and it begins with a prefix of "/.", so which prefix do I replace with
>"/"? The intention was presumably longest-match-wins, but that is not
>stated.
>
>AMC
>http://www.nicemice.net/amc/
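
The longest-match reading of the quoted rule 1 can be sketched as follows.  
This is an assumption about the intent, not text from the draft, and 
replaceDotPrefix is a made-up name.  Replacing only the shorter prefix "/." 
in "/./foo" would leave "//foo", which is presumably not what is wanted:

```haskell
import Data.List (stripPrefix)

-- Hypothetical sketch of the quoted rule 1, trying the longer prefix first.
-- Returns Nothing when the rule does not apply to the input buffer.
replaceDotPrefix :: String -> Maybe String
replaceDotPrefix input =
  case stripPrefix "/./" input of
    Just rest -> Just ('/' : rest)   -- "/./foo" becomes "/foo"
    Nothing
      | input == "/." -> Just "/"    -- here "." is a complete segment only at the very end
      | otherwise     -> Nothing     -- e.g. "/.foo": "." is not a complete segment
```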

------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Friday, 20 February 2004 04:07:21 UTC