remove_dot_segment in draft-fielding-uri-rfc2396bis-03.txt

Some comments on the remove_dot_segments routine prescribed in rfc2396bis:

|   The pseudocode also refers to a remove_dot_segments routine for
|   interpreting and removing the special "." and ".." complete path
|   segments from a referenced path.  This is done after the path is
|   extracted from a reference, whether or not the path was relative, in
|   order to remove any invalid or extraneous dot-segments prior to
|   forming the target URI.  Although there are many ways to accomplish
|   this removal process, we describe a simple method using a separate
|   string buffer:
|
|   1.  The buffer is initialized with the unprocessed path component.
|
|   2.  If the buffer begins with "./" or "../", the "." or ".." segment
|       is removed.

Either drop the word "segment" here or add it to step 4 as well, so
that the two steps are worded consistently.

|   3.  All occurrences of "/./" in the buffer are replaced with "/".

I think this step should borrow some phrasing from step 5, so that it
reads:

        All occurrences of "/./" in the buffer are iteratively replaced
        until no matching pattern remains.

Otherwise it is not clear how "/././" is handled: a single
non-overlapping pass from left to right would leave "/./" behind.
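
To illustrate the concern, here is a minimal Python sketch.  It assumes
step 3 is read as one left-to-right, non-overlapping replacement, which
is how Python's str.replace behaves.  The function names are
hypothetical and do not come from the draft.

```python
def replace_once(buf: str) -> str:
    # One left-to-right, non-overlapping pass (assumed reading of step 3).
    return buf.replace("/./", "/")


def replace_iteratively(buf: str) -> str:
    # Repeat the pass until no "/./" remains (the proposed wording).
    while "/./" in buf:
        buf = buf.replace("/./", "/")
    return buf


print(replace_once("/././"))         # /./
print(replace_iteratively("/././"))  # /
```

Under that reading the single pass stops at "/./", while the iterative
wording reduces the buffer to "/".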

|   4.  If the buffer ends with "/.", the "." is removed.
|
|   5.  All occurrences of "/<segment>/../" in the buffer, where ".." and
|       <segment> are complete path segments, are iteratively replaced
|       with "/" in order from left to right until no matching pattern
|       remains. If the buffer ends with "/<segment>/..", that is also
|       replaced with "/". Note that <segment> may be empty.
|
|   6.  All prefixes of "<segment>/../" in the buffer, where ".." and
|       <segment> are complete path segments, are iteratively replaced
|       with "/" in order from left to right until no matching pattern
|       remains. If the buffer ends with "<segment>/..", that is also
|       replaced with "/". Note that <segment> may be empty.

Can there actually be more than one prefix like this?  Once it is
replaced it cannot match again, as the buffer now starts with "/".

|   7.  The remaining buffer is returned as the result of
|       remove_dot_segments.

If the buffer starts out as "a/../../c" then this algorithm ends up
with "a/c": step 5 matches "/../../", taking the first ".." as
<segment>, and replaces it with "/".  I don't think that is the
intention.  Shouldn't steps 5 and 6 be swapped?
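
The effect of the ordering can be checked with a rough Python sketch.
It covers only steps 5 and 6, and it rests on assumptions that are not
in the draft: <segment> is taken to be any run of non-"/" characters
(possibly empty), the patterns are expressed as regular expressions,
and the "ends with" clause of step 6 is read as applying to the whole
buffer.  All names are hypothetical.

```python
import re

# Assumed regex readings of the draft's patterns; <segment> is any run
# of non-"/" characters, possibly empty.
STEP5_INNER = re.compile(r"/[^/]*/\.\./")   # "/<segment>/../"
STEP5_TAIL = re.compile(r"/[^/]*/\.\.$")    # buffer ends with "/<segment>/.."
STEP6_PREFIX = re.compile(r"^[^/]*/\.\./")  # prefix "<segment>/../"
STEP6_WHOLE = re.compile(r"^[^/]*/\.\.$")   # whole buffer is "<segment>/.."


def _repeat(pattern, buf):
    # Replace the leftmost match with "/" until nothing matches.
    while True:
        buf, count = pattern.subn("/", buf, count=1)
        if count == 0:
            return buf


def step5(buf):
    return STEP5_TAIL.sub("/", _repeat(STEP5_INNER, buf))


def step6(buf):
    return STEP6_WHOLE.sub("/", _repeat(STEP6_PREFIX, buf))


print(step6(step5("a/../../c")))  # draft order: a/c
print(step5(step6("a/../../c")))  # swapped order: /c
```

With these readings the draft order yields "a/c" and the swapped order
yields "/c".  Other readings of the patterns may behave differently.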

Regards,
Gisle Aas

Received on Thursday, 24 July 2003 03:03:49 UTC