Well, as you said on our last call, RFC 3986 is hard, and so is this ...
... believe me, it was a pain to stick as close as possible to RFC 3986 ...
I had to dig out the actual code I wrote for this to reconstruct some
pseudo code showing what it really does, so I think the approach of
sticking as close as possible to RFC 3986 has failed.
I think we may have more success by changing the whole algorithm. From
my point of view, the main problem lies in how step E. iterates over the
input buffer.
I think the easiest would be to take a completely different approach!
Maybe something like:
--------------------------------------------------------------------
1. Set a "path-absolute" flag if the input starts with a slash.
2. Take an empty stack and an empty temporary buffer (buf).
3. while (not at the end of the input) {
       clear buf;
       continue to scan the input from the left and append all non-slash
       characters to buf until a slash is reached that was preceded by a
       non-slash character (or the input ends);
       if (buf is '.') {
           // ignore
       } else if (buf is '..') {
           if (stack is empty and not path-absolute) {
               // start to accumulate complete '..' path segments to the left
               push '..' onto the stack;
           } else if (stack is empty and path-absolute) {
               // ignore '..' path segments hitting the root
           } else if (stack-peek is '..') {
               // stack is not empty is implied; continue to accumulate
               // complete '..' path segments to the left
               push '..' onto the stack;
           } else {
               // stack is not empty is implied, so let's pop the path
               // segment to the left
               pop the stack;
           }
       } else {
           // any other path segment is kept
           push buf onto the stack;
       }
   }
4. Take the stack now as a slash-separated list with the peek as the last
   element and prepend a slash if "path-absolute".
5. If the last character of the input was a slash as well, append a slash.
6. If the last path segment is '..' and not already terminated by a
   slash, append a slash as well.
Enjoy! Can someone else also try to get their head around this? I'll
test it in my implementation as soon as I have time.
--------------------------------------------------------------------
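A minimal Python sketch of the numbered steps above, for anyone who wants to
experiment with them. The function name normalize_path is made up for
illustration. Three points are assumptions on my side: ordinary segments are
pushed onto the stack (the pop in step 3 relies on that), step 5 does not add
a second slash when the result already ends in one, and step 6 is read as
referring to the last path segment of the input.

```python
def normalize_path(path: str) -> str:
    """Stack-based dot-segment removal that keeps leading '..' segments
    of relative paths (hypothetical helper, not a reference implementation)."""
    # Step 1: remember whether the input is path-absolute.
    absolute = path.startswith("/")
    # Step 2: start with an empty stack.
    stack = []
    # Step 3: walk over the segments; empty pieces produced by
    # consecutive slashes are skipped, like '.' segments.
    for buf in path.split("/"):
        if buf == "" or buf == ".":
            continue
        if buf == "..":
            if not stack:
                if not absolute:
                    # accumulate leading '..' of a relative path
                    stack.append("..")
                # else: '..' hitting the root is ignored
            elif stack[-1] == "..":
                stack.append("..")
            else:
                stack.pop()
        else:
            # assumption: ordinary segments are pushed
            stack.append(buf)
    # Step 4: join the stack, prepending the root slash if needed.
    out = ("/" if absolute else "") + "/".join(stack)
    # Step 5 (assumption: never produce a doubled trailing slash).
    if path.endswith("/") and not out.endswith("/"):
        out += "/"
    # Step 6 (assumption: "last path segment" means that of the input).
    if (path == ".." or path.endswith("/..")) and not out.endswith("/"):
        out += "/"
    return out
```

With these assumptions, normalize_path("/a/b/../c") gives "/a/c",
normalize_path("../../a") stays "../../a", and normalize_path("/../a")
gives "/a".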
Anyhow, please see below what we can achieve with RFC 3986 ...
Note: the // comments indicate the text from
http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2007Jun/att-0000/Apendix.html
and they are interleaved with what the code actually does.
I further used the asterisk to *emphasize* some things.
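For comparison, here is a minimal Python sketch of the unmodified
remove_dot_segments routine from RFC 3986, section 5.2.4 (steps A to E).
The Python names are mine and the output buffer is kept as a list of
segments, which is only one possible way to write it. Note that the
unmodified routine simply drops a leading "../", whereas the modified
step A discussed below may move it to the output buffer.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4, steps A to E."""
    inp = path
    output = []  # each entry is a segment including its leading "/", if any
    while inp:
        # A: drop a leading "../" or "./"
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        # B: replace a leading "/./" or a lone "/." with "/"
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        # C: replace a leading "/../" or a lone "/.." with "/" and
        #    remove the last output segment, if there is one
        elif inp.startswith("/../"):
            inp = "/" + inp[4:]
            if output:
                output.pop()
        elif inp == "/..":
            inp = "/"
            if output:
                output.pop()
        # D: a lone "." or ".." is removed
        elif inp in (".", ".."):
            inp = ""
        # E: move the first segment (with its leading "/", if any) to the output
        else:
            start = 1 if inp.startswith("/") else 0
            idx = inp.find("/", start)
            if idx == -1:
                output.append(inp)
                inp = ""
            else:
                output.append(inp[:idx])
                inp = inp[idx:]
    return "".join(output)
```

The two examples given in RFC 3986 itself, "/a/b/c/./../../g" and
"mid/content=5/../6", come out as "/a/g" and "mid/6".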
Grosso, Paul wrote:
> When you say:
>
>
>> If the input buffer starts with a root slash "/" ...
>>
>
> what is a "root slash"? Do you just mean a slash?
>
Well, this should emphasize the point that it is a
path-absolute = "/" [ segment-nz *( "/" segment ) ]
rather than a
path-rootless = segment-nz *( "/" segment )
I hope this makes sense to you.
> And continuing:
>
>
>> ...the output buffer is initialized with this root slash "/"
>>
>
> Does this mean the slash is removed from the input
> buffer now or not?
>
Yes, *this* shall emphasize that it is moved to the output buffer.
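As a small illustration of that initialization, assuming both buffers are
plain strings (init_buffers is a made-up name, not something from the
Appendix text):

```python
from typing import Tuple


def init_buffers(path: str) -> Tuple[str, str]:
    """Return (input_buffer, output_buffer) after the initialization.

    For a path-absolute input the root slash is moved, not copied:
    it is removed from the input and becomes the initial output.
    """
    if path.startswith("/"):
        return path[1:], "/"
    # path-rootless: nothing is moved, the output starts empty
    return path, ""
```

So init_buffers("/a/b") yields ("a/b", "/") and init_buffers("a/b")
yields ("a/b", "").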
> When you say:
>
>
>> if also the output does not contain the root slash "/" only
>>
>
> does this just mean "if it is not the case that the
> output buffer consists of just a single '/' character"?
>
if (input starts with '../') {
    // then
    // if also the output does not contain the root slash "/" only
    input delete prefix '../';
    if (output == '/') {
        // move this prefix to the end of the output buffer
        output append '../';
    }
}
>
> I don't know how to parse (in 2A):
>
>
>> if...then...else if...then if...then...else...;otherwise.
>>
if (input starts with './') {
    // then remove that prefix from the input buffer
    input delete prefix './';
} // , +else
else
// if the input buffer begins with a prefix of "../",
if (input starts with '../') {
    // then
    // if also the output does not contain the root slash "/" only
    input delete prefix '../';
    if (output equals '/') {
        // move this prefix to the end of the output buffer
        output append '../';
    }
    // else remove
    // that prefix
} // otherwise,
else
    B.
>
> In 2C, I don't know how to interpret:
>
>
>> if also the output buffer is empty, last segment in the output buffer
>> equals "../" or "..", where ".." is a complete path segment
>>
>
> I don't know what's being or-ed, and I'm not sure
> what the if test really is.
>
// C. if the input buffer begins with a prefix of "/../" or "/..",
// where ".." is a complete path segment,
if (input starts with '/../' or '/..') {
    // then replace that
    // prefix with "/" in the input buffer
    input replace prefix "/../" or "/.." with "/";
    // and if also the output
    // buffer is empty, last *segment* in the output buffer equals
    // "../" or "..", where ".." is a complete path segment, then
    // append ".." or "/.." for the latter case respectively to the
    // output buffer
    if (output is empty || *last segment* in the output buffer
            equals "../" || *last segment* in the output buffer equals "..") {
        // then append *that prefix* to the output buffer,
        if (output is empty || last *segment* in the output buffer equals "../") {
            output append '..';
        } else {
            // last *segment* in the output buffer equals ".."
            output append '/..';
        }
    } else {
        // else remove the last segment including its
        // preceding "/" (if any) from the output buffer
        output delete *last segment*;
        // and if hereby
        // the first character in the output buffer was removed and it
        // was not the root slash then delete a leading slash from the
        // input buffer.
        if (output had root slash && input starts with '/') {
            input delete first character;
        }
    }
} // otherwise,
else
> In:
>
>
>> append ".." or "/.." for the latter case respectively
>>
>
> I don't know what that means. What is the latter case,
> what respectively to what, and just what am I appending when?
>
See above ....
> In 3, where you say:
>
>
>> if the only or last segment of the output buffer is "..", where ".."
>> is a complete path segment
>>
>
> I know 3986 uses the term "complete path segment" (in fact,
> in 5.2.4, it refers to 'the special "." and ".." complete path
> segments'), but I'm still finding this wording complicated.
> Do you just mean:
>
> if the last (or only) segment of the output buffer is the ".."
> complete path segment
>
I think the term "complete path segment" is intended to exclude
segments like "..yes", "...", "a..b..c.." and so on .....
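In code the distinction amounts to comparing whole segments rather than
looking for ".." as a substring. A tiny Python sketch (the function name
is made up for illustration):

```python
def has_complete_dotdot_segment(path: str) -> bool:
    """True only if some segment between slashes is exactly '..'."""
    return ".." in path.split("/")
```

This is true for "a/../b" and for "..", but false for "..yes", "..."
and "a..b..c..", although all of them contain two consecutive dots.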
Konrad
--
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at
Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm