- From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
- Date: Thu, 25 Oct 2007 23:07:34 +0200
- To: "Grosso, Paul" <pgrosso@ptc.com>
- CC: public-xml-core-wg@w3.org
- Message-ID: <47210596.50408@iaik.tugraz.at>
Well, as you said on our last call, RFC 3986 is hard, and so is this ...

... believe me, it was a pain to stick as close as possible to RFC 3986 ...

I had to dig out the actual code I wrote for this to reconstruct some
pseudo code showing what it really does, so I think the approach of
sticking as close as possible to RFC 3986 has failed.

I think we may have more success by changing the whole algorithm.
From my point of view the main problem lies in how step E iterates over
the input buffer.

I think the easiest would be to take a completely different approach!

Maybe something like:

--------------------------------------------------------------------
1. Set a "path-absolute" flag if the input starts with a slash.
2. Take an empty stack and an empty buffer.
3. while (not at the end of the input) {
     clear the buffer;
     continue to scan the input from the left and append all non-slash
     characters to a temporary buffer (buf) until a slash is reached
     that was preceded by a non-slash character.
     if (buf is '.') {
       //ignore
     } else if (buf is '..') {
       if (stack is empty and not path-absolute) {
         //start to accumulate complete '..' path segments to the left
         push '..' onto the stack;
       } else if (stack is empty and path-absolute) {
         //ignore '..' path segments hitting the root;
       } else if (stack-peek is '..') {
         //stack is not empty is implied and continue to accumulate
         //complete '..' path segments to the left
         push '..' onto the stack;
       } else {
         //stack is not empty is implied so let's pop the path
         //segment to the left
         pop the stack;
       }
     } else {
       //any other segment is kept
       push buf onto the stack;
     }
   }
4. Take the stack now as a slash-separated list with the peek as the
   last element and prepend a slash if "path-absolute".
5. If the last character of the input was a slash as well, append a
   slash.
6. If the last path segment is '..' and not already terminated by a
   slash, append a slash as well.

Enjoy, can someone else also try to get his head around this ....
I'll test it in my implementation as soon as I have time.
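
For anyone who wants to experiment, here is a minimal Python sketch of
steps 1 to 6 above. It is only an illustration, not the code from my
implementation. The name normalize_path is made up for this example.
Two things are assumptions of the sketch rather than part of the steps:
ordinary segments are pushed onto the stack, and the trailing slash of
steps 5 and 6 is only appended when the stack is not empty, so that "/"
does not become "//".

```python
def normalize_path(path):
    """Sketch of the stack-based dot-segment removal (hypothetical helper)."""
    # Step 1: remember whether the path is path-absolute.
    path_absolute = path.startswith("/")
    # Step 2: start with an empty stack.
    stack = []

    # Step 3: splitting on "/" and skipping empty strings mimics scanning
    # up to the next slash that follows a non-slash character.
    for segment in path.split("/"):
        if segment == "" or segment == ".":
            continue  # empty run of slashes or '.': ignore
        if segment == "..":
            if not stack:
                if not path_absolute:
                    stack.append("..")  # start accumulating leading '..'
                # path-absolute: '..' hitting the root is ignored
            elif stack[-1] == "..":
                stack.append("..")  # keep accumulating '..'
            else:
                stack.pop()  # cancel the segment to the left
        else:
            stack.append(segment)  # assumption: ordinary segments are kept

    # Step 4: join the stack with slashes, re-adding the root slash.
    result = ("/" if path_absolute else "") + "/".join(stack)

    # Steps 5 and 6, guarded (assumption) so nothing is appended when the
    # stack is empty.
    wants_slash = path.endswith("/") or (bool(stack) and stack[-1] == "..")
    if stack and wants_slash:
        result += "/"
    return result
```

For example, normalize_path("/a/b/../c") keeps only the segments "a" and
"c", while a rootless "a/../.." ends up with a single accumulated ".."
that is then terminated by a slash as in step 6.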
--------------------------------------------------------------------

Anyhow, please see below what we can achieve with RFC 3986 ...

Note: the // comments indicate the text from
http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2007Jun/att-0000/Apendix.html
and they are interleaved with what it actually does.
I further used the asterisk to *emphasize* some things.

Grosso, Paul wrote:
> When you say:
>
>> If the input buffer starts with a root slash "/" ...
>
> what is a "root slash"?  Do you just mean a slash?

Well, this should emphasize the point that it is a

  path-absolute = "/" [ segment-nz *( "/" segment ) ]

rather than a

  path-rootless = segment-nz *( "/" segment )

I hope this makes sense to you.

> And continuing:
>
>> ...the output buffer is initialized with this root slash "/"
>
> Does this mean the slash is removed from the input
> buffer now or not?

Yes, *this* shall emphasize that it is moved to the output buffer.

> When you say:
>
>> if also the output does not contain the root slash "/" only
>
> does this just mean "if it is not the case that the
> output buffer consists of just a single '/' character"?

if (input starts with '../') {
  // then
  // if also the output does not contain the root slash "/" only
  input delete prefix '../';
  if (output != '/') {
    // move this prefix to the end of the output buffer
    output append '../';
  }
}

> I don't know how to parse (in 2A):
>
>> if...then...else if...then if...then...else...;otherwise.

if (input starts with './') {
  // then remove that prefix from the input buffer
  input delete prefix './';
}
// , +else
else
// if the input buffer begins with a prefix of "../",
if (input starts with '../') {
  // then
  // if also the output does not contain the root slash "/" only
  input delete prefix '../';
  if (output != '/') {
    // move this prefix to the end of the output buffer
    output append '../';
  }
  // else remove
  // that prefix
}
// otherwise,
else B.
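
Read this way, step 2A could be sketched in Python roughly as follows.
The function name step_2a and its (input, output, matched) return shape
are made up for illustration only. The keep-or-drop test follows the
quoted appendix wording, i.e. the '../' prefix is kept unless the output
buffer is just the root slash.

```python
def step_2a(input_buf, output_buf):
    """Try step 2A once; return (input_buf, output_buf, matched).

    Hypothetical helper: 'matched' is False when neither prefix applies,
    in which case the caller would go on to step B.
    """
    if input_buf.startswith("./"):
        # Remove the './' prefix from the input buffer.
        return input_buf[2:], output_buf, True
    if input_buf.startswith("../"):
        # The prefix always leaves the input buffer ...
        if output_buf != "/":
            # ... and is moved to the output unless that is only the root slash.
            output_buf += "../"
        return input_buf[3:], output_buf, True
    return input_buf, output_buf, False
```

With an empty output buffer, "../a" leaves "a" in the input and "../" in
the output; with an output buffer of "/", the prefix is simply dropped.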

> In 2C, I don't know how to interpret:
>
>> if also the output buffer is empty, last segment in the output buffer
>> equals "../" or "..", where ".." is a complete path segment
>
> I don't know what's being or-ed, and I'm not sure
> what the if test really is.

// C. if the input buffer begins with a prefix of "/../" or "/..",
// where ".." is a complete path segment,
if (input starts with '/../' or '/..') {
  // then replace that
  // prefix with "/" in the input buffer
  input replace prefix "/../" or "/.." with "/";
  // and if also the output
  // buffer is empty, last *segment* in the output buffer equals
  // "../" or "..", where ".." is a complete path segment, then
  // append ".." or "/.." for the latter case respectively to the
  // output buffer
  if ( output is empty ||
       *last segment* in the output buffer equals "../" ||
       *last segment* in the output buffer equals "..") {
    // then append *that prefix* to the output buffer,
    if (last *segment* in the output buffer equals "../" ) {
      output append '..' ;
    } else if (last *segment* in the output buffer equals ".." ) {
      output append '/..' ;
    }
  } else {
    // else remove the last segment including its
    // preceding "/" (if any) from the output buffer
    output delete *last segment*;
    // and if hereby
    // the first character in the output buffer was removed and it
    // was not the root slash then delete a leading slash from the
    // input buffer.
    if (output had root slash && input starts with '/') {
      input delete first character;
    }
  }
}
// otherwise,
else

> In:
>
>> append ".." or "/.." for the latter case respectively
>
> I don't know what that means.  What is the latter case,
> what respectively to what, and just what am I appending when?

See above ....

> In 3, where you say:
>
>> if the only or last segment of the output buffer is "..", where ".."
>> is a complete path segment
>
> I know 3986 uses the term "complete path segment" (in fact,
> in 5.2.4, it refers to 'the special "." and ".." complete path
> segments'), but I'm still finding this wording complicated.
> Do you just mean:
>
>   if the last (or only) segment of the output buffer is the ".."
>   complete path segment

I think the term "complete path segment" is intended to exclude segments
like "..yes", "...", "a..b..c.." and so on .....

Konrad

-- 
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
Received on Thursday, 25 October 2007 21:08:21 UTC