Repost

-------- Original Message --------
Subject: Re: C14N 1.1 Appendix A [was: Minutes for XML Core WG telcon of 2007 October 24]
Date: Thu, 25 Oct 2007 23:17:47 +0200
From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
To: Grosso, Paul <pgrosso@ptc.com>
CC: public-xml-core-wg@w3.org
References: <CF83BAA719FD2C439D25CBB1C9D1D302091FB4EF@HQ-MAIL4.ptcnet.ptc.com> <CF83BAA719FD2C439D25CBB1C9D1D3020928640E@HQ-MAIL4.ptcnet.ptc.com> <CF83BAA719FD2C439D25CBB1C9D1D302092FA668@HQ-MAIL4.ptcnet.ptc.com> <4720D0EF.5090703@iaik.tugraz.at> <CF83BAA719FD2C439D25CBB1C9D1D302092FA766@HQ-MAIL4.ptcnet.ptc.com> <47210596.50408@iaik.tugraz.at>

Konrad Lanz wrote:
> I think the easiest would be to take a completely different approach!
> Maybe something like:
> --------------------------------------------------------------------
>
> 1. Set a "path-absolute" flag if the input starts with a slash.
>
> 2. Take an empty stack and an empty buffer.
>
> 3. while(not at the end of the input) {
>
>      clear the buffer;
>
>      continue to scan the input from the left and append all non slash
>      characters to a temporary buffer (buf) until a slash is reached that
>      was preceded by a non slash character.
>
>      if (buf is '.') {
>        //ignore
>      } else if (buf is '..') {
>        if (stack is empty and not path-absolute) {
>          //start to accumulate complete '..' path segments to the left
>          push '..' onto the stack;
>        } else if (stack is empty and path-absolute) {
>          //ignore '..' path segments hitting the root;
>        } else if (stack-peek is '..') {
>          //stack is not empty is implied; continue to accumulate
>          //complete '..' path segments to the left
>          push '..' onto the stack;
>        } else {
>          //stack is not empty is implied, so pop the path
>          //segment to the left
>          pop the stack;
>        }
>      } else {
>        push buf's value onto the stack;
>      }
>    }
>
> 4. Take the stack now as a slash separated list with the peek as the last
> element and prepend a slash if "path-absolute".
>
> 5. If the last character of the input was a slash as well, append a slash.
>
> 6. If the last path segment is '..' and not already terminated by a
> slash, append a slash as well.
>
> Enjoy, can someone else also try to get his head around this .... I'll
> test it in my implementation as soon as I have time.
>
> --------------------------------------------------------------------
>

Can't just get it right the first time ... well let's see what others think.

--
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at
Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
attached mail follows:
Well, so as you said on our last call, RFC 3986 is hard and so is this ...
... believe me it was a pain to stick as close as possible to RFC 3986 ...
I had to dig out the actual code I wrote for this to reconstruct some
pseudo code showing what it really does, so I think the approach to
stick as close as possible to RFC 3986 has failed.
I think by changing the whole algorithm we may have more success. The
main problem, from my point of view, lies in how step E iterates over the
input buffer.
I think the easiest would be to take a completely different approach!
Maybe something like:
--------------------------------------------------------------------
1. Set a "path-absolute" flag if the input starts with a slash.
2. Take an empty stack and an empty buffer.
3. while(not at the end of the input) {
     clear the buffer;
     continue to scan the input from the left and append all non slash
     characters to a temporary buffer (buf) until a slash is reached that
     was preceded by a non slash character.
     if (buf is '.') {
       //ignore
     } else if (buf is '..') {
       if (stack is empty and not path-absolute) {
         //start to accumulate complete '..' path segments to the left
         push '..' onto the stack;
       } else if (stack is empty and path-absolute) {
         //ignore '..' path segments hitting the root;
       } else if (stack-peek is '..') {
         //stack is not empty is implied; continue to accumulate
         //complete '..' path segments to the left
         push '..' onto the stack;
       } else {
         //stack is not empty is implied, so pop the path
         //segment to the left
         pop the stack;
       }
     } else {
       push buf's value onto the stack;
     }
   }
4. Take the stack now as a slash separated list with the peek as the last
element and prepend a slash if "path-absolute".
5. If the last character of the input was a slash as well, append a slash.
6. If the last path segment is '..' and not already terminated by a
slash, append a slash as well.
Enjoy, can someone else also try to get his head around this .... I'll
test it in my implementation as soon as I have time.
--------------------------------------------------------------------
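The six steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the tested implementation mentioned above: the name normalize_path is made up for this sketch, empty segments produced by repeated slashes are skipped, ordinary segments are pushed onto the stack, and two guards that the steps do not spell out are added (a trailing slash is never doubled and never appended to an empty result).

```python
def normalize_path(path: str) -> str:
    """Stack-based removal of '.' and '..' segments (hypothetical sketch)."""
    absolute = path.startswith("/")           # step 1: "path-absolute" flag
    stack: list[str] = []                     # step 2: empty stack

    # Step 3: splitting on "/" and skipping empty pieces stands in for
    # scanning up to a slash that was preceded by a non-slash character.
    for seg in path.split("/"):
        if seg == "" or seg == ".":
            continue                          # ignore empty and '.' segments
        if seg == "..":
            if not stack:
                if not absolute:
                    stack.append("..")        # accumulate leading '..'
                # absolute path: '..' hitting the root is ignored
            elif stack[-1] == "..":
                stack.append("..")            # keep accumulating '..'
            else:
                stack.pop()                   # drop the segment to the left
        else:
            stack.append(seg)                 # ordinary segment

    # Step 4: join the stack, prepending a slash for absolute paths.
    result = ("/" if absolute else "") + "/".join(stack)

    # Step 5 (with the guards described above, which are assumptions).
    if path.endswith("/") and result and not result.endswith("/"):
        result += "/"

    # Step 6: a trailing '..' segment is terminated by a slash.
    if stack and stack[-1] == ".." and not result.endswith("/"):
        result += "/"
    return result


print(normalize_path("/a/b/../c"))   # /a/c
print(normalize_path("a/../.."))     # ../
```

Because every segment is examined exactly once and the stack only ever holds the surviving segments, there is no need to re-scan an output buffer to find its last segment.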
Anyhow please see below what we can achieve with RFC 3986 ...
Note: the // comments shall indicate the text from
http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2007Jun/att-0000/Apendix.html
and they are interleaved with what it actually does.
I further used the asterisk to *emphasize* some things.
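For comparison, the unmodified routine from RFC 3986 section 5.2.4 (steps A to E) can be sketched in Python as below. The function name and the use of a list as the output buffer are choices made for this sketch. Note that step A simply discards a leading "../", so leading ".." segments of a relative path are lost, whereas the appendix text discussed in this mail moves them to the output buffer.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4, steps A to E, without changes."""
    inp = path
    out: list[str] = []                  # output buffer, one entry per segment
    while inp:
        if inp.startswith("../"):        # A: drop a leading "../"
            inp = inp[3:]
        elif inp.startswith("./"):       # A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith("/./"):      # B: replace "/./" with "/"
            inp = inp[2:]
        elif inp == "/.":                # B: replace "/." with "/"
            inp = "/"
        elif inp.startswith("/../"):     # C: replace "/../" with "/" ...
            inp = inp[3:]
            if out:
                out.pop()                # ... and drop the last output segment
        elif inp == "/..":               # C: same for a trailing "/.."
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):         # D: a lone "." or ".." is removed
            inp = ""
        else:                            # E: move the first segment to output
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


print(remove_dot_segments("/a/b/c/./../../g"))   # /a/g
print(remove_dot_segments("../a"))               # a
```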
Grosso, Paul schrieb:
> When you say:
>
>
>> If the input buffer starts with a root slash "/" ...
>>
>
> what is a "root slash"? Do you just mean a slash?
>
Well this should emphasize the point, that it is a
path-absolute = "/" [ segment-nz *( "/" segment ) ]
rather than a
path-rootless = segment-nz *( "/" segment )
I hope this makes sense to you.
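As a rough illustration of the distinction, a path can be classified by its first characters alone. The helper name classify_path is made up for this sketch, and it deliberately ignores character-level validation and the other RFC 3986 path productions (path-abempty, path-noscheme).

```python
def classify_path(path: str) -> str:
    """Very rough classification by leading characters (hypothetical helper)."""
    if path == "":
        return "path-empty"
    if path.startswith("//"):
        # path-absolute requires a non-empty first segment after the "/"
        return "other"
    if path.startswith("/"):
        return "path-absolute"
    return "path-rootless"


print(classify_path("/a/b"))   # path-absolute
print(classify_path("a/b"))    # path-rootless
```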
> And continuing:
>
>
>> ...the output buffer is initialized with this root slash "/"
>>
>
> Does this mean the slash is removed from the input
> buffer now or not?
>
Yes, *this* shall emphasize that it is moved to the output buffer.
> When you say:
>
>
>> if also the output does not contain the root slash "/" only
>>
>
> does this just mean "if it is not the case that the
> output buffer consists of just a single '/' character"?
>
if (input starts with '../') {
  // then
  // if also the output does not contain the root slash "/" only
  input delete prefix '../';
  if (output != '/') {
    // move this prefix to the end of the output buffer
    output append '../';
  }
}
>
> I don't know how to parse (in 2A):
>
>
>> if...then...else if...then if...then...else...;otherwise.
>>
if (input starts with './') {
  // then remove that prefix from the input buffer
  input delete prefix './';
} // , +else
else
// if the input buffer begins with a prefix of "../",
if (input starts with '../') {
  // then
  // if also the output does not contain the root slash "/" only
  input delete prefix '../';
  if (output != '/') {
    // move this prefix to the end of the output buffer
    output append '../';
  }
  // else remove
  // that prefix
} // otherwise,
else
B.
>
> In 2C, I don't know how to interpret:
>
>
>> if also the output buffer is empty, last segment in the output buffer
>>
> equals "../" or "..", where ".." is a complete path segment
>
> I don't know what's being or-ed, and I'm not sure
> what the if test really is.
>
// C. if the input buffer begins with a prefix of "/../" or "/..",
// where ".." is a complete path segment,
if (input starts with '/../' or '/..') {
  // then replace that
  // prefix with "/" in the input buffer
  input replace prefix "/../" or "/.." with "/";
  // and if also the output
  // buffer is empty, last *segment* in the output buffer equals
  // "../" or "..", where ".." is a complete path segment, then
  // append ".." or "/.." for the latter case respectively to the
  // output buffer
  if (output is empty || *last segment* in the output buffer
      equals "../" || *last segment* in the output buffer equals "..") {
    // then append *that prefix* to the output buffer,
    if (last *segment* in the output buffer equals "../") {
      output append '..';
    } else if (last *segment* in the output buffer equals "..") {
      output append '/..';
    }
  } else {
    // else remove the last segment including its
    // preceding "/" (if any) from the output buffer
    output delete *last segment*;
    // and if hereby
    // the first character in the output buffer was removed and it
    // was not the root slash then delete a leading slash from the
    // input buffer.
    if (output had root slash && input starts with '/') {
      input delete first character;
    }
  }
} // otherwise,
else
> In:
>
>
>> append ".." or "/.." for the latter case respectively
>>
>
> I don't know what that means. What is the latter case,
> what respectively to what, and just what am I appending when?
>
See above ....
> In 3, where you say:
>
>
>> if the only or last segment of the output buffer is "..", where ".."
>>
> is a complete path segment
>
> I know 3986 uses the term "complete path segment" (in fact,
> in 5.2.4, it refers to 'the special "." and ".." complete path
> segments'), but I'm still finding this wording complicated.
> Do you just mean:
>
> if the last (or only) segment of the output buffer is the ".."
> complete path segment
>
I think the term "complete path segment" is intended to exclude
segments like "..yes", "...", "a..b..c.." and so on .....
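In code, the difference amounts to comparing a whole segment rather than matching a prefix or substring. The helper names below are made up for this sketch.

```python
def is_complete_dot_segment(segment: str) -> bool:
    """True only if the whole segment is '.' or '..' (hypothetical helper)."""
    return segment in (".", "..")


def last_segment_is_dotdot(buffer: str) -> bool:
    """True if the text after the last slash is exactly '..'."""
    return buffer.split("/")[-1] == ".."


for seg in ("..", "..yes", "...", "a..b..c.."):
    print(seg, is_complete_dot_segment(seg))
```

Only the first of the four examples is reported as a complete ".." segment; the other three merely contain dots.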
Konrad
--
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at
Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 8 January 2008 14:22:02 GMT