[Fwd: Re: C14N 1.1 Appendix A [was: Minutes for XML Core WG telcon of 2007 October 24]] from Konrad Lanz on 2007-10-30 (public-xmlsec-maintwg@w3.org from October 2007)

Forwarded message 1

From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
Date: Thu, 25 Oct 2007 23:07:34 +0200
Subject: Re: C14N 1.1 Appendix A [was: Minutes for XML Core WG telcon of 2007 October 24]
To: "Grosso, Paul" <pgrosso@ptc.com>
CC: public-xml-core-wg@w3.org
Message-ID: <47210596.50408@iaik.tugraz.at>
Well, as you said on our last call, RFC 3986 is hard, and so is this ...

... believe me it was a pain to stick as close as possible to RFC 3986 ...

I had to dig out the actual code I wrote for this to reconstruct some
pseudo code showing what it really does, so I think the approach to
stick as close as possible to RFC 3986 has failed.

I think by changing the whole algorithm we may have more success. From
my point of view, the main problem lies in how step E iterates over the
input buffer.

I think the easiest would be to take a completely different approach!
Maybe something like:
--------------------------------------------------------------------

1. Set a "path-absolute" flag if the input starts with a slash.

2. Take an empty stack and an empty buffer.

3. while (not at the end of the input) {

  clear the buffer (buf);

  continue to scan the input from the left and append all non-slash
  characters to buf until a slash is reached that was preceded by a
  non-slash character.

  if (buf is '.') {
    // ignore
  } else if (buf is '..') {
    if (stack is empty and not path-absolute) {
      // start to accumulate complete '..' path segments to the left
      push '..' onto the stack;
    } else if (stack is empty and path-absolute) {
      // ignore '..' path segments hitting the root
    } else if (stack-peek is '..') {
      // the stack is not empty (implied), so continue to accumulate
      // complete '..' path segments to the left
      push '..' onto the stack;
    } else {
      // the stack is not empty (implied), so pop the path segment
      // to the left
      pop the stack;
    }
  } else {
    // any other segment is kept, otherwise step 4 has nothing to emit
    push buf onto the stack;
  }
}

4. Take the stack now as a slash-separated list with the peek as the last
   element, and prepend a slash if "path-absolute".

5. If the last character of the input was a slash, append a slash as well.

6. If the last path segment is '..' and not already terminated by a
   slash, append a slash as well.

Enjoy. Can someone else also try to get their head around this? I'll
test it in my implementation as soon as I have time.

--------------------------------------------------------------------
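The stack-based steps 1 to 6 above can be sketched in Python roughly as
follows. This is only an illustration under stated assumptions, not the
tested implementation: the name normalize_path is hypothetical, ordinary
(non-dot) segments are assumed to be pushed onto the stack, runs of
slashes are treated as a single separator, and step 5 is guarded so that
an input of just "/" does not come out as "//".

```python
def normalize_path(path: str) -> str:
    """Sketch of the stack-based dot-segment removal (steps 1 to 6)."""
    # Step 1: remember whether the input is path-absolute.
    path_absolute = path.startswith("/")

    # Step 2: the stack holds the surviving path segments.
    stack: list[str] = []

    # Step 3: walk the segments from the left. Empty strings come from
    # leading, trailing, or repeated slashes and are skipped (assumption).
    for buf in path.split("/"):
        if buf == "" or buf == ".":
            continue
        if buf == "..":
            if not stack:
                if not path_absolute:
                    # accumulate leading '..' segments of a relative path
                    stack.append("..")
                # else: ignore '..' segments hitting the root
            elif stack[-1] == "..":
                # keep accumulating '..' segments to the left
                stack.append("..")
            else:
                # cancel the ordinary segment to the left
                stack.pop()
        else:
            # assumption: any other segment is kept
            stack.append(buf)

    # Step 4: join the stack and restore the leading slash.
    result = "/".join(stack)
    if path_absolute:
        result = "/" + result

    # Step 5: keep a trailing slash (guarded against doubling, assumption).
    if path.endswith("/") and not result.endswith("/"):
        result += "/"

    # Step 6: a final '..' segment is always terminated by a slash.
    if stack and stack[-1] == ".." and not result.endswith("/"):
        result += "/"

    return result


if __name__ == "__main__":
    for p in ["/a/b/../c", "a/./b/", "../../a", "/../a", "a/../.."]:
        print(repr(p), "->", repr(normalize_path(p)))
```

Note that, unlike RFC 3986, this keeps leading '..' segments of a
relative path instead of dropping them.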

Anyhow please see below what we can achieve with RFC 3986 ...

Note: the // comments indicate the text from
http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2007Jun/att-0000/Apendix.html

and they are interleaved with what it actually does.
I also used the asterisk to *emphasize* some things.
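For reference, the unmodified remove_dot_segments routine from RFC 3986
section 5.2.4 (steps 2A to 2E), which the annotated text below adapts,
can be sketched in Python like this. It is a baseline for comparison
only and not the Appendix A variant; the function and variable names are
my own.

```python
def remove_dot_segments(path: str) -> str:
    """Plain RFC 3986 section 5.2.4 dot-segment removal (baseline sketch)."""
    inp = path
    out: list[str] = []  # output buffer, one entry per moved segment
    while inp:
        # 2A: drop a leading "../" or "./"
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        # 2B: replace a leading "/./" or a final "/." with "/"
        elif inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        # 2C: replace a leading "/../" or a final "/.." with "/" and
        # remove the last output segment, if any
        elif inp.startswith("/../"):
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        # 2D: the input is only "." or ".."
        elif inp in (".", ".."):
            inp = ""
        # 2E: move the first segment, with its leading "/" if present,
        # to the output buffer
        else:
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


if __name__ == "__main__":
    print(remove_dot_segments("/a/b/c/./../../g"))  # example from the RFC
    print(remove_dot_segments("../a"))  # leading ".." is simply dropped
```

The second call shows why the plain RFC routine is not enough here: a
leading "../" on a relative path is discarded rather than preserved.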

Grosso, Paul wrote:
> When you say:
>  
>   
>> If the input buffer starts with a root slash "/" ...
>>     
>  
> what is a "root slash"?  Do you just mean a slash?
>   

Well, this should emphasize the point that it is a

      path-absolute = "/" [ segment-nz *( "/" segment ) ]

rather than a

      path-rootless = segment-nz *( "/" segment )


I hope this makes sense to you.
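As a rough illustration of the distinction between the two ABNF rules,
the following Python sketch classifies a path by its leading characters
only. The helper name classify_path is hypothetical, and the sketch does
not validate the characters inside the segments.

```python
def classify_path(path: str) -> str:
    """Coarse classification of a path by how it starts (sketch only)."""
    if not path:
        return "path-empty"
    if path.startswith("//"):
        # segment-nz is non-empty, so path-absolute cannot start with "//"
        return "other"
    if path.startswith("/"):
        # "/" [ segment-nz *( "/" segment ) ]
        return "path-absolute"
    # segment-nz *( "/" segment )
    return "path-rootless"


if __name__ == "__main__":
    for p in ["/a/b", "a/b", "/", "//a", ""]:
        print(repr(p), "->", classify_path(p))
```

So the "root slash" is the single leading slash that makes a path
path-absolute rather than path-rootless.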

> And continuing:
>  
>   
>> ...the output buffer is initialized with this root slash "/"
>>     
>  
> Does this mean the slash is removed from the input 
> buffer now or not?
>   

Yes, *this* shall emphasize that it is moved to the output buffer.

> When you say:
>  
>   
>> if also the output does not contain the root slash "/" only
>>     
>  
> does this just mean "if it is not the case that the 
> output buffer consists of just a single '/' character"?
>   

      if (input starts with '../') {
//        then
//        if also the output does not contain the root slash "/" only
        input delete prefix '../';
        if (output != '/') {
//        move this prefix to the end of the output buffer
          output append '../';
        }
      }

>  
> I don't know how to parse (in 2A):
>  
>   
>> if...then...else if...then if...then...else...;otherwise.
>>     
      if (input starts with './') {
        // then remove that prefix from the input buffer
        input delete prefix './';
      } // else
      else
      // if the input buffer begins with a prefix of "../",
      if (input starts with '../') {
//        then
//        if also the output does not contain the root slash "/" only
        input delete prefix '../';
        if (output does not equal '/') {
//        move this prefix to the end of the output buffer
          output append '../';
        }
//      else remove
//      that prefix

      } // otherwise,
      else

B.

>  
> In 2C, I don't know how to interpret:
>  
>   
>> if also the output buffer is empty, last segment in the output buffer
>> equals "../" or "..", where ".." is a complete path segment
>>     
>  
> I don't know what's being or-ed, and I'm not sure 
> what the if test really is.
>   
      // C. if the input buffer begins with a prefix of "/../" or "/..",
      // where ".." is a complete path segment,
      if (input starts with '/../' or '/..') {

//        then replace that
//        prefix with "/" in the input buffer

        input replace prefix "/../" or "/.." with "/";

//        and  if also the output
//        buffer is empty, last *segment* in the output buffer equals
//        "../" or "..", where ".." is a complete path segment, then
//        append ".." or "/.." for the latter case respectively to the
//        output buffer
        if (output is empty
            || *last segment* in the output buffer equals "../"
            || *last segment* in the output buffer equals "..") {
          // then append *that prefix* to the output buffer,
          if (*last segment* in the output buffer equals "..") {
            // the latter case
            output append '/..';
          } else {
            output append '..';
          }
        } else {
//          else remove the last segment including its
//          preceding "/" (if any) from the output buffer
          output delete *last segment*;

//          and if hereby
//          the first character in the output buffer was removed and it
//          was not the root slash then delete a leading slash from the
//          input buffer.

          if (output had root slash && input starts with '/') {
            input delete first character;
          }
        }
      } // otherwise,
      else

> In:
>  
>   
>> append ".." or "/.." for the latter case respectively
>>     
>  
> I don't know what that means.  What is the latter case, 
> what respectively to what, and just what am I appending when?
>   

See above ....

> In 3, where you say:
>  
>   
>> if the only or last segment of the output buffer is "..", where ".."
>> is a complete path segment
>>     
>  
> I know 3986 uses the term "complete path segment" (in fact, 
> in 5.2.4, it refers to 'the special "." and ".." complete path 
> segments'), but I'm still finding this wording complicated.  
> Do you just mean:
>  
>     if the last (or only) segment of the output buffer is the ".."
> complete path segment
>   

I think the term "complete path segment" is intended to exclude
segments like "..yes", "...", "a..b..c.." and so on.
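To make that concrete, here is a small Python sketch: a segment counts
as the ".." complete path segment only if the whole text between two
slashes is exactly "..". The helper names are hypothetical.

```python
def is_dot_dot_segment(segment: str) -> bool:
    """True only when the whole segment is exactly '..'."""
    return segment == ".."


def dot_dot_positions(path: str) -> list[int]:
    """Indexes of the segments of path that are complete '..' segments."""
    return [i for i, seg in enumerate(path.split("/"))
            if is_dot_dot_segment(seg)]


if __name__ == "__main__":
    for seg in ["..", "..yes", "...", "a..b..c.."]:
        print(repr(seg), "->", is_dot_dot_segment(seg))
    print(dot_dot_positions("a/../..yes/..."))
```

Only the plain ".." is matched; the look-alike segments are ordinary
names and are left alone.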

Konrad

-- 
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm