Re: #295: Applying original fragment to "plain" redirected URI (also #43)

Hello Julian, others,

On 2012/01/03 23:43, Julian Reschke wrote:
> On 2011-12-30 18:51, Julian Reschke wrote:
>> ...
>> Indeed; see my tests at
>> <http://greenbytes.de/tech/tc/httpredirects/#l-fragments> (note that
>> Safari appears to have funny issues filling the iframes; but navigating
>> to the linked resource gets you proper results).
>> ...
>
> I just realized that the rule we would need to describe *almost* is the
> one defined in the URI spec
> (<http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.5.2>) as
> "relative resolution":
>
> "Almost", because it doesn't use Base.fragment when R.fragment is undefined.
>
> a) Should we try to describe the algorithm based on RFC 3986 ("do relative
> resolution as defined by ..., then, if the result doesn't have a
> fragment, add the one from the Base URI")?

I'm not at all sure that this description is correct. It would mean that 
I can have something like:
Request URI:   http://1.example.org/path1/file1.ext
Redirect URI:  http://2.example.org#frag2

and the result would be:
http://2.example.org/path1/file1.ext#frag2

As you can see in the result, there is a mixture of components from the
request URI (1) and the redirect URI (2). The way that relative
resolution works otherwise is that in the result, all components taken
from the base (1) precede all components taken from the reference (2).
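
To make the rule quoted in (a) concrete, here is a minimal Python sketch.
It assumes that urllib.parse.urljoin is a close enough stand-in for
RFC 3986 relative resolution; the function name resolve_rule_a and the
URIs in the note below (with frag1 and path2) are made up for
illustration and do not come from this thread.

```python
from urllib.parse import urldefrag, urljoin


def resolve_rule_a(base: str, ref: str) -> str:
    """Rule (a) as quoted: resolve, then fall back to the base fragment."""
    # Step 1: ordinary relative resolution (urljoin used as an approximation).
    target = urljoin(base, ref)
    # Step 2: if the result has no fragment, add the one from the base URI.
    if not urldefrag(target).fragment:
        base_fragment = urldefrag(base).fragment
        if base_fragment:
            target += "#" + base_fragment
    return target
```

For a request URI http://1.example.org/path1/file1.ext#frag1 and a
redirect URI http://2.example.org/path2/file2.ext, this sketch returns
http://2.example.org/path2/file2.ext#frag1, i.e. a fragment from the
request URI attached behind components from the redirect URI.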

Below is the change to the algorithm that I'd think is correct. In 
logical terms, it's straightforward: use the fragment from the "base" only
if the reference (R) supplies neither a fragment nor anything before it.
However, in terms of actual code, there are quite a few places to 
change. This is because the if/else hierarchy gets deeper and deeper for 
the later parts of the URI. In the algorithm, scheme is set in two 
locations, authority in three, and so on. The structure of the code gets 
even more regular if you change
    if (R.path == "") then
to
    if (R.path != "") then
(which is equivalent to "if defined(R.path) then") and exchange the 
respective code blocks. The only irregularity in the structure then is the
    if (R.path starts-with "/") then
condition; this could be regularized by separating path (without the 
actual final name of the resource) and pure resource (file) name.

 >>>>
    -- The URI reference is parsed into the five URI components
    --
    (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R);

    -- A non-strict parser may ignore a scheme in the reference
    -- if it is identical to the base URI's scheme.
    --
    if ((not strict) and (R.scheme == Base.scheme)) then
       undefine(R.scheme);
    endif;

    if defined(R.scheme) then
       T.scheme    = R.scheme;
       T.authority = R.authority;
       T.path      = remove_dot_segments(R.path);
       T.query     = R.query;
       T.fragment  = R.fragment;               -- this line added
    else
       if defined(R.authority) then
          T.authority = R.authority;
          T.path      = remove_dot_segments(R.path);
          T.query     = R.query;
          T.fragment  = R.fragment;            -- this line added
       else
          if (R.path == "") then
             T.path = Base.path;
             if defined(R.query) then
                T.query = R.query;
                T.fragment = R.fragment;       -- this line added
             else
                T.query = Base.query;
                if defined(R.fragment) then    -- this line added
                   T.fragment  = R.fragment;   -- this line added
                else                           -- this line added
                   T.fragment = Base.fragment; -- this line added
                endif;                         -- this line added
             endif;
          else
             if (R.path starts-with "/") then
                T.path = remove_dot_segments(R.path);
             else
                T.path = merge(Base.path, R.path);
                T.path = remove_dot_segments(T.path);
             endif;
             T.query = R.query;
             T.fragment = R.fragment;          -- this line added
          endif;
          T.authority = Base.authority;
       endif;
       T.scheme = Base.scheme;
    endif;

    -- T.fragment = R.fragment;                -- this line commented out
 >>>>

It's also possible to rewrite this as:

 >>>>
    -- The URI reference is parsed into the five URI components
    --
    (R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R);
    T.fragment = undefined;                    -- this line added

    -- A non-strict parser may ignore a scheme in the reference
    -- if it is identical to the base URI's scheme.
    --
    if ((not strict) and (R.scheme == Base.scheme)) then
       undefine(R.scheme);
    endif;

    if defined(R.scheme) then
       T.scheme    = R.scheme;
       T.authority = R.authority;
       T.path      = remove_dot_segments(R.path);
       T.query     = R.query;
    else
       if defined(R.authority) then
          T.authority = R.authority;
          T.path      = remove_dot_segments(R.path);
          T.query     = R.query;
       else
          if (R.path == "") then
             T.path = Base.path;
             if defined(R.query) then
                T.query = R.query;
             else
                T.query = Base.query;
                if not defined(R.fragment) then  -- this line added
                   T.fragment = Base.fragment;   -- this line added
                endif;                           -- this line added
             endif;
          else
             if (R.path starts-with "/") then
                T.path = remove_dot_segments(R.path);
             else
                T.path = merge(Base.path, R.path);
                T.path = remove_dot_segments(T.path);
             endif;
             T.query = R.query;
          endif;
          T.authority = Base.authority;
       endif;
       T.scheme = Base.scheme;
    endif;

    if not defined(T.fragment) then            -- this line added
       T.fragment = R.fragment;
    endif;                                     -- this line added
 >>>>

This localizes the changes better and can probably serve as the base (no 
pun intended) for spec text.
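
For anyone who wants to experiment, here is a rough, runnable Python
sketch of the second variant above. It is only an illustration under
assumptions: it covers the strict case only, parse() uses the regular
expression from RFC 3986 Appendix B, merge() and remove_dot_segments()
follow my reading of RFC 3986 Sections 5.2.3 and 5.2.4, and the names
Parts, recompose and resolve are made up here.

```python
import re
from typing import NamedTuple, Optional

# Component-splitting regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$", re.S)


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def parse(uri: str) -> Parts:
    # Undefined components come back as None; the path is always a string.
    return Parts(*_URI_RE.match(uri).groups())


def remove_dot_segments(path: str) -> str:
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to out.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base: Parts, ref_path: str) -> str:
    if base.authority is not None and base.path == "":
        return "/" + ref_path
    return base.path[: base.path.rfind("/") + 1] + ref_path


def recompose(scheme, authority, path, query, fragment) -> str:
    out = "" if scheme is None else scheme + ":"
    out += "" if authority is None else "//" + authority
    out += path
    out += "" if query is None else "?" + query
    out += "" if fragment is None else "#" + fragment
    return out


def resolve(base_uri: str, ref_uri: str) -> str:
    base, r = parse(base_uri), parse(ref_uri)
    fragment = None  # corresponds to "T.fragment = undefined"
    if r.scheme is not None:
        scheme, authority = r.scheme, r.authority
        path, query = remove_dot_segments(r.path), r.query
    else:
        scheme = base.scheme
        if r.authority is not None:
            authority = r.authority
            path, query = remove_dot_segments(r.path), r.query
        else:
            authority = base.authority
            if r.path == "":
                path = base.path
                if r.query is not None:
                    query = r.query
                else:
                    query = base.query
                    if r.fragment is None:
                        # The only place the base fragment is inherited.
                        fragment = base.fragment
            else:
                if r.path.startswith("/"):
                    path = remove_dot_segments(r.path)
                else:
                    path = remove_dot_segments(merge(base, r.path))
                query = r.query
    if fragment is None:
        fragment = r.fragment
    return recompose(scheme, authority, path, query, fragment)
```

With this sketch, resolve(base, "") keeps the fragment of base, while
any reference that supplies a scheme, authority, path, query or
fragment of its own does not inherit it.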


> b) Is this potentially an erratum for RFC 3986?

I would say NO. My understanding is that something like
    <a href="">a link</a>
always refers to the resource itself, not a subresource. If the erratum 
went through, there would be no short way to refer to a resource itself.

Regards,    Martin.

Received on Wednesday, 4 January 2012 10:28:30 UTC