ACTION-462: URI Fragments and HTTP redirects

From RFC3986, we have this definition of a fragment:

<<
    The fragment identifier component of a URI allows indirect
    identification of a secondary resource by reference to a primary
    resource and additional identifying information.  The identified
    secondary resource may be some portion or subset of the primary
    resource, some view on representations of the primary resource, or
    some other resource defined or described by those representations.
>>

The important part is the taxonomy of fragments:
1/ a portion or subset of the primary resource
2/ a view on representations of the primary resource
3/ some other resource defined or described by those representations.

In most cases we are in case 1/, identifying a subset of the primary
resource. This is also the canonical use case for fragments in redirects:

GET http://www.example.com/book_ab/chapter_1.html
=> 301 http://www.example.com/bookshelf/ab#chapter_1
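
As a rough illustration of what a client sees in such a redirect, here is a
minimal Python sketch using the standard urllib.parse module. It only splits
the Location value from the example above into the part that is requested
from the server and the fragment that stays on the client side. The variable
names are illustrative assumptions.

```python
from urllib.parse import urldefrag

# Location value from the 301 response in the example above.
location = "http://www.example.com/bookshelf/ab#chapter_1"

# urldefrag splits a URI into the primary resource URI and the fragment.
primary, fragment = urldefrag(location)

print(primary)   # the URI that is actually requested from the server
print(fragment)  # the fragment, resolved by the client after retrieval
```

How the fragment is then resolved depends on the media type of what comes
back, which is the subject of the rest of this message.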

Also, again from RFC3986, the processing of the fragment depends on the
media type of the retrieved representation (if any):

<<
    The semantics of a fragment identifier are defined by the set of
    representations that might result from a retrieval action on the
    primary resource.  The fragment's format and resolution is therefore
    dependent on the media type [RFC2046] of a potentially retrieved
    representation, even though such a retrieval is only performed if the
    URI is dereferenced.  If no such representation exists, then the
    semantics of the fragment are considered unknown and are effectively
    unconstrained.  Fragment identifier semantics are independent of the
    URI scheme and thus cannot be redefined by scheme specifications.
>>

For case 1/ above, this leads to two different kinds of fragment,
depending on the media type:
a/ absolute fragment.
   The main example is HTML with named anchors or ids, where the fragment
   resolves to the subpart starting at the named anchor.
   ex: http://www.example.com/spec.html#chapter_1
b/ relative fragment.
   The main example is XPointer (so any XML document, apart from a
   well-known exception).

http://www.example.com/something.xhtml#xpointer(descendant::div[position()=4])
This is a relative pointer that resolves from the root of the document.
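
The two forms can at least be told apart syntactically. The following Python
sketch is only an assumption of how a client might do that: the helper name
fragment_form and the regular expression are hypothetical, it handles a
single "scheme(data)" pointer part only, and it says nothing about what the
fragment means for a given media type.

```python
import re
from urllib.parse import urldefrag

# Hypothetical pattern for a single "scheme(data)" pointer part,
# such as xpointer(...). Multiple pointer parts are not handled.
_POINTER = re.compile(r"^([A-Za-z_][\w.-]*)\((.*)\)$", re.DOTALL)

def fragment_form(uri):
    """Return ("none", None), ("bare", name) or (scheme, data)."""
    fragment = urldefrag(uri).fragment
    if not fragment:
        return ("none", None)
    match = _POINTER.match(fragment)
    if match:
        # Scheme-based pointer: scheme name plus the data between parentheses.
        return (match.group(1), match.group(2))
    # Anything else is treated as a bare name (e.g. an HTML anchor or id).
    return ("bare", fragment)

print(fragment_form("http://www.example.com/spec.html#chapter_1"))
print(fragment_form(
    "http://www.example.com/something.xhtml"
    "#xpointer(descendant::div[position()=4])"))
```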

In the book example above, we try to apply that relative fragment:
http://www.example.com/book_ab/chapter_1.html#xpointer(descendant::div[position()=4])

GET http://www.example.com/book_ab/chapter_1.html
=> 301 http://www.example.com/bookshelf/ab#chapter_1

It seems logical to try to apply the relative XPointer expression from the 
named anchor identified by ab#chapter_1.


* Type 2 fragments:

SVG also defines an 'svgView' fragment that shares the same syntax as
XPointer, and thus avoids conflicts.

This fragment scheme defines the desired view of the SVG document, for
example:
http://www.example.com/MyDrawing.svg#svgView(viewBox(0,200,1000,1000))
This is a clear example of the 2/ class of fragment.
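
To show that such a fragment describes a view rather than a part of the
document, here is a small Python sketch that extracts the viewBox values from
the URI above. The helper name svg_view_box is hypothetical, and the sketch
assumes the fragment contains nothing but a single viewBox(...) with
comma-separated numbers.

```python
import re
from urllib.parse import urldefrag

# Matches only the simple form svgView(viewBox(a,b,c,d)).
_VIEWBOX = re.compile(r"^svgView\(viewBox\(([^)]*)\)\)$")

def svg_view_box(uri):
    """Return (min_x, min_y, width, height), or None if not applicable."""
    match = _VIEWBOX.match(urldefrag(uri).fragment)
    if match is None:
        return None
    try:
        values = tuple(float(v) for v in match.group(1).split(","))
    except ValueError:
        # Non-numeric content inside viewBox(...)
        return None
    return values if len(values) == 4 else None

print(svg_view_box(
    "http://www.example.com/MyDrawing.svg"
    "#svgView(viewBox(0,200,1000,1000))"))
```

A bare name such as #MyDrawing gives None here, since resolving it requires
looking inside the retrieved SVG document.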

Also in SVG, the bare name form of fragment, #myview, can be linked to a
view inside the SVG document. Now, suppose we want to combine views, as in:

http://www.example.com/MyDrawing.svg#svgView(viewBox(0,200,1000,1000))
GET http://www.example.com/MyDrawing.svg
=> 301 http://www.example.com/bigdrawing.svg#MyDrawing

What is the story for combining the two views, considering
#svgView(viewBox(0,200,1000,1000)) as relative to #MyDrawing?
Combining fragments is no longer just a matter of "combining absolute and
relative fragments"; it also depends on the media type.


* Type 3 fragments:

Then there is type 3/. RDF fragments are an example: there, fragments do
not always identify a part of the document.
(See http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-fragID )
Note also the following sentence from the same document:
<<
  This provides a handling of URI references and their denotation that is
  consistent with the RDF model theory and usage, and also with conventional
  Web behavior. Note that nothing here requires that an RDF application be
  able to retrieve any representation of resources identified by the URIs
  in an RDF graph.
>>
This means that an RDF application might process fragments even without
dereferencing the URI, and thus without needing the media type to be able
to process the fragment.

There is also another kind of type 3/ fragment, which can be found in
HTML or SVG documents, and more generally in every kind of "active"
document, where the fragment acts as the stored state of a script.

http://www.example.com/slideset#slide(3) might tell a script to display
slide 3 of a slideset (in that case it acts like a type 2/ fragment).

http://www.example.com/somedoc.html#f3edf34 might not be a named anchor
but a script state.

This leads to the following issues:
* Is the media type enough to describe the fragment semantics?
* Is the media type a good-enough approximation of the type in the case of
   a compound type using active content?

In any case, it also means that combining fragments becomes close to
impossible, as there is no way to figure out the real intent of the
original fragment in the redirected content.

In the original example:

http://www.example.com/book_ab/chapter_1.html#myfragment

GET http://www.example.com/book_ab/chapter_1.html
=> 301 http://www.example.com/bookshelf/ab#chapter_1

What does #myfragment really mean in
http://www.example.com/book_ab/chapter_1.html, and can it be translated into
http://www.example.com/bookshelf/ab#chapter_1 ?

The response is that in the general case, there is no answer.
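
To make the problem concrete, here is a Python sketch of one naive policy a
client could apply: keep the fragment of the redirect target when it has one,
otherwise carry the original fragment over. This is purely an assumption for
illustration, not a proposal, and the function name follow_redirect is
hypothetical. Note that when both fragments are present it silently drops
#myfragment, which is exactly the loss of intent described above.

```python
from urllib.parse import urldefrag, urljoin

def follow_redirect(original_uri, location):
    """Naive policy (an assumption, not a recommendation):
    the redirect's fragment wins, else the original one is inherited."""
    original_primary, original_fragment = urldefrag(original_uri)
    # Location may be relative, so resolve it against the request URI.
    target = urljoin(original_primary, location)
    target_primary, target_fragment = urldefrag(target)
    fragment = target_fragment or original_fragment
    return target_primary + ("#" + fragment if fragment else "")

original = "http://www.example.com/book_ab/chapter_1.html#myfragment"

# Redirect with its own fragment: #myfragment is lost.
print(follow_redirect(original,
                      "http://www.example.com/bookshelf/ab#chapter_1"))

# Redirect without a fragment: #myfragment is carried over.
print(follow_redirect(original, "http://www.example.com/bookshelf/ab"))
```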

----------

My position on this is that:
* Fragments in redirects have a real value and are already used.
* Fragment recombination can be hard, and is impossible in the general case.
* We need to define a good story for applying a fragment to a redirected URI
   with a different fragment.

Tracker, this is ACTION-462

-- 
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves

Received on Monday, 4 October 2010 17:10:44 UTC