RDF's relative IRI resolution is ambiguous

Dear all,

The RDF 1.1 spec states that
    Relative IRIs must be resolved against a base IRI to make them absolute [1].
but also that
    Further normalization must not be performed when comparing IRIs for equality [1].
However, the reference resolution algorithm of RFC3986, which the spec links to,
seems to perform normalization at the same time [2].
Therefore, RDF's IRI resolution seems not to be well-defined in some cases,
and it can even lead to errors in others, as I will discuss in this mail.

In RDF, we are not allowed to perform IRI normalization.
E.g., the following IRIs are all different and should be treated as such:
    <http://example.org/foo/bar>
    <http://example.org/foo/./bar>
    <http://example.org/foo/baz/../bar>

Yet the Turtle spec, for instance, writes that
    Relative IRIs are resolved with base IRIs as per RFC3986
    using only the basic algorithm in section 5.2 [3].
Unfortunately, RFC3986 makes no mention of a "basic algorithm",
but the intention seems to be that, for example,
    BASE <http://example.org/foo/bar>
    <a> <./b> <../c>.
is interpreted as
    <http://example.org/foo/a> <http://example.org/foo/b> <http://example.org/c>.
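As a quick sanity check (not a reference implementation), this interpretation can be reproduced with Python's standard urllib.parse.urljoin, whose documentation states that since Python 3.5 its behavior matches the semantics of RFC 3986:

```python
from urllib.parse import urljoin

base = "http://example.org/foo/bar"
# Resolve each relative IRI from the Turtle snippet against the base IRI.
for ref in ["a", "./b", "../c"]:
    print(urljoin(base, ref))
# http://example.org/foo/a
# http://example.org/foo/b
# http://example.org/c
```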

But if we look at the algorithm in RFC3986,
we see that it first concatenates the paths of the base and relative IRI:
    The input buffer is initialized with the now-appended path components
    and the output buffer is initialized to the empty string [4].
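Only after this concatenation does it resolve occurrences of "./" and "../".
Hence, resolution is performed by concatenating and then normalizing,
whereas normalization is not allowed for RDF resource identifiers.
In fact, RFC3986 itself describes this removal of dot-segments
as "Path Segment Normalization" (section 6.2.2.3).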
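To make this concrete, here is a rough Python rendering of the merge and remove_dot_segments routines from RFC3986 sections 5.2.3 and 5.2.4. The function names follow the RFC's pseudocode, but the code is my own sketch and, as a simplification, works on path components only and assumes the base has a non-empty path. Applied to the three IRIs above that RDF must keep distinct, remove_dot_segments maps them all to the same path, which is exactly the kind of normalization RDF forbids:

```python
def merge(base_path: str, ref_path: str) -> str:
    """RFC 3986, 5.2.3: base path up to and including its last "/", plus ref."""
    # Simplification (assumption): the base has an authority and a non-empty path.
    return base_path[: base_path.rfind("/") + 1] + ref_path


def remove_dot_segments(path: str) -> str:
    """RFC 3986, 5.2.4: interpret and remove "." and ".." path segments."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):                     # rule A
            inp = inp[3:]
        elif inp.startswith("./"):                    # rule A
            inp = inp[2:]
        elif inp.startswith("/./"):                   # rule B
            inp = "/" + inp[3:]
        elif inp == "/.":                             # rule B
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":  # rule C
            inp = "/" + inp[4:]
            if out:
                out.pop()  # drop the last output segment and its preceding "/"
        elif inp in (".", ".."):                      # rule D
            inp = ""
        else:                                         # rule E
            # Move the first segment (with its leading "/", if any) to the output.
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


# The three IRIs from above, which RDF must treat as different:
paths = ["/foo/bar", "/foo/./bar", "/foo/baz/../bar"]
print({remove_dot_segments(p) for p in paths})  # {'/foo/bar'}

# Resolution = merge (concatenate) + remove_dot_segments (normalize),
# so dot-segments in the *base* path are normalized away as well:
print(remove_dot_segments(merge("/foo/./baz/../x", "bar")))  # /foo/bar
```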

Concretely, this means that resolution behavior is undefined
whenever "./" or "../" occurs in either the base IRI or the relative IRI.
To demonstrate this behavior, I have created a test suite [5].
The full results are available online [6].

Below are some of the oddities I found:

cwm considers removing superfluous "./" segments to be normalization; EYE does not:
   BASE http://example.org/xxx/yyy/zzz, resolving ./././aaa/bbb/ccc
     EYE     http://example.org/xxx/yyy/aaa/bbb/ccc
     cwm     http://example.org/xxx/yyy/././aaa/bbb/ccc

   BASE http://example.org/xxx/yyy/zzz, resolving ../././../././../aaa/bbb/ccc
     EYE     http://example.org/aaa/bbb/ccc
     cwm     http://example.org/xxx/./../././../aaa/bbb/ccc
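For comparison, the RFC3986 algorithm yields the EYE results for both cases, i.e., it removes the superfluous "./" segments. The check below uses Python's urllib.parse.urljoin, assuming (as its documentation claims since Python 3.5) that it follows RFC 3986 for these inputs:

```python
from urllib.parse import urljoin

base = "http://example.org/xxx/yyy/zzz"
for ref in ["./././aaa/bbb/ccc", "../././../././../aaa/bbb/ccc"]:
    print(urljoin(base, ref))
# http://example.org/xxx/yyy/aaa/bbb/ccc
# http://example.org/aaa/bbb/ccc
```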

It gets even worse for cases where the base IRI ends with "./" or "../":

   BASE http://example.org/xxx/yyy/zzz/./././, resolving ../aaa/bbb/ccc
     EYE/cwm http://example.org/xxx/yyy/zzz/././aaa/bbb/ccc

   BASE http://example.org/xxx/yyy/zzz/../../../, resolving ../aaa/bbb/ccc
     EYE/cwm http://example.org/xxx/yyy/zzz/../../aaa/bbb/ccc

Note how the leading "../" of the relative IRI
strips off a "./" or "../" segment of the base IRI as if it were a regular segment,
which does not have the intended effect at all!
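By contrast, applying RFC3986 section 5.2 to the letter merges the base's dot-segments with the reference and then removes them all together. The sketch below is my own Python rendering of sections 5.2.2 to 5.2.4, restricted to references that consist of a path only; the resolve name is hypothetical. For these two cases it gives http://example.org/xxx/yyy/aaa/bbb/ccc and http://example.org/aaa/bbb/ccc, which differ from both EYE and cwm:

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """RFC 3986, section 5.2.4 (rules A to E)."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()  # remove the last output segment and its preceding "/"
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (and its leading "/", if any) to the output.
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def resolve(base: str, ref: str) -> str:
    """Hypothetical helper: RFC 3986 section 5.2.2 for a path-only reference."""
    scheme, netloc, base_path, _query, _fragment = urlsplit(base)
    if ref.startswith("/"):
        merged = ref
    elif netloc and not base_path:
        merged = "/" + ref  # section 5.2.3, base with authority but empty path
    else:
        merged = base_path[: base_path.rfind("/") + 1] + ref  # section 5.2.3
    return urlunsplit((scheme, netloc, remove_dot_segments(merged), "", ""))


for base in ["http://example.org/xxx/yyy/zzz/./././",
             "http://example.org/xxx/yyy/zzz/../../../"]:
    print(resolve(base, "../aaa/bbb/ccc"))
# http://example.org/xxx/yyy/aaa/bbb/ccc
# http://example.org/aaa/bbb/ccc
```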

So, how do we proceed from here?
What is the correct algorithm to perform IRI resolution in presence of "./" and "../"?
Or should we ignore their presence and just concatenate?
And where do we draw the line between resolution and normalization?
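To make the "just concatenate" option concrete, here is a minimal, hypothetical sketch (the concat_resolve name is mine, and it only handles references that consist of a relative path). It performs the merge step of RFC3986 section 5.2.3 but skips remove_dot_segments entirely:

```python
from urllib.parse import urlsplit, urlunsplit


def concat_resolve(base: str, ref: str) -> str:
    """Hypothetical 'just concatenate' resolution: RFC 3986's merge step
    (section 5.2.3) without remove_dot_segments. Only handles references
    that are a relative path (no scheme, authority, query, or fragment)."""
    scheme, netloc, path, _query, _fragment = urlsplit(base)
    if netloc and not path:
        merged = "/" + ref
    else:
        # Keep the base path up to and including its last "/".
        merged = path[: path.rfind("/") + 1] + ref
    # Deliberately no dot-segment removal: "./" and "../" stay verbatim.
    return urlunsplit((scheme, netloc, merged, "", ""))


base = "http://example.org/foo/bar"
for ref in ["a", "./b", "../c"]:
    print(concat_resolve(base, ref))
# http://example.org/foo/a
# http://example.org/foo/./b
# http://example.org/foo/../c
```

Under this reading, the Turtle example from earlier would no longer map <../c> to <http://example.org/c>, which seems to contradict the intended interpretation.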

Best,

Ruben

[1] http://www.w3.org/TR/rdf11-concepts/#section-IRIs
[2] http://tools.ietf.org/html/rfc3986#section-5.2
[3] http://www.w3.org/TR/turtle/#sec-iri-references
[4] http://tools.ietf.org/html/rfc3986#section-5.2.4
[5] https://github.com/RubenVerborgh/TurtleIriResolution
[6] https://gist.github.com/RubenVerborgh/eb3717bb78df42369b0f

Received on Thursday, 27 August 2015 11:57:31 UTC