- From: Ruben Verborgh <ruben.verborgh@ugent.be>
- Date: Thu, 27 Aug 2015 13:57:00 +0200
- To: RDF Comments <public-rdf-comments@w3.org>
Dear all,
The RDF 1.1 spec states that
Relative IRIs must be resolved against a base IRI to make them absolute [1].
but also that
Further normalization must not be performed when comparing IRIs for equality [1].
However, the URI resolution algorithm in RFC3986 [2]
seems to perform normalization as part of resolution.
Therefore, it seems that RDF's IRI resolution is not well-defined in all cases,
and can even lead to errors in other cases, as I will discuss in this mail.
In RDF, we are not allowed to perform IRI normalization.
E.g., the following IRIs are all different and should be treated as such:
<http://example.org/foo/bar>
<http://example.org/foo/./bar>
<http://example.org/foo/baz/../bar>
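To illustrate the difference: under simple string comparison, the three IRIs above are distinct. Dot-segment normalization (RFC 3986, Section 6.2.2.3), which RDF rules out, would collapse them into one. The following is a minimal Python sketch. remove_dot_segments follows RFC 3986, Section 5.2.4. normalize_path is a hypothetical helper that assumes absolute IRIs with an authority and no query or fragment.

```python
def remove_dot_segments(path):
    """Remove "." and ".." segments as in RFC 3986, Section 5.2.4."""
    inp, out = path, []  # out holds segments together with their leading "/"
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./") or inp == "/.":
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()  # drop the last segment written to the output
        elif inp in (".", ".."):
            inp = ""
        else:  # move the first path segment to the output buffer
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def normalize_path(iri):
    """Hypothetical helper: path segment normalization (RFC 3986, 6.2.2.3).
    Assumes an absolute IRI with an authority and no query or fragment."""
    scheme, rest = iri.split("://", 1)
    slash = rest.find("/")
    if slash == -1:
        return iri
    return f"{scheme}://{rest[:slash]}{remove_dot_segments(rest[slash:])}"


iris = [
    "http://example.org/foo/bar",
    "http://example.org/foo/./bar",
    "http://example.org/foo/baz/../bar",
]

# Simple string comparison (what RDF 1.1 requires): three different IRIs.
print(len(set(iris)))  # 3
# After dot-segment normalization (which RDF forbids): a single IRI.
print(len({normalize_path(i) for i in iris}))  # 1
```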
Yet the Turtle spec, for instance, writes that
Relative IRIs are resolved with base IRIs as per RFC3986
using only the basic algorithm in section 5.2 [3].
Unfortunately, RFC3986 makes no mention of a "basic algorithm",
but the intention seems to be that, for example,
BASE <http://example.org/foo/bar>
<a> <./b> <../c>.
is interpreted as
<http://example.org/foo/a> <http://example.org/foo/b> <http://example.org/c>.
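A literal reading of RFC 3986, Section 5.2 reproduces the resolution that this example assumes. The following minimal Python sketch was written for this discussion and is not a library API. split uses the regular expression from Appendix B. merge and remove_dot_segments mirror the pseudo-functions of Sections 5.2.3 and 5.2.4. resolve follows Section 5.2.2 with the strict parser. Only the Python standard library is assumed.

```python
import re

# Regular expression from RFC 3986, Appendix B. Groups 2, 4, 5, 7 and 9 hold
# scheme, authority, path, query and fragment; None means "undefined".
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split(ref):
    m = URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """Section 5.2.4: consume the input buffer from left to right."""
    inp, out = path, []  # out holds segments together with their leading "/"
    while inp:
        if inp.startswith("../"):  # rule A
            inp = inp[3:]
        elif inp.startswith("./"):  # rule A
            inp = inp[2:]
        elif inp.startswith("/./") or inp == "/.":  # rule B
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":  # rule C
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp in (".", ".."):  # rule D
            inp = ""
        else:  # rule E: move the first segment to the output buffer
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """Section 5.2.3: keep the base path up to its last "/", then append."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """Section 5.2.2 (strict parser) followed by recomposition (Section 5.3)."""
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    scheme, auth, path, query, fragment = split(ref)
    if scheme is not None or auth is not None or path.startswith("/"):
        path = remove_dot_segments(path)
    elif path == "":
        path = b_path  # the base path is taken over unchanged
        if query is None:
            query = b_query
    else:
        path = remove_dot_segments(merge(b_auth, b_path, path))
    if scheme is None:
        scheme = b_scheme
        if auth is None:
            auth = b_auth
    result = "" if scheme is None else scheme + ":"
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


base = "http://example.org/foo/bar"
for ref in ["a", "./b", "../c"]:
    print(resolve(base, ref))
# http://example.org/foo/a
# http://example.org/foo/b
# http://example.org/c
```

In this reading, dot segments are removed only after the reference has been merged with the base path. When the reference path is empty, the base path is copied unchanged.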
But if we look at the algorithm in RFC3986,
we see that it actually first concatenates the base and relative IRI:
The input buffer is initialized with the now-appended path components
and the output buffer is initialized to the empty string [4].
Only after this concatenation does it resolve occurrences of "./" and "../".
Hence, it effectively concatenates and normalizes in order to resolve,
even though normalization is not allowed for RDF resource identifiers.
Concretely, this means that resolution behavior is undefined
whenever "./" or "../" occurs in either the base IRI or the relative IRI.
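The following hypothetical tracing variant of remove_dot_segments (RFC 3986, Section 5.2.4) in Python makes the concatenate-then-remove order visible. It records the input and output buffers after each step. The example assumes a base path /foo/./bar and the relative reference ../c. The function names are illustrative and do not come from any library.

```python
def remove_dot_segments_traced(path):
    """Hypothetical tracing variant of RFC 3986, Section 5.2.4: returns every
    (input buffer, output buffer) state, starting with the initial one."""
    inp, out = path, []  # out holds segments together with their leading "/"
    states = [(inp, "")]
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./") or inp == "/.":
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:  # move the first path segment to the output buffer
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
        states.append((inp, "".join(out)))
    return states


def merge(base_path, ref_path):
    """RFC 3986, Section 5.2.3, assuming the base IRI has a non-empty path."""
    return base_path[: base_path.rfind("/") + 1] + ref_path


# Hypothetical example: a base path containing "./", a reference with "../".
merged = merge("/foo/./bar", "../c")  # concatenation happens first
states = remove_dot_segments_traced(merged)
for inp, out in states:
    print(f"input={inp!r:16} output={out!r}")
```

In this trace, the base's own "./" is dropped and the reference's "../" then removes "/foo" in the same pass. The final result is "/c", so the algorithm cannot distinguish dot segments that came from the base from those that came from the reference.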
To demonstrate this behavior, I have created a test suite [5].
The full results are available online [6].
Below are some of the oddities I found:
cwm considers the removal of superfluous "./" segments to be normalization; EYE does not:
BASE http://example.org/xxx/yyy/zzz, resolving ./././aaa/bbb/ccc
EYE http://example.org/xxx/yyy/aaa/bbb/ccc
cwm http://example.org/xxx/yyy/././aaa/bbb/ccc
BASE http://example.org/xxx/yyy/zzz, resolving ../././../././../aaa/bbb/ccc
EYE http://example.org/aaa/bbb/ccc
cwm http://example.org/xxx/./../././../aaa/bbb/ccc
It gets even worse for cases where the base IRI ends with "./" or "../":
BASE http://example.org/xxx/yyy/zzz/./././, resolving ../aaa/bbb/ccc
EYE/cwm http://example.org/xxx/yyy/zzz/././aaa/bbb/ccc
BASE http://example.org/xxx/yyy/zzz/../../../, resolving ../aaa/bbb/ccc
EYE/cwm http://example.org/xxx/yyy/zzz/../../aaa/bbb/ccc
Note how the leading "../" of the relative IRI
merely strips off one "./" or "../" segment of the base IRI,
which does not have the intended effect at all!
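For comparison, the following Python sketch applies RFC 3986, Section 5.2 literally to the four cases above. It handles relative-path references only. resolve_path is a hypothetical helper, and remove_dot_segments follows Section 5.2.4. By this sketch's reading, the RFC agrees with EYE on the first two cases. For the last two, it gives http://example.org/xxx/yyy/aaa/bbb/ccc and http://example.org/aaa/bbb/ccc, which match neither EYE nor cwm. The difference arises because the dot segments of the base IRI are removed together with those of the relative IRI.

```python
def remove_dot_segments(path):
    """Remove "." and ".." segments as in RFC 3986, Section 5.2.4."""
    inp, out = path, []  # out holds segments together with their leading "/"
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./") or inp == "/.":
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:  # move the first path segment to the output buffer
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def resolve_path(base_path, ref_path):
    """Hypothetical helper: RFC 3986, Section 5.2.2, for a relative-path
    reference against a base with a non-empty path: merge (5.2.3), then
    remove dot segments (5.2.4)."""
    merged = base_path[: base_path.rfind("/") + 1] + ref_path
    return remove_dot_segments(merged)


# (base path, relative reference) pairs from the test cases above.
cases = [
    ("/xxx/yyy/zzz", "./././aaa/bbb/ccc"),
    ("/xxx/yyy/zzz", "../././../././../aaa/bbb/ccc"),
    ("/xxx/yyy/zzz/./././", "../aaa/bbb/ccc"),
    ("/xxx/yyy/zzz/../../../", "../aaa/bbb/ccc"),
]
results = ["http://example.org" + resolve_path(b, r) for b, r in cases]
for result in results:
    print(result)
# http://example.org/xxx/yyy/aaa/bbb/ccc
# http://example.org/aaa/bbb/ccc
# http://example.org/xxx/yyy/aaa/bbb/ccc
# http://example.org/aaa/bbb/ccc
```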
So, how do we proceed from here?
What is the correct algorithm to perform IRI resolution in presence of "./" and "../"?
Or should we ignore their presence and just concatenate?
And where do we draw the line between resolution and normalization?
Best,
Ruben
[1] http://www.w3.org/TR/rdf11-concepts/#section-IRIs
[2] http://tools.ietf.org/html/rfc3986#section-5.2
[3] http://www.w3.org/TR/turtle/#sec-iri-references
[4] http://tools.ietf.org/html/rfc3986#section-5.2.4
[5] https://github.com/RubenVerborgh/TurtleIriResolution
[6] https://gist.github.com/RubenVerborgh/eb3717bb78df42369b0f
Received on Thursday, 27 August 2015 11:57:31 UTC