- From: Ruben Verborgh <ruben.verborgh@ugent.be>
- Date: Thu, 27 Aug 2015 13:57:00 +0200
- To: RDF Comments <public-rdf-comments@w3.org>
Dear all,

The RDF 1.1 spec states that
    Relative IRIs must be resolved against a base IRI to make them absolute [1],
but also that
    Further normalization must not be performed when comparing IRIs for equality [1].
However, the URI resolution algorithm linked to in RFC3986 seems to perform normalization at the same time [2]. Therefore, it seems that RDF's IRI resolution is not well-defined in all cases, and can even lead to errors in other cases, as I will discuss in this mail.

In RDF, we are not allowed to perform IRI normalization. E.g., the following IRIs are all different and should be treated as such:
    <http://example.org/foo/bar>
    <http://example.org/foo/./bar>
    <http://example.org/foo/baz/../bar>

Yet the Turtle spec, for instance, writes that
    Relative IRIs are resolved with base IRIs as per RFC3986 using only the basic algorithm in section 5.2 [3].
Unfortunately, RFC3986 makes no mention of a "basic algorithm", but it seems the intention is that, for example,
    BASE <http://example.org/foo/bar>
    <a> <./b> <../c>.
is interpreted as
    <http://example.org/foo/a> <http://example.org/foo/b> <http://example.org/c>.

But if we look at the algorithm in RFC3986, we see that it actually first concatenates the base and relative IRI:
    The input buffer is initialized with the now-appended path components and the output buffer is initialized to the empty string [4].
Only after this concatenation does it resolve occurrences of "./" and "../". Hence, it concatenates and then normalizes in order to perform resolution, while normalization is not allowed for RDF resource identifiers.

Concretely, this means that resolution behavior is undefined for those cases where "./" or "../" occurs in the base IRI or in the relative IRI beyond what is needed for resolution.

To demonstrate this behavior, I have created a test suite [5]. The full results are available online [6].
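For concreteness, the "concatenate first, then remove dot segments" behavior described above can be sketched in Python. This is only a minimal sketch of the merge step (RFC3986 section 5.2.3) and of remove_dot_segments (section 5.2.4), not the code of any of the parsers in the test suite. The function names are illustrative, and resolve_relative_path is a hypothetical helper that assumes the reference is a plain relative path without scheme, authority, query or fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def merge(base_path: str, ref_path: str) -> str:
    """RFC3986 5.2.3: drop the last segment of the base path, append the reference."""
    if not base_path:
        # Assumes the base has an authority and an empty path.
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def remove_dot_segments(path: str) -> str:
    """RFC3986 5.2.4, steps 2A to 2E, applied to the already merged path."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):          # 2A
            inp = inp[3:]
        elif inp.startswith("./"):         # 2A
            inp = inp[2:]
        elif inp.startswith("/./"):        # 2B
            inp = inp[2:]
        elif inp == "/.":                  # 2B
            inp = "/"
        elif inp.startswith("/../"):       # 2C: also drops the last output segment
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":                 # 2C
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):           # 2D
            inp = ""
        else:                              # 2E: move one segment to the output
            cut = inp.find("/", 1)
            if cut == -1:
                out.append(inp)
                inp = ""
            else:
                out.append(inp[:cut])
                inp = inp[cut:]
    return "".join(out)


def resolve_relative_path(base: str, ref: str) -> str:
    """Hypothetical helper: resolve a relative-path reference against a base IRI."""
    b = urlsplit(base)
    path = remove_dot_segments(merge(b.path, ref))
    return urlunsplit((b.scheme, b.netloc, path, "", ""))


if __name__ == "__main__":
    base = "http://example.org/foo/bar"
    for ref in ("a", "./b", "../c"):
        print(resolve_relative_path(base, ref))
    # prints http://example.org/foo/a
    #        http://example.org/foo/b
    #        http://example.org/c
```

Because remove_dot_segments runs over the merged path, any "./" or "../" that was already present in the base path is removed as well. That is exactly the point where resolution and normalization become hard to tell apart.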
Below are some of the oddities I found:

cwm treats the removal of superfluous "./" as normalization, EYE does not:

    BASE http://example.org/xxx/yyy/zzz, resolving ./././aaa/bbb/ccc
    EYE  http://example.org/xxx/yyy/aaa/bbb/ccc
    cwm  http://example.org/xxx/yyy/././aaa/bbb/ccc

    BASE http://example.org/xxx/yyy/zzz, resolving ../././../././../aaa/bbb/ccc
    EYE  http://example.org/aaa/bbb/ccc
    cwm  http://example.org/xxx/./../././../aaa/bbb/ccc

It gets even worse for cases where the base IRI ends with "./" or "../":

    BASE    http://example.org/xxx/yyy/zzz/./././, resolving ../aaa/bbb/ccc
    EYE/cwm http://example.org/xxx/yyy/zzz/././aaa/bbb/ccc

    BASE    http://example.org/xxx/yyy/zzz/../../../, resolving ../aaa/bbb/ccc
    EYE/cwm http://example.org/xxx/yyy/zzz/../../aaa/bbb/ccc

Note how the leading "../" of the relative IRI actually splits off a "./" or "../" from the base IRI, while this does not have the intended effect at all!

So, how do we proceed from here? What is the correct algorithm to perform IRI resolution in the presence of "./" and "../"? Or should we ignore their presence and just concatenate? And where do we draw the line between resolution and normalization?

Best,

Ruben

[1] http://www.w3.org/TR/rdf11-concepts/#section-IRIs
[2] http://tools.ietf.org/html/rfc3986#section-5.2
[3] http://www.w3.org/TR/turtle/#sec-iri-references
[4] http://tools.ietf.org/html/rfc3986#section-5.2.4
[5] https://github.com/RubenVerborgh/TurtleIriResolution
[6] https://gist.github.com/RubenVerborgh/eb3717bb78df42369b0f
Received on Thursday, 27 August 2015 11:57:31 UTC