- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Thu, 27 Aug 2015 14:55:21 +0100
- To: Ruben Verborgh <ruben.verborgh@ugent.be>
- Cc: RDF Comments <public-rdf-comments@w3.org>
Hi Ruben,

Understand that parsing is the process of producing an *RDF graph* from a *document in some RDF syntax*. The *process* of parsing is governed by the syntax spec (Turtle, JSON-LD, etc.), not by the RDF Concepts spec. RDF Concepts only governs *the data structure* that is produced as the result of the parsing process.

In other words, when parsing a document in a concrete RDF syntax such as Turtle, IRI normalisation may very well be performed if the syntax specification requires it, as part of relative IRI resolution or otherwise. The RDF Concepts spec only states that when *testing for IRI equality in an RDF graph*, IRIs are not normalised.

Best,
Richard

> On 27 Aug 2015, at 12:57, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:
>
> Dear all,
>
> The RDF 1.1 spec states that
>     Relative IRIs must be resolved against a base IRI to make them absolute [1].
> but also that
>     Further normalization must not be performed when comparing IRIs for equality [1].
> However, the URI resolution algorithm linked to in RFC3986
> seems to perform normalization at the same time [2].
> Therefore, it seems that RDF's IRI resolution is not well-defined in all cases,
> and can even lead to errors in other cases, as I will discuss in this mail.
>
> In RDF, we are not allowed to perform IRI normalization.
> E.g., the following IRIs are all different and should be treated as such:
>     <http://example.org/foo/bar>
>     <http://example.org/foo/./bar>
>     <http://example.org/foo/baz/../bar>
>
> Yet the Turtle spec, for instance, writes that
>     Relative IRIs are resolved with base IRIs as per RFC3986
>     using only the basic algorithm in section 5.2 [3].
> Unfortunately, RFC3986 makes no mention of a "basic algorithm",
> but it seems the intention is that for example
>     BASE <http://example.org/foo/bar>
>     <a> <./b> <../c>.
> is interpreted as
>     <http://example.org/foo/a> <http://example.org/foo/b> <http://example.org/c>.
>
> But if we look at the algorithm in RFC3986,
> we see that it actually first concatenates the base and relative IRI:
>     The input buffer is initialized with the now-appended path components
>     and the output buffer is initialized to the empty string [4].
> Only after this concatenation, it resolves occurrences of "./" and "../".
> Hence, it actually concatenates and normalizes to perform resolution,
> while normalization is actually not allowed for RDF resource identifiers.
>
> Concretely, this means that resolution behavior is undefined
> for those cases where "./" or "../" occurs on either side.
> To demonstrate this behavior, I have created a test suite [5].
> The full results are available online [6].
>
> Below are some of the oddities I found:
>
> cwm considers superfluous "./" normalization, EYE does not:
>     BASE http://example.org/xxx/yyy/zzz, resolving ./././aaa/bbb/ccc
>     EYE  http://example.org/xxx/yyy/aaa/bbb/ccc
>     cwm  http://example.org/xxx/yyy/././aaa/bbb/ccc
>
>     BASE http://example.org/xxx/yyy/zzz, resolving ../././../././../aaa/bbb/ccc
>     EYE  http://example.org/aaa/bbb/ccc
>     cwm  http://example.org/xxx/./../././../aaa/bbb/ccc
>
> It gets even worse for cases where the base IRI ends with "./" or "../":
>
>     BASE http://example.org/xxx/yyy/zzz/./././, resolving ../aaa/bbb/ccc
>     EYE/cwm http://example.org/xxx/yyy/zzz/././aaa/bbb/ccc
>
>     BASE http://example.org/xxx/yyy/zzz/../../../, resolving ../aaa/bbb/ccc
>     EYE/cwm http://example.org/xxx/yyy/zzz/../../aaa/bbb/ccc
>
> Note how the leading "../" of the relative IRI
> actually splits off "./" or "../" of the base IRI,
> while this does not have the intended effect at all!
>
> So, how do we proceed from here?
> What is the correct algorithm to perform IRI resolution in presence of "./" and "../"?
> Or should we ignore their presence and just concatenate?
> And where do we draw the line between resolution and normalization?
>
> Best,
>
> Ruben
>
> [1] http://www.w3.org/TR/rdf11-concepts/#section-IRIs
> [2] http://tools.ietf.org/html/rfc3986#section-5.2
> [3] http://www.w3.org/TR/turtle/#sec-iri-references
> [4] http://tools.ietf.org/html/rfc3986#section-5.2.4
> [5] https://github.com/RubenVerborgh/TurtleIriResolution
> [6] https://gist.github.com/RubenVerborgh/eb3717bb78df42369b0f
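
The split between resolving a relative IRI while parsing and comparing IRIs in a graph can be illustrated with a short Python sketch. It is only an illustration, not any parser's actual implementation: it assumes the path merge and remove_dot_segments steps described in RFC 3986 section 5.2, it handles only relative-path references against a base that has an authority, and the function names `resolve` and `same_iri` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Sketch of the dot-segment removal loop in RFC 3986 section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]            # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]            # "/../" becomes "/"
            if output:
                output.pop()           # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


def resolve(base: str, ref: str) -> str:
    """Resolve a relative-path reference against a base (parse-time step)."""
    b, r = urlsplit(base), urlsplit(ref)
    if r.scheme or r.netloc or not r.path or r.path.startswith("/"):
        raise ValueError("this sketch only handles relative-path references")
    if b.netloc and not b.path:
        merged = "/" + r.path
    else:
        # Keep the base path up to and including its last "/", then append.
        merged = b.path[: b.path.rfind("/") + 1] + r.path
    return urlunsplit(
        (b.scheme, b.netloc, remove_dot_segments(merged), r.query, r.fragment)
    )


def same_iri(a: str, b: str) -> bool:
    """Graph-level equality: plain string comparison, no normalisation."""
    return a == b


base = "http://example.org/foo/bar"
print(resolve(base, "a"), resolve(base, "./b"), resolve(base, "../c"))
print(same_iri("http://example.org/foo/bar", "http://example.org/foo/./bar"))
```

Under these assumptions, dot segments are removed only inside `resolve`, which belongs to the syntax layer, while `same_iri` never rewrites its arguments, so an absolute IRI that already contains "./" stays distinct from the one without it.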
Received on Thursday, 27 August 2015 13:55:47 UTC