Re: RDF's relative IRI resolution is ambiguous from Ruben Verborgh on 2015-08-27 (public-rdf-comments@w3.org from August 2015)

From: Ruben Verborgh <ruben.verborgh@ugent.be>
Date: Thu, 27 Aug 2015 20:54:05 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF Comments <public-rdf-comments@w3.org>
Message-Id: <081DB931-8CAD-42B3-A77D-AC1C85A365F1@ugent.be>
Hi Richard,

I'll share exactly how I came to start this mail thread; I hope this helps.

1. Somebody alerts me that N3.js does not correctly resolve paths containing "../"
2. I check the RDF spec to see how it should be done. I find:
   a) “Further normalization must not be performed when comparing IRIs for equality.”
   b) “Some concrete RDF syntaxes permit relative IRIs [which] must be resolved against a base IRI.”
   So a) tells me that I cannot simply resolve all "../" everywhere,
   and b) tells me that I should look at the concrete specs, which is Turtle.
3. The Turtle spec tells me:
   a) Relative IRIs are resolved […] as per [RFC3986] using only the basic algorithm in section 5.2.
   b) Neither […] (described in sections 6.2.2 and 6.2.3 of RFC3986) are performed.
   So a) tells me that I should look up the basic algorithm in 5.2,
   and b) tells me that I should ignore sections 6.2.2 and 6.2.3.
4. I look up Section 5.2 in RFC3986. It contains 3 algorithms: 5.2.2, 5.2.3, 5.2.4.
    At this point I'm confused: which of these algorithms is the basic algorithm?
    Also, I should ignore 6.2.2, which includes removing dot segments.
5. Because I cannot understand the instruction in the Turtle spec,
    I decide to have a look at how parsers I've been using for years implement it.
6. After some initial tests, I notice that EYE/cwm/Serd/Raptor each do different things.
7. I observe that:
    a) I was confused by the Turtle spec and could not understand how to implement it;
    b) the sources I consulted to resolve my confusion also seemed confused.
8. Hence, I conclude that my confusion is shared by others,
    so the wording in the spec is likely ambiguous.
9. I post to public-rdf-comments@w3.org to explain my confusion and ask for the correct interpretation.
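
To make the steps above concrete, here is a small sketch of the kind of input involved. It uses Python's standard `urllib.parse.urljoin` as a stand-in resolver, which is an assumption for illustration only: it is not one of the parsers tested in this thread. The first base IRI is the one used in the reference examples of RFC 3986, section 5.4; the second base and its reference are made up.

```python
from urllib.parse import urljoin

# Base IRI taken from the reference examples in RFC 3986, section 5.4.
RFC_BASE = "http://a/b/c/d;p?q"

# Relative references containing dot segments, resolved against the base.
dot_cases = {ref: urljoin(RFC_BASE, ref) for ref in ("g", "./g", "../g", "../../g")}
for ref, resolved in dot_cases.items():
    print(f"{ref!r:10} -> {resolved}")
# RFC 3986 section 5.4.1 lists http://a/b/c/g, http://a/b/c/g,
# http://a/b/g and http://a/g as the results for these four references.

# Hypothetical base with a mixed-case host, plus a percent-encoded reference:
# resolution alone is not expected to rewrite either of them.
untouched = urljoin("http://Example.ORG/a/b/c", "../%7Euser/x")
print(untouched)
```

In this sketch the dot segments disappear during resolution, while host case and percent-encoding pass through unchanged. That is one way to picture the difference between resolving a relative reference and normalizing an IRI.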

> If you read 5.2 of RFC 3986, it says that the algorithm has optional parts, described in 6.2.2 and 6.2.3. To me, this makes the following reading of the paragraph above quite sensible: “Use only the basic algorithm in 5.2; don’t use the optional advanced parts in 6.2.2 and 6.2.3.”
> 
> And given that 5.2 doesn’t mention a “basic algorithm”, I can’t really think of any other sensible reading of the paragraph.

Given that 5.2 contains 3 algorithms, I assumed that one of them was the basic algorithm,
and that it was apparently obvious. I just couldn't figure out which one it was,
and neither could the implementations I consulted, it seemed.

> Well, but that doesn’t make the spec ambiguous; it makes some parsers buggy.

Either or both can be true, really.
For sure, I can tell you that my N3.js parser gives a different result
because I could not figure out what the correct interpretation was.
The difference in output is purely due to my not understanding the spec;
it is not a bug in implementing the spec. So that might be the case for others, too.

> If you can’t construe such a reading, why not assume that they simply are buggy?

I did assume that, because I did find bugs too.
But I also assumed the spec was ambiguous,
because I couldn't interpret it.

In fact, given that not a single Turtle parser I tried
performs the correct resolution, I think it's more than just bugs.
Clearly, no implementer found out how to do it bug-free (except Greg).

> Are you saying the Turtle spec can be sensibly read as saying that *no* normalisation should be performed at all?

Actually, yes. The Turtle spec says:

    Relative IRIs are resolved with base IRIs as per Uniform Resource Identifier (URI):
    Generic Syntax [RFC3986] using only the basic algorithm in section 5.2.
    Neither Syntax-Based Normalization nor Scheme-Based Normalization
    (described in sections 6.2.2 and 6.2.3 of RFC3986) are performed.

Section 6.2.2 of RFC3986 says (emphasis mine):

   Syntax-based normalization includes such techniques as
   case normalization, percent-encoding normalization, **and removal of dot-segments**.

So my interpretation of the Turtle spec is:
a) You should use one of the algorithms in Section 5.2 (the basic one)
b) You should not perform removal of dot-segments, because that is 6.2.2

Given those two points, I conclude that the "basic algorithm" of section 5.2 cannot be
"5.2.4. Remove Dot Segments", because the Turtle spec explicitly says
I should not do syntax-based normalization, which includes dot-segment removal.
Following that reasoning, this leaves two candidates for the "basic algorithm": 5.2.2 and 5.2.3,
neither of which explained the behavior I was seeing.
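
For orientation, here is a rough sketch of how the three numbered subsections relate in RFC 3986 itself: the pseudocode in 5.2.2 (Transform References) calls `merge` from 5.2.3 and `remove_dot_segments` from 5.2.4. The names `merge` and `remove_dot_segments` follow the RFC's pseudocode. The helper `resolve_relative_path` and its `remove_dots` flag are hypothetical; they cover only the relative-path branch of 5.2.2 and exist just to show both readings side by side.

```python
def merge(base_path: str, ref_path: str, base_has_authority: bool = True) -> str:
    """Section 5.2.3: splice the reference path onto the base path."""
    if base_has_authority and base_path == "":
        return "/" + ref_path
    # Keep everything up to and including the last "/" of the base path.
    return base_path[: base_path.rfind("/") + 1] + ref_path


def remove_dot_segments(path: str) -> str:
    """Section 5.2.4: interpret and remove "." and ".." segments."""
    output = []
    while path:
        if path.startswith("../"):            # step A
            path = path[3:]
        elif path.startswith("./"):           # step A
            path = path[2:]
        elif path.startswith("/./"):          # step B
            path = path[2:]
        elif path == "/.":                    # step B
            path = "/"
        elif path.startswith("/../"):         # step C
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":                   # step C
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):             # step D
            path = ""
        else:                                 # step E: move one segment over
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


def resolve_relative_path(base_path: str, ref_path: str, remove_dots: bool = True) -> str:
    """Hypothetical helper: the relative-path branch of section 5.2.2 only."""
    merged = merge(base_path, ref_path)
    return remove_dot_segments(merged) if remove_dots else merged


with_dots_removed = resolve_relative_path("/a/b/c", "../d")
merge_only = resolve_relative_path("/a/b/c", "../d", remove_dots=False)
print(with_dots_removed)  # /a/d
print(merge_only)         # /a/b/../d
```

The two printed paths correspond to the two readings discussed in this thread: 5.2.2 applied in full, and 5.2.2 with the 5.2.4 step left out.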

> Your reasoning is: “Four different parsers produce four different behaviours. Therefore, I can’t simply do what the spec says.”

My reasoning was: “The spec confuses me. Let's try different parsers. Hmm, they are also confused. What do I do now?”

>> It does not prevent me; it's just not clear what the correct result of parsing the above Turtle should be.
> 
> Well, yeah, that’s because you got hung up on the interpretation of the word “basic” in the Turtle spec…

Well, I got confused about:
a) how to find the "basic algorithm" in 5.2, which contains three algorithms;
b) the Turtle spec explicitly saying I cannot do syntax-based normalization;
c) all of the parsers I tried disagreeing on what to do.

> What makes you think that there actually were different interpretations of the spec?

My own failure to interpret the spec correctly
made me assume (and I still assume) that others failed as well.

> As a spec writer, it is very humbling (and sometimes discouraging) to see how one’s prose can be misunderstood, no matter how clear one tried to be! And trying to be extremely precise often makes it worse.

I hope my detailed account sheds light on how I misunderstood; don't be discouraged :-)

Best,

Ruben
Received on Thursday, 27 August 2015 18:54:37 UTC