Re: RDF's relative IRI resolution is ambiguous from Richard Cyganiak on 2015-08-27 (public-rdf-comments@w3.org from August 2015)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 27 Aug 2015 17:51:49 +0100
To: Ruben Verborgh <ruben.verborgh@ugent.be>
Cc: RDF Comments <public-rdf-comments@w3.org>
Message-Id: <20A3DA68-3B46-4AD5-939E-B78F32851E6F@cyganiak.de>

Hi Ruben,

> On 27 Aug 2015, at 15:57, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:
> 
> Hi Richard,
> 
>> The Turtle spec says to use the algorithm in Section 5.2 of RFC 3986. In what way is that not conclusive?
> 
> Actually, the Turtle spec says:
> 
>   Relative IRIs are resolved with base IRIs as per RFC3986
>   using only the basic algorithm in section 5.2 [3].
> 
> However, there is no such thing as "only the basic algorithm" in RFC3986.

The full quote from the Turtle spec:

>>>
Relative IRIs are resolved with base IRIs as per Uniform Resource Identifier (URI): Generic Syntax [RFC3986] using only the basic algorithm in section 5.2. Neither Syntax-Based Normalization nor Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of RFC3986) are performed.
<<<

If you read 5.2 of RFC 3986, it says that the algorithm has optional parts, described in 6.2.2 and 6.2.3. To me, this makes the following reading of the paragraph above quite sensible: “Use only the basic algorithm in 5.2; don’t use the optional advanced parts in 6.2.2 and 6.2.3.”

And given that 5.2 doesn’t mention a “basic algorithm”, I can’t really think of any other sensible reading of the paragraph.

>> Why can’t you simply do what the Turtle spec says and apply the algorithm in RFC 3986?
> 
> Because it is ambiguous, which is why I started this thread.
> If we interpret "only the basic algorithm" as "the entire algorithm in RFC3986 5.2", then
>    @base <http://example.org/xxx/yyy/zzz/../../../>.
>    <> <a> <../../../a>.
> would result in
>    <http://example.org/xxx/yyy/zzz/../../../> <http://example.org/a> <http://example.org/a>.

Correct.
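
To make the strict reading concrete, here is a minimal Python sketch of the Section 5.2 algorithm (the 5.2.2 transform, the 5.2.3 merge and the 5.2.4 dot-segment removal), applied to the example above. The function names are my own choice, and the sketch leans on urllib.parse.urlsplit, which lowercases the scheme and cannot tell an undefined query from an empty one, so please treat it as an illustration under those assumptions rather than a conformant resolver.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments, following RFC 3986 section 5.2.4."""
    output = []  # completed segments, each keeping its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]  # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]  # "/../x" becomes "/x"
            if output:
                output.pop()  # drop the previous segment, if there is one
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            nxt = path.find("/", 1)
            if nxt == -1:
                nxt = len(path)
            output.append(path[:nxt])
            path = path[nxt:]
    return "".join(output)


def merge(base_netloc: str, base_path: str, ref_path: str) -> str:
    """Merge a relative path with the base path (RFC 3986 section 5.2.3)."""
    if base_netloc and not base_path:
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base: str, ref: str) -> str:
    """Resolve ref against base (RFC 3986 section 5.2.2), no 6.2.x normalisation."""
    b, r = urlsplit(base), urlsplit(ref)
    if r.scheme:
        return urlunsplit((r.scheme, r.netloc, remove_dot_segments(r.path), r.query, r.fragment))
    if r.netloc:
        return urlunsplit((b.scheme, r.netloc, remove_dot_segments(r.path), r.query, r.fragment))
    if not r.path:
        path = b.path  # copied as is: the base path is NOT dot-normalised
        query = r.query or b.query  # assumption: an empty query counts as undefined
    else:
        if r.path.startswith("/"):
            path = remove_dot_segments(r.path)
        else:
            path = remove_dot_segments(merge(b.netloc, b.path, r.path))
        query = r.query
    return urlunsplit((b.scheme, b.netloc, path, query, r.fragment))


BASE = "http://example.org/xxx/yyy/zzz/../../../"
for ref in ("", "a", "../../../a"):
    print(repr(ref), "->", resolve(BASE, ref))
# ''           -> http://example.org/xxx/yyy/zzz/../../../
# 'a'          -> http://example.org/a
# '../../../a' -> http://example.org/a
```

The empty reference keeps the base path untouched because 5.2.2 only calls remove_dot_segments on paths that come from the reference or from a merge. That is why the subject retains its "../../../" while the predicate and object do not.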

> which is not what all existing parsers do.

Well, but that doesn’t make the spec ambiguous; it makes some parsers buggy.

> For example, SERD and cwm parse the above as
>    <http://example.org/xxx/yyy/zzz/../../../> <http://example.org/xxx/yyy/zzz/../../../a> <http://example.org/xxx/yyy/zzz/a>.
> EYE instead parses it as
>    <http://example.org/xxx/yyy/zzz/../../../> <http://example.org/xxx/yyy/zzz/../../../a> <http://example.org/xxx/yyy/zzz/a>.
> EasyRDF parses it as
>   <http://example.org/> <http://example.org/a> <http://example.org/a> .
> while Raptor produces the strict application of RFC3986.
> 
> Which one is right cannot be determined without an interpretation of "the basic algorithm".

Do you see a possible reading of the phrase “the basic algorithm” that actually makes the behaviour of SERD, EYE or EasyRDF right? If you can’t construe such a reading, why not assume that they simply are buggy?

>>> because IRI normalization under the RDF model leads to a different graph.
>> 
>> Different from what?
> 
> Different from that graph without normalization.

Are you saying the Turtle spec can be sensibly read as saying that *no* normalisation should be performed at all? I don’t see how it can be read that way.

> In other words: the 4 different parsings above are 4 different graphs indeed.

Your reasoning is: “Four different parsers produce four different behaviours. Therefore, I can’t simply do what the spec says.”

I would suggest a different reasoning: “Four different parsers produce four different behaviours. Therefore, I should read the spec very carefully and do exactly what it says, and not pay attention to these parsers when determining correctness. This is probably a corner case that people don’t bump into often, so the kinks in the parsers haven’t been worked out, and at least three, if not all four, are buggy.”

>> And why do you think that this difference prevents you from applying IRI normalisation as demanded by RFC 3986?
> 
> It does not prevent me; it's just not clear what the correct result of parsing the above Turtle should be.

Well, yeah, that’s because you got hung up on the interpretation of the word “basic” in the Turtle spec…

>> And what does “under the RDF model” mean?
> 
> I meant that, under the RDF model, these are all different graphs
> 
> GRAPH []  { <http://example.org/xxx/yyy/zzz/../../../> <http://example.org/a> <http://example.org/a>. }
> GRAPH []  { <http://example.org/xxx/yyy/zzz/../../../> <http://example.org/xxx/yyy/zzz/../../../a> <http://example.org/xxx/yyy/zzz/a>. }
> GRAPH []  { <http://example.org/xxx/yyy/zzz/../../../> <http://example.org/xxx/yyy/zzz/../../../a> <http://example.org/xxx/yyy/zzz/a>. }
> GRAPH []  { <http://example.org/> <http://example.org/a> <http://example.org/a>. }
> 
> Yet they all are the result of parsing the same Turtle file,
> under different interpretations of what "the basic algorithm in RFC3986" means.

What makes you think that there actually were different interpretations of the spec? I would think that differences in implementation behaviour are more often the result of bugs, and rarely the result of different readers interpreting the spec differently. I would expect that most implementers just used an out-of-the-box function for resolving relative URIs, and assumed that it would be good enough, until proven otherwise.
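
As one illustration of such an out-of-the-box function, here is what Python's standard urllib.parse.urljoin does with your example. This is only a sketch of one stock library, not a statement about how any of the parsers above are implemented; I am assuming CPython 3.5 or later, where urljoin was updated to follow RFC 3986 semantics.

```python
from urllib.parse import urljoin

BASE = "http://example.org/xxx/yyy/zzz/../../../"

# Resolve the subject, predicate and object of `<> <a> <../../../a>.`
triple = tuple(urljoin(BASE, ref) for ref in ("", "a", "../../../a"))
print(" ".join(f"<{iri}>" for iri in triple), ".")
```

On the versions I am assuming, the empty reference comes back as the base unchanged and the other two both resolve to http://example.org/a, which matches the strict application of Section 5.2. A library with a different resolver could just as easily produce one of the other outputs without its author ever having read the word “basic”.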

And yes, I will admit that both the RDF Concepts spec and the Turtle spec could have been worded better here. As a spec writer, it is very humbling (and sometimes discouraging) to see how one’s prose can be misunderstood, no matter how clear one tried to be! And trying to be extremely precise often makes it worse. As you said elsewhere in the thread, test cases help.

Best,
Richard
Received on Thursday, 27 August 2015 16:52:16 UTC