Re: Call for Consensus: IRI resolution tests

> On Oct 27, 2015, at 8:18 AM, Andy Seaborne <andy@apache.org> wrote:
> 
> On 27/10/15 09:19, Eric Prud'hommeaux wrote:
>> On 10/25/2015 03:02 PM, Andy Seaborne wrote:
>>> On 25/10/15 16:14, Gregg Kellogg wrote:
>>>> On Oct 25, 2015, at 8:48 AM, Andy Seaborne <andy@apache.org> wrote:
>>>> 
>>>>> On 25/10/15 12:01, Ruben Verborgh wrote:
>>>>>> Dear Andy,
>>>>>> 
>>>>>>> The tests make an additional assumption that absolute URIs are not
>>>>>>> normalized.  This is not covered by the Turtle spec one way or
>>>>>>> another (nor should it be).  Both normalizing and not normalizing
>>>>>>> are possible.
>>>>>> 
>>>>>> I disagree here; the Turtle spec should cover this.
>>>>> 
>>>>> "should" or "does"? Are you arguing for a change to Turtle?
>>>>> 
>>>>> If it's a change, then -1 to these tests.
>>>>> 
>>>>> One way is to avoid the area that is a problem for 3986 and change the
>>>>> tests to use the "/../" form instead of the "/.." form.  As you yourself
>>>>> noted, normalization is assumed by RFC3986/5.2. Or follow RFC 3987 and
>>>>> don't have absolute URIs that contain such segments.
>>>>> 
>>>>>> Otherwise, two identical Turtle documents can result in different
>>>>>> sets of triples.
>>>>> 
>>>>> ... in the one case where the base URI ends in "/.." which isn't good
>>>>> practice; RFC 3987/5.3.2.4 even says it is not intended usage.
>>>>> 
>>>>>> I think it's clear that absolute URIs should not be touched,
>>>>>> and that the spec also says this.
>>>>> 
>>>>> The spec being Turtle?
>>>>> 
>>>>> Please quote text where it says that about @base.
>>>> 
>>>> The key for me was this sentence from the IRIs section:
>>>> 
>>>> > Relative IRIs like |<#green-goblin>| are resolved relative to the
>>>> current base IRI.
>>>> 
>>>> It says that _relative_ IRIs are resolved, but is silent on absolute
>>>> IRIs. Thus, if the value of @base is an absolute IRI it is not changed
>>>> at all, and used as is when resolving other relative IRIs. (Note, my
>>>> implementation did this previously, but I was convinced this was an
>>>> error; always resolving an IRI against the current base is supported in
>>>> RFC3986, but not called for by our specs. If it were, it would
>>>> arguably be more consistent).
>>> 
>>> Being silent to me means the RFCs apply.  So we have two readings - we
>>> should have tests that do not choose one reading over another as we are
>>> not in the role of changing or interpreting the specs.
>> 
>> I don't think it was silent on this.
>> <http://www.w3.org/TR/turtle/#h3_sec-iri-references> says
>> [[
>> Relative IRIs are resolved with base IRIs as per Uniform Resource
>> Identifier (URI): Generic Syntax [RFC3986] using only the basic
>> algorithm in section 5.2. Neither Syntax-Based Normalization nor
>> Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of
>> RFC3986) are performed.
>> ]]
> 
> That does not say anything about absolute URIs in @base, only the sorting out of relative URIs.  By not saying anything, I take it that RFC3986 and 3987 apply.

To me, the specs specify using RFC3986 only for relative IRI resolution. They do not say what to do with absolute IRIs. If they had simply said that all IRIs are resolved according to 3986/7, it would be clear what to do. In the absence of that, there is no requirement that an implementation resolve, or otherwise normalize, absolute IRIs used in @base, @prefix, or anywhere else; the RFCs only apply when doing relative IRI resolution.

I see we have three possibilities:

1. Add corner-case tests where @base has odd forms and expect that absolute IRIs resulting from resolving relative IRIs follow the behavior currently described in the PR.
2. Add corner-case tests where @base has odd forms and expect that absolute IRIs are first resolved against the document location, thus invoking RFC3986 5.2, before serving as the basis for resolving other relative IRIs.
3. Avoid testing ambiguous base IRIs, as it is really pretty artificial, and no current spec discusses it explicitly.

In any case, some errata are warranted, pretty much for all RDF serializations supporting some notion of relative IRI resolution, IMO.

Prior to this, I thought the community had generally agreed to point #1. If we can’t reach consensus around this, then I would vote for #3.
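
To make the difference between #1 and #2 concrete, here is a rough Python sketch of the RFC3986 5.2 steps (transform references, merge paths, remove dot segments). It is only an illustration under stated assumptions, not a reference implementation: the function names `remove_dot_segments` and `resolve`, the example IRIs `http://a/b/c/..` and `http://example/doc.ttl`, and the simplification that an empty query is treated the same as an undefined one are all mine, not taken from the specs or from the PR.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4 (steps 2A to 2E)."""
    out = []
    while path:
        if path.startswith("../"):            # 2A
            path = path[3:]
        elif path.startswith("./"):           # 2A
            path = path[2:]
        elif path.startswith("/./"):          # 2B
            path = "/" + path[3:]
        elif path == "/.":                    # 2B
            path = "/"
        elif path.startswith("/../"):         # 2C
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path == "/..":                   # 2C
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):             # 2D
            path = ""
        else:                                 # 2E: move one segment to the output
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def resolve(base: str, ref: str) -> str:
    """Sketch of RFC 3986 section 5.2.2; the base is used exactly as given."""
    b, r = urlsplit(base), urlsplit(ref)
    if r.scheme:
        return urlunsplit((r.scheme, r.netloc, remove_dot_segments(r.path),
                           r.query, r.fragment))
    if r.netloc:
        return urlunsplit((b.scheme, r.netloc, remove_dot_segments(r.path),
                           r.query, r.fragment))
    if not r.path:
        # Empty reference path: the base path is copied without any clean-up.
        path, query = b.path, r.query or b.query
    elif r.path.startswith("/"):
        path, query = remove_dot_segments(r.path), r.query
    else:
        # 5.2.3 Merge Paths: drop everything after the last "/" of the base.
        if b.netloc and not b.path:
            merged = "/" + r.path
        else:
            merged = b.path[: b.path.rfind("/") + 1] + r.path
        path, query = remove_dot_segments(merged), r.query
    return urlunsplit((b.scheme, b.netloc, path, query, r.fragment))


# Possibility 1: @base is used as written.
print(resolve("http://a/b/c/..", "d"))    # http://a/b/c/d
print(resolve("http://a/b/c/../", "d"))   # http://a/b/d

# Possibility 2: @base is itself resolved against the document location first.
base = resolve("http://example/doc.ttl", "http://a/b/c/..")
print(base)                               # http://a/b/
print(resolve(base, "d"))                 # http://a/b/d
```

With the base taken as written, a trailing "/.." is simply discarded by the merge step, so `<d>` lands one level deeper than it does when the base is resolved first or when it ends in "/../". That is the whole disagreement in one corner case.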

Gregg

>> 3986 §5.2 includes 4 sections:
>> [[
>>        5.2.  Relative Resolution
>>              5.2.1.  Pre-parse the Base URI
>>              5.2.2.  Transform References
>>              5.2.3.  Merge Paths
>>              5.2.4.  Remove Dot Segments
>> ]]
>> 
>> I believe the answer to this is covered in algorithm 2 D:
>> [[
>> D.  if the input buffer consists only of "." or "..", then remove
>>            that from the input buffer;
>> ]]
>> <http://tools.ietf.org/html/rfc3986#section-5.2.4>
> 
> We all seem to agree that relative resolution all works cleanly with normalized (at least dot-segment applied, or ".."-less) base URIs.
> 
> It feels as if there is an implicit, unstated assumption in the relative resolution in RFC3986 that base URIs are normalized.  Ditto 3987 and the statement that "." and ".." are intended for the start of relative URIs.  But the RFCs do not say it.
> 
> In 5.2.3. Merge Paths
> 
> The second bullet simply chops off everything after the last "/", so if the base ends in "/.." then the ".." has no effect on the path (except for <>, where the effect is to leave it as "/..", unlike elsewhere).
> 
> If the base ends in "/../", the ".." does take effect, as it does if it appears elsewhere (in the base or the relative URI) or if the base is normalized.  Oops.
> 
> The merge base URI can be pointing to different places in the path hierarchy; it's not just different ways to write down the same place.
> 
> Good news - these are corner cases of when ".." is in the base URI.
> 
> We can't adjust RFC3986, so the best solution is text to say "it's a bad idea to put '..' in the @base".  That's in RFC 3987 already.  I don't think anyone has a use case where an absolute path URI with ".." is a good thing.
> 
> 
>  Andy
> 
> (And a picky point - algorithm 5.2 does not actually put the URI back together again! - 5.3 does that :-)
> 
>> 
>>> The RFCs say that ".." and "." are intended for relative URIs only.  RDF
>>> Concepts says they are "best avoided".
>>> 
>>> I think it is a bug in RFC 3986 and called out in 3987 as "situation not
>>> intended to happen".
>>> 
>>>     Andy
>>> 
>>> [*] A fix is that merging the base and relative URI needs to treat "/.." as
>>> "/../", either by minimal normalization or in the merge rule.
>>> Otherwise various inconsistencies appear.
>>> 
>>> 
>>>> 
>>>> Looking at other specs, I think the same is true for JSON-LD, RDFa and
>>>> RDF/XML.
>>>> 
>>>>>   Andy
>>>>> 
>>>>> (RDF/XML is different on relative URIs)
>>>> 
>>>> Why do you say this? Can you cite something from the spec?
>>> 
>>> """
>>> 5.3 Resolving URIs
>>> 
>>> RDF/XML supports XML Base [XML-BASE] which defines a ·base-uri· accessor
>>> for each ·root event· and ·element event·. Relative URI references are
>>> resolved into RDF URI references according to the algorithm specified in
>>> XML Base [XML-BASE] (and RFC 2396).
>>> """
>>> i.e. it says "use the algorithm".
>>>> 
>>>> Gregg
>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Ruben

Received on Tuesday, 27 October 2015 18:18:46 UTC