RE: Turtle syntax: Please align base URI with RFC 3986 & 3987

On Monday, May 27, 2013 5:36 AM, David Booth wrote:
> >> That would require a relative URI specified in @base to be resolved
> >> using "Reference Resolution", which is specified in section 5 of
> >> RFC 3986.  But the result of "Reference Resolution" is "a string
> >> matching the <URI> syntax rule of Section 3", and the <URI>
> >> production *allows* a fragment identifier.
> >
> > And why should that be a problem?
> 
> Because a base URI as defined in RFC 3986 does not permit a fragment
> identifier.  Therefore, if @base specified a relative URI which was
> resolved using RFC3986 "Reference Resolution" then the result could
> contain a fragment identifier.  Thus, a Turtle "base URI" could contain
> a fragment identifier, whereas an RFC 3986 "base URI" does not permit a
> fragment identifier.

No, that's not correct. Even if the base contains a fragment identifier,
resolving any relative IRI against it (even the empty string "") yields a
URI which does not contain the base's fragment identifier. Thus it really
doesn't matter. The base's fragment identifier is ignored in any case.
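
To illustrate, here is a minimal Python sketch (not taken from the Turtle
spec or the RFCs). The helper name resolve() and the example.org IRIs are
made up for illustration. One caveat: Python's urljoin() returns the base
unchanged for an empty reference, so the sketch strips the base's fragment
first to follow RFC 3986 section 5.2.2.

```python
from urllib.parse import urldefrag, urljoin


def resolve(base: str, reference: str) -> str:
    """Resolve `reference` against `base`, never keeping the base's fragment.

    Hypothetical helper: RFC 3986 section 5.2.2 takes the target fragment
    from the reference only, so the base's fragment is dropped up front.
    """
    base_without_fragment = urldefrag(base).url
    return urljoin(base_without_fragment, reference)


base = "http://example.org/dir/doc#frag"  # made-up base with a fragment

print(resolve(base, "other"))  # relative path reference
print(resolve(base, ""))       # empty reference
print(resolve(base, "#new"))   # fragment-only reference
```

In all three cases "#frag" from the base does not appear in the result; a
fragment only shows up when the reference itself supplies one.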


> > It is aligned with the two RFCs. There might be a case where you
> > can't resolve a relative @base as the document itself has no IRI
> > but that's the same problem as not being able to resolve relative
> > IRIs anywhere else in such a document.
> 
> If it is aligned with RFC 3986 and 3987 then the alignment certainly is
> not very visible.  I spent quite a lot of time trying to track it down,
> and finally concluded that nothing in the Turtle spec requires Turtle's
> notion of a base URI (which AFAICT is specified using @base) to be an
> absolute-IRI as defined in those RFCs.  Can you please point me to the
> exact wording that requires a Turtle base URI to be an absolute-IRI?

@base enables the establishment of the base URI; it is not necessarily the
final URI. If @base contains a relative IRI, it is resolved against the
document's URI or the application-supplied base to obtain the final base URI.
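
A rough Python sketch of how I read this, assuming each @base value is
simply resolved against whatever base is in scope at that point. The
function name apply_base_directives() and all IRIs are hypothetical; this
is not code from any Turtle parser.

```python
from urllib.parse import urljoin


def apply_base_directives(initial_base: str, directives: list[str]) -> list[str]:
    """Return the in-scope base URI after each @base directive, in order.

    `initial_base` stands for the document's URI (or an application-supplied
    base); each directive value may be absolute or relative.
    """
    in_scope = initial_base
    history = []
    for iri in directives:
        # A relative value is resolved against the previous in-scope base;
        # an absolute value simply replaces it.
        in_scope = urljoin(in_scope, iri)
        history.append(in_scope)
    return history


bases = apply_base_directives(
    "http://example.org/data/doc.ttl",              # made-up document URI
    ["vocab/", "http://other.example/x/", "../y/"],  # made-up @base values
)
for b in bases:
    print(b)
```

Every entry in the resulting list is absolute, even though two of the three
@base values were relative.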


> [...]
>
> Notice that it only references RFC3986 sections 5.1.2 and 5.1.3, which
> only talk (vaguely) about where the base URI might come from.  Those
> sections do not constrain the base URI to be an absolute-URI.  It is
> the beginning of RFC3986 section 5.1 that constrains a base URI to be
> an absolute-URI, and that portion is *not* referenced by the Turtle spec.

Yes, in the end you need an absolute URI, otherwise you can't resolve
relative ones. There are a number of "layers" where the base might come
from: @base -> document URI -> application supplied. I'm writing this mail
offline so I can't give you the exact section in the RFC, but that's
explained there as well.
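
The layering could be sketched in Python roughly like this. It assumes the
lookup order given above (@base, then document URI, then application
supplied); the function establish_base() and the IRIs are hypothetical and
only meant to show the fallback, not how any particular parser does it.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def establish_base(
    base_directive: Optional[str],
    document_uri: Optional[str],
    application_base: Optional[str],
) -> str:
    """Pick the final, absolute base URI from the available layers."""
    # Outer layers: prefer the document's own URI, else the application default.
    outer = document_uri or application_base

    if base_directive is None:
        if outer is None:
            raise ValueError("no base URI available from any layer")
        return outer

    if urlsplit(base_directive).scheme:
        # @base already has a scheme, so no outer layer is needed.
        return base_directive

    if outer is None:
        # Same problem as any other relative IRI in a document without an IRI.
        raise ValueError("cannot resolve relative @base without an outer base")
    return urljoin(outer, base_directive)


print(establish_base("vocab/", "http://example.org/data/doc.ttl", None))
print(establish_base("vocab/", None, "http://app.example/default/"))
```

The error branch corresponds to the case mentioned earlier: a relative @base
in a document that has no IRI of its own and no application-supplied base.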

 
> The last sentence of that second paragraph in Turtle section 6.3 does
> say "Each @base directive sets a new In-Scope Base URI, relative to the
> previous one", and I guess that sentence is the justification for why
> you and Andy are saying that @base can specify a relative URI.  But

Yes


> knowing that RFC3986 requires a base URI to be an absolute-URI, I had
> understood that sentence to mean "Each @base directive sets a new
> In-Scope Base URI, [in relation to] to the previous one", i.e., it is
> new in relation to the previous one.  I had no idea it was suggesting
> that @base could specify a relative URI.
> 
> Bottom line:
> 
>   - This stuff is not at all clear in the current wording.

I find that quite clear and in line with what, e.g., HTML does. Can you
suggest some concrete wording which would make it clearer?


>   - If @base is permitted to specify a relative IRI then: (a) an
> explanation should be added to explain how that relative IRI is
> converted into an absolute-IRI (including what happens to any fragment
> identifier that the relative IRI contains); and (b) Turtle will not be
> aligned with SPARQL in this regard.

The RFCs explain how a relative IRI can be resolved against a base to an
absolute IRI. @base does nothing special here. Isn't referencing the RFCs
enough?


>   - If @base is NOT permitted to specify a relative IRI then the Turtle
> spec should make clear that @base must specify an absolute-IRI, in
> alignment with SPARQL.

That's not the case; @base is permitted to specify a relative IRI.


> I was not aware that HTML allowed base URIs to be relative, but, it
> seems more important to align Turtle with SPARQL than with HTML.  Plus
> it would also be simpler.

What's the advantage of such a restriction? If someone wants to use absolute
URIs that's fine. Allowing relative ones doesn't add any complexity because
the URI resolution algorithm has to be implemented in any case.



--
Markus Lanthaler
@markuslanthaler

Received on Tuesday, 28 May 2013 22:58:01 UTC