Re: Turtle syntax: Please align base URI with RFC 3986 & 3987

Hi Markus,

On 05/26/2013 06:37 PM, Markus Lanthaler wrote:
> On Sunday, May 26, 2013 7:17 PM, David Booth wrote:
>>> The syntax has
>>>
>>> @base IRIREF .
>>>
>>> and the @base is no different to other URIs - it is subject to URI
>>> resolution.
>>
>> But I don't see anything there that explicitly requires IRIREF to be an
>> absolute-IRI as defined in RFC3987.  Other parts of the Turtle syntax
>> (such as the @prefix production) also use the IRIREF syntax production
>> without requiring it to be an absolute-IRI.  That's why it isn't clear
>> that in the case of @base it must be an absolute-IRI.
>
> It can be a relative IRI as well. In that case it gets resolved against the
> currently active base IRI.
>
>
>>> @base <relURI> .
>>>
>>> is also legal as is
>>>
>>> @base <../sibling> .
>>>
>>> which might be occasionally useful.
>>
>> Huh?  Are you saying that @base can recursively specify the base URI
>> using a *relative* URI?  Then there would have to be a base URI of the
>> @base URI?
>
> Yes, not recursively though but sequentially.
>
>
>> I'm very surprised to hear you say that a relative @base URI would be
>> legal.  I don't think that should be allowed.  That seems too
>> mysterious and error prone to me.
>
> HTML allows that as well e.g.
>
>
>> That would require a relative URI specified in
>> @base to be resolved using "Reference Resolution", which is specified
>> in
>> section 5 of RFC 3986.  But the result of "Reference Resolution" is "a
>> string matching the <URI> syntax rule of Section 3", and the <URI>
>> production *allows* a fragment identifier.
>
> And why should that be a problem?

Because a base URI as defined in RFC 3986 does not permit a fragment
identifier.  If @base specified a relative URI that was resolved using
RFC 3986 "Reference Resolution", the result could contain a fragment
identifier.  Thus, a Turtle "base URI" could contain a fragment
identifier, whereas an RFC 3986 "base URI" does not permit one.
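
To make the fragment issue concrete, here is a minimal Python sketch.
It uses the standard library's urllib.parse.urljoin as a stand-in for
RFC 3986 "Reference Resolution"; the URIs are hypothetical and chosen
only for illustration.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical values, for illustration only.
in_scope_base = "http://example.org/dir/doc.ttl"
relative_base = "../sibling#frag"  # what a relative @base might contain

# Resolving the reference keeps the fragment of the reference.
resolved = urljoin(in_scope_base, relative_base)

# RFC 3986 section 5.1 says a base URI is stripped of any fragment
# before use, so the resolved string is not yet usable as a base URI.
usable_base, fragment = urldefrag(resolved)

print(resolved)
print(usable_base, fragment)
```

The resolved string still carries "#frag", so something would have to
say whether that fragment is dropped or rejected.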

>
>
>> I think it would be better to align directly with SPARQL and RFC 3986
>> and RFC 3987 by explicitly requiring @base to specify an absolute-IRI.
>
> It is aligned with the two RFCs. There might be a case where you can't
> resolve a relative @base as the document itself has no IRI but that's the
> same problem as not being able to resolve relative IRIs anywhere else in
> such a document.

If it is aligned with RFC 3986 and 3987 then the alignment certainly is 
not very visible.  I spent quite a lot of time trying to track it down, 
and finally concluded that nothing in the Turtle spec requires Turtle's 
notion of a base URI (which AFAICT is specified using @base) to be an 
absolute-IRI as defined in those RFCs.  Can you please point me to the 
exact wording that requires a Turtle base URI to be an absolute-IRI?

The Turtle EBNF certainly does not require it.

Turtle section 6.3 has two paragraphs.  The first says:
http://www.w3.org/TR/turtle/#sec-iri-references
[[
Relative IRIs are resolved with base IRIs as per Uniform Resource 
Identifier (URI): Generic Syntax [RFC3986] using only the basic 
algorithm in section 5.2. Neither Syntax-Based Normalization nor 
Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of 
RFC3986) are performed. Characters additionally allowed in IRI 
references are treated in the same way that unreserved characters are 
treated in URI references, per section 6.5 of Internationalized Resource 
Identifiers (IRIs) [RFC3987].
]]
That paragraph only talks about resolving relative URIs.  It does not 
specify the base URI.

The first sentence of the second paragraph says:
[[
The @base directive defines the Base IRI used to resolve relative IRIs 
per RFC3986 section 5.1.1, "Base URI Embedded in Content".
]]
and RFC3986 section 5.1.1 says: "Within certain media types, a base URI 
for relative references can be embedded within the content itself". 
Since the Turtle directive is called "@base" (or "BASE") and the Turtle 
spec often uses the term "base URI", this would strongly suggest that 
the @base directive is used to specify a base URI that is "embedded 
within the content itself".  But if you and Andy are telling me that 
@base may provide a relative URI, then the actual base URI is *not*
"embedded within the content itself".  Rather, it is
(recursively) determined by resolving that relative URI against some 
other base URI.

The rest of the second paragraph in Turtle section 6.3 says:
[[
Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the 
In-Scope Base IRI may come from an encapsulating document, such as a 
SOAP envelope with an xml:base directive or a mime multipart document 
with a Content-Location header. The "Retrieval URI" identified in 5.1.3, 
Base "URI from the Retrieval URI", is the URL from which a particular 
Turtle document was retrieved. If none of the above specifies the Base 
URI, the default Base URI (section 5.1.4, "Default Base URI") is used. 
Each @base directive sets a new In-Scope Base URI, relative to the 
previous one.
]]
Notice that it only references RFC3986 sections 5.1.2, 5.1.3 and 5.1.4,
which only talk (vaguely) about where the base URI might come from.  Those
sections do not constrain the base URI to be an absolute-URI.  It is the 
beginning of RFC3986 section 5.1 that constrains a base URI to be an 
absolute-URI, and that portion is *not* referenced by the Turtle spec.

The last sentence of that second paragraph in Turtle section 6.3 does 
say "Each @base directive sets a new In-Scope Base URI, relative to the 
previous one", and I guess that sentence is the justification for why 
you and Andy are saying that @base can specify a relative URI.  But 
knowing that RFC3986 requires a base URI to be an absolute-URI, I had 
understood that sentence to mean "Each @base directive sets a new 
In-Scope Base URI, [in relation to] the previous one", i.e., it is
new in relation to the previous one.  I had no idea it was suggesting 
that @base could specify a relative URI.
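
For comparison, the other reading ("each @base is resolved against the
previous in-scope base") can be sketched in Python as follows.  The
function name apply_bases and the example IRIs are hypothetical, and
dropping any fragment after each step is an assumption taken from RFC
3986 section 5.1, not something the Turtle spec states.

```python
from urllib.parse import urljoin, urldefrag

def apply_bases(retrieval_uri, base_directives):
    """Return the in-scope base after each @base directive, in order."""
    # Assumption: fragments are dropped, following RFC 3986 section 5.1.
    in_scope = urldefrag(retrieval_uri).url
    history = []
    for iri_ref in base_directives:
        # Each @base is resolved against the previous in-scope base.
        in_scope = urldefrag(urljoin(in_scope, iri_ref)).url
        history.append(in_scope)
    return history

history = apply_bases(
    "http://example.org/a/b/doc.ttl",
    ["../sibling", "http://other.example/x/", "y/z"],
)
for base in history:
    print(base)
```

Under this reading the directives are applied one after another, each
building on the result of the previous one rather than recursively.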

Bottom line:

  - This stuff is not at all clear in the current wording.

  - If @base is permitted to specify a relative IRI then: (a) an 
explanation should be added to explain how that relative IRI is 
converted into an absolute-IRI (including what happens to any fragment 
identifier that the relative IRI contains); and (b) Turtle will not be 
aligned with SPARQL in this regard.

  - If @base is NOT permitted to specify a relative IRI then the Turtle 
spec should make clear that @base must specify an absolute-IRI, in 
alignment with SPARQL.
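
If the second option were chosen, a parser could reject a relative
@base with a check along these lines.  This is only a coarse Python
sketch: the helper name looks_like_absolute_iri is hypothetical, and it
tests just two properties of the RFC 3987 absolute-IRI production (a
leading scheme and no fragment) without validating the rest of the IRI.

```python
import re

# A scheme is a letter followed by letters, digits, "+", "-" or ".",
# terminated by ":" (RFC 3986 section 3.1).
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")

def looks_like_absolute_iri(iri_ref):
    """Coarse check: starts with a scheme and has no fragment part."""
    return bool(_SCHEME.match(iri_ref)) and "#" not in iri_ref

for candidate in ["http://example.org/base/", "../sibling",
                  "relURI", "http://example.org/doc#frag"]:
    print(candidate, looks_like_absolute_iri(candidate))
```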

I was not aware that HTML allowed base URIs to be relative, but it
seems more important to align Turtle with SPARQL than with HTML.  It
would also be simpler.

David

Received on Monday, 27 May 2013 03:36:50 UTC