Re: Turtle syntax: Please align base URI with RFC 3986 & 3987

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 27 May 2013 11:53:10 +0100
Message-ID: <51A33B16.2090107@epimorphics.com>
To: public-rdf-comments@w3.org

You seem to have a different processing model from the one I understand it to be.
You seem to believe the base is exactly the characters used for
IRIREF; I understand it as: URI resolution applies first, then the output is
passed to whatever is doing base URI processing, to be used as the base.

For context: XML


and the example of a relative URI "/hotpicks/" for xml:base on an
element.  Turtle (and SPARQL) are just doing what everything else does here.

What triples do you expect from these Turtle documents, and what sequence
of processing steps would you expect a processor to take? In each case
the document is obtained by

GET http://example/location/file.ttl

Document1::
<s> <p> <#o> .

Document2::
@base        <http://example/base2> .
<s> <p> <#o> .

Document3::
<s> <p> <#o> .
@base        <http://example/base2> .
<s> <p> <#o> .

Document4::
@base        <base2/> .
<s> <p> <#o> .

Document5:: corner case:
@base          <base2/> .
@prefix  ns1:  <ns#> .
ns1:s <p> <#o> .
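One way to pin down answers to these questions is a small executable model. The following Python sketch is hypothetical and not taken from any Turtle parser: it assumes the documents have already been tokenized into tuples (no real Turtle parsing), takes the retrieval URI http://example/location/file.ttl as the initial base (RFC 3986 section 5.1.3), and transcribes the strict form of RFC 3986 sections 5.2.2 to 5.2.4 and 5.3. It follows the processing model described in this message: each @base IRIREF is resolved against the in-scope base, and the fragment is stripped before the result becomes the new base. All names (`resolve`, `process`, `DOC1` to `DOC5`, the tuple format) are illustrative assumptions.

```python
import re

# RFC 3986 Appendix B regex: scheme, authority, path, query, fragment.
# An absent component is None; a present but empty one is "".
URI_RE = re.compile(r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?")


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./") or inp == "/.":
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()  # drop the last output segment together with its "/"
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = inp.find("/", 1)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def recompose(scheme, authority, path, query, fragment):
    """RFC 3986 section 5.3."""
    uri = "" if scheme is None else scheme + ":"
    if authority is not None:
        uri += "//" + authority
    uri += path
    if query is not None:
        uri += "?" + query
    if fragment is not None:
        uri += "#" + fragment
    return uri


def resolve(base, ref):
    """Strict RFC 3986 section 5.2.2. The base's fragment is never used."""
    b_scheme, b_auth, b_path, b_query, _ = URI_RE.match(base).groups()
    r_scheme, r_auth, r_path, r_query, r_frag = URI_RE.match(ref).groups()
    if r_scheme is not None:
        return recompose(r_scheme, r_auth, remove_dot_segments(r_path), r_query, r_frag)
    if r_auth is not None:
        return recompose(b_scheme, r_auth, remove_dot_segments(r_path), r_query, r_frag)
    if r_path == "":
        query = r_query if r_query is not None else b_query
        return recompose(b_scheme, b_auth, b_path, query, r_frag)
    if not r_path.startswith("/"):
        r_path = merge(b_auth, b_path, r_path)
    return recompose(b_scheme, b_auth, remove_dot_segments(r_path), r_query, r_frag)


def process(statements, retrieval_uri="http://example/location/file.ttl"):
    """Triples for a pre-tokenized document (hypothetical tuple format)."""
    base, prefixes, triples = retrieval_uri, {}, []  # initial base: retrieval URI

    def term(t):
        if t.startswith("<"):
            return resolve(base, t[1:-1])  # IRIREF: resolve against the in-scope base
        prefix, local = t.split(":", 1)
        return prefixes[prefix] + local    # prefixed name: namespace IRI + local part

    for stmt in statements:
        if stmt[0] == "@base":
            # Resolve against the current base, then strip any fragment (RFC 3986 5.1).
            base = resolve(base, stmt[1][1:-1]).split("#", 1)[0]
        elif stmt[0] == "@prefix":
            prefixes[stmt[1].rstrip(":")] = resolve(base, stmt[2][1:-1])
        else:
            triples.append(tuple(term(t) for t in stmt))
    return triples


TRIPLE = ("<s>", "<p>", "<#o>")
BASE2 = ("@base", "<http://example/base2>")
DOC1 = [TRIPLE]
DOC2 = [BASE2, TRIPLE]
DOC3 = [TRIPLE, BASE2, TRIPLE]
DOC4 = [("@base", "<base2/>"), TRIPLE]
DOC5 = [("@base", "<base2/>"), ("@prefix", "ns1:", "<ns#>"), ("ns1:s", "<p>", "<#o>")]
```

Under these assumptions, Document5 yields the single triple (http://example/location/base2/ns#s, http://example/location/base2/p, http://example/location/base2/#o). Stripping the fragment in the @base branch does not change any output here, because the section 5.2.2 algorithm always takes the target's fragment from the reference and never from the base.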

After resolution of the @base IRIREF, and before it is used as the base,
the URI is absolute: all URIs in RDF are absolute. This absolute URI,
possibly with a fragment, is then given to whatever machinery is doing
further URI resolution. That code is responsible for determining the
right base URI given the inputs.

Hence, I see RFC 3986 section 5.1 as covering this case:
"If the base URI is obtained from a URI reference, ..."


On 27/05/13 04:36, David Booth wrote:
 > Hi Markus,
 > On 05/26/2013 06:37 PM, Markus Lanthaler wrote:
 >> On Sunday, May 26, 2013 7:17 PM, David Booth wrote:
 >>>> The syntax has
 >>>> @base IRIREF .
 >>>> and the @base is no different to other URIs - it is subject to URI
 >>>> resolution.
 >>> But I don't see anything there that explicitly requires IRIREF to be an
 >>> absolute-IRI as defined in RFC3987.  Other parts of the Turtle syntax
 >>> (such as the @prefix production) also use the IRIREF syntax production
 >>> without requiring it to be an absolute-IRI.  That's why it isn't clear
 >>> that in the case of @base it must be an absolute-IRI.
 >> It can be a relative IRI as well. In that case it gets resolved
 >> against the currently active base IRI.
 >>>> @base <relURI> .
 >>>> is also legal as is
 >>>> @base <../sibling> .
 >>>> which might be occasionally useful.
 >>> Huh?  Are you saying that @base can recursively specify the base URI
 >>> using a *relative* URI?  Then there would have to be a base URI of the
 >>> @base URI?
 >> Yes, though not recursively but sequentially.
 >>> I'm very surprised to hear you say that a relative @base URI would be
 >>> legal.  I don't think that should be allowed.  That seems too
 >>> mysterious and error prone to me.
 >> HTML allows that as well e.g.
 >>> That would require a relative URI specified in
 >>> @base to be resolved using "Reference Resolution", which is specified
 >>> in section 5 of RFC 3986.  But the result of "Reference Resolution" is "a
 >>> string matching the <URI> syntax rule of Section 3", and the <URI>
 >>> production *allows* a fragment identifier.
 >> And why should that be a problem?
 > Because a base URI as defined in RFC 3986 does not permit a fragment
 > identifier.  Therefore, if @base specified a relative URI which was
 > resolved using RFC3986 "Reference Resolution" then the result could
 > contain a fragment identifier.  Thus, a Turtle "base URI" could contain
 > a fragment identifier, whereas an RFC 3986 "base URI" does not permit a
 > fragment identifier.
 >>> I think it would be better to align directly with SPARQL and RFC 3986
 >>> and RFC 3987 by explicitly requiring @base to specify an absolute-IRI.
 >> It is aligned with the two RFCs. There might be a case where you can't
 >> resolve a relative @base as the document itself has no IRI, but that's the
 >> same problem as not being able to resolve relative IRIs anywhere else in
 >> such a document.
 > If it is aligned with RFC 3986 and 3987 then the alignment certainly is
 > not very visible.  I spent quite a lot of time trying to track it down,
 > and finally concluded that nothing in the Turtle spec requires Turtle's
 > notion of a base URI (which AFAICT is specified using @base) to be an
 > absolute-IRI as defined in those RFCs.  Can you please point me to the
 > exact wording that requires a Turtle base URI to be an absolute-IRI?
 > The Turtle EBNF certainly does not require it.
 > Turtle section 6.3 has two paragraphs.  The first says:
 > http://www.w3.org/TR/turtle/#sec-iri-references
 > [[
 > Relative IRIs are resolved with base IRIs as per Uniform Resource
 > Identifier (URI): Generic Syntax [RFC3986] using only the basic
 > algorithm in section 5.2. Neither Syntax-Based Normalization nor
 > Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of
 > RFC3986) are performed. Characters additionally allowed in IRI
 > references are treated in the same way that unreserved characters are
 > treated in URI references, per section 6.5 of Internationalized Resource
 > Identifiers (IRIs) [RFC3987].
 > ]]
 > That paragraph only talks about resolving relative URIs.  It does not
 > specify the base URI.
 > The first sentence of the second paragraph says:
 > [[
 > The @base directive defines the Base IRI used to resolve relative IRIs
 > per RFC3986 section 5.1.1, "Base URI Embedded in Content".
 > ]]
 > and RFC3986 section 5.1.1 says: "Within certain media types, a base URI
 > for relative references can be embedded within the content itself".
 > Since the Turtle directive is called "@base" (or "BASE") and the Turtle
 > spec often uses the term "base URI", this would strongly suggest that
 > the @base directive is used to specify a base URI that is "embedded
 > within the content itself".  But if you and Andy are telling me that
 > @base may provide a relative URI, then the actual base URI is *not*
 > actually "embedded within the content itself".  Rather, it is
 > (recursively) determined by resolving that relative URI against some
 > other base URI.
 > The rest of the second paragraph in Turtle section 6.3 says:
 > [[
 > Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the
 > In-Scope Base IRI may come from an encapsulating document, such as a
 > SOAP envelope with an xml:base directive or a mime multipart document
 > with a Content-Location header. The "Retrieval URI" identified in 5.1.3,
 > Base "URI from the Retrieval URI", is the URL from which a particular
 > Turtle document was retrieved. If none of the above specifies the Base
 > URI, the default Base URI (section 5.1.4, "Default Base URI") is used.
 > Each @base directive sets a new In-Scope Base URI, relative to the
 > previous one.
 > ]]
 > Notice that it only references RFC3986 sections 5.1.2 and 5.1.3, which
 > only talk (vaguely) about where the base URI might come from.  Those
 > sections do not constrain the base URI to be an absolute-URI.  It is the
 > beginning of RFC3986 section 5.1 that constrains a base URI to be an
 > absolute-URI, and that portion is *not* referenced by the Turtle spec.
 > The last sentence of that second paragraph in Turtle section 6.3 does
 > say "Each @base directive sets a new In-Scope Base URI, relative to the
 > previous one", and I guess that sentence is the justification for why
 > you and Andy are saying that @base can specify a relative URI.  But
 > knowing that RFC3986 requires a base URI to be an absolute-URI, I had
 > understood that sentence to mean "Each @base directive sets a new
 > In-Scope Base URI, [in relation to] the previous one", i.e., it is
 > new in relation to the previous one.  I had no idea it was suggesting
 > that @base could specify a relative URI.
 > Bottom line:
 >   - This stuff is not at all clear in the current wording.
 >   - If @base is permitted to specify a relative IRI then: (a) an
 > explanation should be added to explain how that relative IRI is
 > converted into an absolute-IRI (including what happens to any fragment
 > identifier that the relative IRI contains); and (b) Turtle will not be
 > aligned with SPARQL in this regard.
 >   - If @base is NOT permitted to specify a relative IRI then the Turtle
 > spec should make clear that @base must specify an absolute-IRI, in
 > alignment with SPARQL.
 > I was not aware that HTML allowed base URIs to be relative, but it
 > seems more important to align Turtle with SPARQL than with HTML.  Plus
 > it would also be simpler.
 > David
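David's proposal would turn the question into a syntactic check on the @base IRIREF. The following rough Python sketch uses a hypothetical function name and tests only the two features that separate an RFC 3987 absolute-IRI (scheme ":" ihier-part [ "?" iquery ]) from other IRI references: a scheme must be present and a fragment must not. It does not validate the rest of the grammar. Applied to the @base values in this thread, it shows that such a rule would reject <base2/>, <relURI> and <../sibling>, which Andy and Markus describe as legal under the current wording.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def looks_like_absolute_iri(iriref):
    """Rough test for the RFC 3987 absolute-IRI shape.

    A scheme is required and a fragment is not allowed. Only those two
    conditions are checked; ihier-part and iquery are not validated.
    """
    return SCHEME_RE.match(iriref) is not None and "#" not in iriref


# @base values that appear in this thread.
for value in ["http://example/base2", "base2/", "relURI", "../sibling"]:
    verdict = "accepted" if looks_like_absolute_iri(value) else "rejected"
    print(f"@base <{value}> would be {verdict} under an absolute-IRI rule")
```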
