Re: Turtle syntax: Please align base URI with RFC 3986 & 3987

From: David Booth <david@dbooth.org>
Date: Thu, 30 May 2013 21:24:08 -0400
Message-ID: <51A7FBB8.4030401@dbooth.org>
To: Peter Occil <poccil14@gmail.com>
CC: public-rdf-comments@w3.org, Andy Seaborne <andy.seaborne@epimorphics.com>
Hi Peter,

I don't think that's correct, because RFC3986 says that an absolute-URI 
cannot have a fragment component:
http://tools.ietf.org/html/rfc3986#section-4.3
and section 5.1 explicitly says: "A base URI must conform to the 
<absolute-URI> syntax rule (Section 4.3).".
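
A minimal Python sketch of the two RFC 3986 rules quoted above, assuming the generic parsing regex from RFC 3986 Appendix B; `is_absolute_uri` and `to_base_uri` are hypothetical helper names. It only checks the shape of <absolute-URI> (a scheme is present and no fragment is present), not every character rule in the grammar, and it strips a fragment the way section 5.1 requires before a URI is used as a base.

```python
import re

# Generic URI-reference parser from RFC 3986, Appendix B.
# Groups: 2=scheme, 4=authority, 5=path, 7=query, 9=fragment (None if absent).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def is_absolute_uri(uri: str) -> bool:
    """Rough check for RFC 3986 <absolute-URI>: scheme required, no fragment."""
    m = URI_RE.match(uri)
    return m.group(2) is not None and m.group(9) is None


def to_base_uri(uri: str) -> str:
    """Strip any fragment so the result can serve as a base URI (RFC 3986 sec. 5.1).

    Assumes `uri` is already in absolute form; resolving a relative
    reference first is outside this sketch, so it is rejected instead.
    """
    if URI_RE.match(uri).group(2) is None:
        raise ValueError(f"not in absolute form: {uri!r}")
    return uri.split("#", 1)[0]


print(is_absolute_uri("http://example/base2"))       # True
print(is_absolute_uri("http://example/base2#frag"))  # False: has a fragment
print(is_absolute_uri("base2/"))                     # False: no scheme
print(to_base_uri("http://example/base2#frag"))      # http://example/base2
```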

David

On 05/29/2013 01:58 PM, Peter Occil wrote:
> Your suggested text should be corrected as follows:
>
> [[
> (b) if the previous base IRI contained a fragment component,
> the fragment component will be replaced with the fragment
> component that the given IRIREF has, or stripped if the given
> IRIREF has none;
> ]]
>
> That's because if the given IRIREF contains a fragment
> component, that component will be inherited in the new
> absolute URI.  Only the fragment component of the previous
> base URI will be stripped.
>
> As I understand it, the given IRIREF represents R in section
> 5.2.2 of RFC3986, and the previous base URI represents Base
> in that section; correct me if I'm wrong. At the end of that
> section you can see that the new absolute URI's fragment
> changes to R's fragment.
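
Peter's reading of section 5.2.2 can be checked with Python's standard-library `urllib.parse.urljoin`, used here only as a stand-in for the RFC 3986 resolution algorithm (it tracks the RFC closely but is not a reference implementation). The previous base below is hypothetical and deliberately carries a fragment; the IRIREF plays the role of R. In each result the fragment comes from R alone.

```python
from urllib.parse import urljoin

# Hypothetical previous Base that carries a fragment; the IRIREF plays R.
previous_base = "http://example/base#frag"

with_r_fragment = urljoin(previous_base, "other#x")
without_r_fragment = urljoin(previous_base, "other")
# One of the "normal examples" in RFC 3986 section 5.4.1.
rfc_example = urljoin("http://a/b/c/d;p?q", "#s")

print(with_r_fragment)     # http://example/other#x  (fragment taken from R)
print(without_r_fragment)  # http://example/other    (Base's fragment is not inherited)
print(rfc_example)         # http://a/b/c/d;p?q#s
```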
>
> --Peter
>
> -----Original Message-----
> From: David Booth
> Sent: Monday, May 27, 2013 11:51 AM
> To: Andy Seaborne
> Cc: public-rdf-comments@w3.org
> Subject: Re: Turtle syntax: Please align base URI with RFC 3986 & 3987
>
> Hi Andy,
>
> On 05/27/2013 06:53 AM, Andy Seaborne wrote:
>> David,
>>
>> You seem to have a different processing model from the one I think applies.
>
> Correct.   I don't really care which processing model is used.  I am
> just concerned about alignment and clarity.  Apparently I read the spec
> differently than you intended it.
>
>>   You seem to believe the base is exactly the characters used for
>> IRIREF; I understand it as: URI resolution applies, then the output is
>> passed to whatever does base URI processing, to be used as the base.
>
> Correct.
>
>>
>> For context: XML
>>
>> http://www.w3.org/TR/xmlbase/#syntax
>>
>> and the example of a relative URI "/hotpicks/" used as xml:base for an
>> element.
>
> Yes, XML seems to follow HTML in this regard.
>
>> Turtle (and SPARQL) are just doing what everything else does here.
>
> But that isn't what the SPARQL spec says.  If SPARQL was intended to
> have an additional "Reference Resolution" step to transform the IRIREF
> string given in a BASE declaration into an absolute-IRI, that step is
> not written in the spec AFAICT.   It explicitly says:
> http://www.w3.org/TR/sparql11-query/#iriRefs
>
>    "Base IRIs declared with the BASE keyword must be absolute IRIs".
>
> And as I pointed out to Markus, the SPARQL spec strongly suggests a
> processing model in which the IRIREF is taken directly as the base URI,
> as SPARQL section 4.1.1.4 says:
> http://www.w3.org/TR/sparql11-query/#relIRIs
>
>    'The BASE keyword defines the Base IRI used to resolve
>    relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded
>    in Content".'
>
> and RFC3986 section 5.1.1 says:
>
>    "Within certain media types, a base URI for relative references can be
>     embedded within the content itself so that it can be readily obtained
>     by a parser."
>
> But the base URI would only be "embedded within the content itself" if
> the IRIREF were taken *directly* as the base URI.
>
>>
>> What triples would you expect from these Turtle documents, and what
>> sequence of processing steps would you expect a processor to take? In
>> each case they are obtained by
>>
>> GET http://example/location/file.ttl
>>
>>
>> Document1::
>> ----
>> <s> <p> <#o> .
>> ----
>>
>> Document2::
>> ----
>> @base        <http://example/base2> .
>> <s> <p> <#o> .
>> ----
>>
>> Document3::
>> ----
>> <s> <p> <#o> .
>> @base        <http://example/base2> .
>> <s> <p> <#o> .
>> ----
>>
>> Document4::
>> ----
>> @base        <base2/> .
>> <s> <p> <#o> .
>> ----
>>
>> Document5:: corner case:
>> ----
>> @base          <base2/> .
>> @prefix  ns1:  <ns#> .
>> ns1:s <p> <#o> .
>> ----
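
One way to answer Andy's question under his processing model (resolve the @base IRIREF against the base in effect, then strip any fragment before using the result as the new base) is the Python sketch below. It is a toy, not a Turtle parser: it assumes whitespace-separated tokens, statements ending in " .", only @base, @prefix, <IRIREF> and prefix:local terms, and the GET URL as the initial base. The resolver is written from RFC 3986 sections 5.2 and 5.3 rather than taken from a library, so that an empty fragment such as `<ns#>` survives; `resolve` and `toy_turtle` are hypothetical names.

```python
import re

# RFC 3986 Appendix B; groups 2, 4, 5, 7, 9 = scheme, authority, path, query,
# fragment, with None meaning "undefined" (so "ns#" keeps its empty fragment).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4."""
    inp, out = path, []
    while inp:
        if inp.startswith(("../", "./")):
            inp = inp[inp.index("/") + 1:]
        elif inp.startswith("/./") or inp == "/.":
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            i = inp.find("/", 1)
            seg = inp if i == -1 else inp[:i]
            out.append(seg)
            inp = inp[len(seg):]
    return "".join(out)


def resolve(base, ref):
    """RFC 3986 sections 5.2.2, 5.2.3 and 5.3, for an absolute `base`."""
    bs, ba, bp, bq = URI_RE.match(base).group(2, 4, 5, 7)
    s, a, p, q, f = URI_RE.match(ref).group(2, 4, 5, 7, 9)
    if s is None and a is None:
        s, a = bs, ba
        if p == "":
            p, q = bp, (q if q is not None else bq)
        else:
            if not p.startswith("/"):
                # 5.2.3 merge: the base path minus its last segment, plus R's path
                p = ("/" if ba is not None and bp == "" else bp[:bp.rfind("/") + 1]) + p
            p = remove_dot_segments(p)
    else:
        s = s if s is not None else bs
        p = remove_dot_segments(p)
    # The fragment always comes from R (T.fragment = R.fragment).
    return (s + ":" + ("//" + a if a is not None else "") + p
            + ("?" + q if q is not None else "") + ("#" + f if f is not None else ""))


def toy_turtle(doc, retrieval_uri):
    """Toy reader for the five tiny documents above only; NOT a Turtle parser."""
    base = retrieval_uri.split("#", 1)[0]  # RFC 3986 5.1.3: the retrieval URI
    prefixes, triples, stmt = {}, [], []
    for tok in doc.split():
        if tok != ".":
            stmt.append(tok)
            continue
        if stmt[0] == "@base":
            # Andy's model: resolve against the base in effect, then drop the fragment
            base = resolve(base, stmt[1][1:-1]).split("#", 1)[0]
        elif stmt[0] == "@prefix":
            prefixes[stmt[1]] = resolve(base, stmt[2][1:-1])
        else:
            triples.append(tuple(
                resolve(base, t[1:-1]) if t.startswith("<")
                else prefixes[t.split(":", 1)[0] + ":"] + t.split(":", 1)[1]
                for t in stmt))
        stmt = []
    return triples


URL = "http://example/location/file.ttl"  # from "GET http://example/location/file.ttl"
DOC5 = """
@base          <base2/> .
@prefix  ns1:  <ns#> .
ns1:s <p> <#o> .
"""
for triple in toy_turtle(DOC5, URL):
    print(triple)
```

Under this model Document5 yields the single triple (http://example/location/base2/ns#s, http://example/location/base2/p, http://example/location/base2/#o). A processor that instead took the IRIREF directly as the base, as in David's reading, would presumably have to reject <base2/> because it is not an absolute IRI.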
>>
>> After resolution, before being used as the base, it is absolute: all
>> URIs in RDF are absolute.
>
> Yes, but this is not an RDF question.  It is a Turtle syntax question.
> Base URIs don't exist in RDF.
>
>> This absolute URI, possibly with a fragment, is then
>> given to whatever machinery is doing further URI resolution.  That
>> code is responsible for determining the right base URI given the inputs.
>>
>> Hence, I see that
>> "If the base URI is obtained from a URI reference,  ..."
>> applies.
>
> But that quote comes from the beginning of RFC3986 section 5.1, which is
> not referenced from either the Turtle or SPARQL specs.  Turtle and
> SPARQL only reference later subsections of 5.1.
>
> If SPARQL was intended to have the processing model that you suggest --
> and that would make sense, given the HTML and XML precedents -- then
> Turtle should also use that processing model, as you suggest.  In which
> case an erratum needs to be issued for SPARQL explaining the omission of
> the additional "Reference Resolution" step, and the Turtle spec should
> add some verbiage explicitly explaining this step.
>
> I would suggest adding something along the following lines to the second
> paragraph of Turtle section 6.3:
> http://www.w3.org/TR/turtle/#sec-iri-references
> [[
> The @base directive indirectly specifies a new Base IRI that overrides
> the previous Base IRI that was in effect at that point in the Turtle
> document.  The new Base IRI is determined by resolving the IRIREF given
> in the @base directive against the previous Base IRI that was in effect
> at that point in the Turtle document, using "Reference Resolution" as
> defined in RFC3986 section 5.  This means that: (a) if the IRIREF
> specified in the @base directive was a relative IRI, it will be
> converted to an absolute-IRI using the process described in RFC3986
> section 5; (b) if the given IRIREF contained a fragment component, the
> fragment component will be stripped in that process; and (c) @base
> directives can be chained, such that the Base IRI specified by one @base
> directive is used in determining the Base IRI specified in a @base
> directive that appears later in the Turtle document.
>
> Similarly, the @prefix directive indirectly associates a prefix label
> (specified in the PNAME_NS portion of the @prefix directive) with an IRI
> that is derived from the IRIREF specified in the @prefix directive, by
> resolving that IRIREF, as specified in RFC3986 section 5, against the
> Base IRI currently in effect at that point in the Turtle document.
> ]]
>
> And add comments to both the @prefix and @base syntax productions:
> [[
> [4]  prefixID     ::= '@prefix' PNAME_NS IRIREF '.'   /* See sec 6.3 */
> [5]  base         ::= '@base' IRIREF '.'              /* See sec 6.3 */
> [5s] sparqlBase   ::= "BASE" IRIREF                   /* See sec 6.3 */
> [6s] sparqlPrefix ::= "PREFIX" PNAME_NS IRIREF        /* See sec 6.3 */
> ]]
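
A rough Python sketch of what the four annotated productions accept, using simplified regular expressions for IRIREF and PNAME_NS (assumptions: no \u escapes in IRIREF, ASCII-only prefix names). It only recognizes a directive and captures its parts; the resolution that the "See sec 6.3" comments point to is not performed. It follows the Turtle grammar notes in treating '@base' and '@prefix' as case-sensitive and terminated by '.', and "BASE" and "PREFIX" as case-insensitive with no trailing '.'. `match_directive` is a hypothetical name.

```python
import re

# Simplified terminals (assumptions): IRIREF without \u escapes, and an
# ASCII-only PN_PREFIX. The real Turtle productions are broader.
IRIREF = r'<([^<>"{}|^`\\\x00-\x20]*)>'
PNAME_NS = r"([A-Za-z][A-Za-z0-9_-]*)?:"

DIRECTIVES = [
    # '@prefix' and '@base' are case-sensitive and end with '.'
    ("prefixID", re.compile(r"@prefix\s+" + PNAME_NS + r"\s*" + IRIREF + r"\s*\.")),
    ("base", re.compile(r"@base\s+" + IRIREF + r"\s*\.")),
    # "BASE" and "PREFIX" are case-insensitive and take no trailing '.'
    ("sparqlBase", re.compile(r"BASE\s+" + IRIREF, re.IGNORECASE)),
    ("sparqlPrefix", re.compile(r"PREFIX\s+" + PNAME_NS + r"\s*" + IRIREF, re.IGNORECASE)),
]


def match_directive(text):
    """Return (production name, captured groups), or None if nothing matches.

    Only recognizes the directive; resolving the IRIREF is not done here.
    """
    for name, rx in DIRECTIVES:
        m = rx.fullmatch(text.strip())
        if m:
            return name, m.groups()
    return None


print(match_directive("@base <base2/> ."))          # ('base', ('base2/',))
print(match_directive("PREFIX ns1: <ns#>"))         # ('sparqlPrefix', ('ns1', 'ns#'))
print(match_directive("BASE <http://example/> ."))  # None: sparqlBase takes no '.'
```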
>
> Thanks,
> David
>
>>
>>      Andy
>>
>>
>> On 27/05/13 04:36, David Booth wrote:
>>  > Hi Markus,
>>  >
>>  > On 05/26/2013 06:37 PM, Markus Lanthaler wrote:
>>  >> On Sunday, May 26, 2013 7:17 PM, David Booth wrote:
>>  >>>> The syntax has
>>  >>>>
>>  >>>> @base IRIREF .
>>  >>>>
>>  >>>> and the @base is no different to other URIs - it is subject to URI
>>  >>>> resolution.
>>  >>>
>>  >>> But I don't see anything there that explicitly requires IRIREF to be an
>>  >>> absolute-IRI as defined in RFC3987.  Other parts of the Turtle syntax
>>  >>> (such as the @prefix production) also use the IRIREF syntax production
>>  >>> without requiring it to be an absolute-IRI.  That's why it isn't clear
>>  >>> that in the case of @base it must be an absolute-IRI.
>>  >>
>>  >> It can be a relative IRI as well. In that case it gets resolved
>>  >> against the currently active base IRI.
>>  >>
>>  >>
>>  >>>> @base <relURI> .
>>  >>>>
>>  >>>> is also legal as is
>>  >>>>
>>  >>>> @base <../sibling> .
>>  >>>>
>>  >>>> which might be occasionally useful.
>>  >>>
>>  >>> Huh?  Are you saying that @base can recursively specify the base URI
>>  >>> using a *relative* URI?  Then there would have to be a base URI
>> of the
>>  >>> @base URI?
>>  >>
>>  >> Yes, though sequentially rather than recursively.
>>  >>
>>  >>
>>  >>> I'm very surprised to hear you say that a relative @base URI would be
>>  >>> legal.  I don't think that should be allowed.  That seems too
>>  >>> mysterious and error prone to me.
>>  >>
>>  >> HTML allows that as well e.g.
>>  >>
>>  >>
>>  >>> That would require a relative URI specified in
>>  >>> @base to be resolved using "Reference Resolution", which is
>> specified
>>  >>> in
>>  >>> section 5 of RFC 3986.  But the result of "Reference Resolution"
>> is "a
>>  >>> string matching the <URI> syntax rule of Section 3", and the <URI>
>>  >>> production *allows* a fragment identifier.
>>  >>
>>  >> And why should that be a problem?
>>  >
>>  > Because a base URI as defined in RFC 3986 does not permit a fragment
>>  > identifier.  Therefore, if @base specified a relative URI which was
>>  > resolved using RFC3986 "Reference Resolution" then the result could
>>  > contain a fragment identifier.  Thus, a Turtle "base URI" could contain
>>  > a fragment identifier, whereas an RFC 3986 "base URI" does not permit a
>>  > fragment identifier.
>>  >
>>  >>
>>  >>
>>  >>> I think it would be better to align directly with SPARQL and RFC 3986
>>  >>> and RFC 3987 by explicitly requiring @base to specify an absolute-IRI.
>>  >>
>>  >> It is aligned with the two RFCs. There might be a case where you can't
>>  >> resolve a relative @base as the document itself has no IRI but that's the
>>  >> same problem as not being able to resolve relative IRIs anywhere else in
>>  >> such a document.
>>  >
>>  > If it is aligned with RFC 3986 and 3987 then the alignment certainly is
>>  > not very visible.  I spent quite a lot of time trying to track it down,
>>  > and finally concluded that nothing in the Turtle spec requires Turtle's
>>  > notion of a base URI (which AFAICT is specified using @base) to be an
>>  > absolute-IRI as defined in those RFCs.  Can you please point me to the
>>  > exact wording that requires a Turtle base URI to be an absolute-IRI?
>>  >
>>  > The Turtle EBNF certainly does not require it.
>>  >
>>  > Turtle section 6.3 has two paragraphs.  The first says:
>>  > http://www.w3.org/TR/turtle/#sec-iri-references
>>  > [[
>>  > Relative IRIs are resolved with base IRIs as per Uniform Resource
>>  > Identifier (URI): Generic Syntax [RFC3986] using only the basic
>>  > algorithm in section 5.2. Neither Syntax-Based Normalization nor
>>  > Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of
>>  > RFC3986) are performed. Characters additionally allowed in IRI
>>  > references are treated in the same way that unreserved characters are
>>  > treated in URI references, per section 6.5 of Internationalized Resource
>>  > Identifiers (IRIs) [RFC3987].
>>  > ]]
>>  > That paragraph only talks about resolving relative URIs.  It does not
>>  > specify the base URI.
>>  >
>>  > The first sentence of the second paragraph says:
>>  > [[
>>  > The @base directive defines the Base IRI used to resolve relative IRIs
>>  > per RFC3986 section 5.1.1, "Base URI Embedded in Content".
>>  > ]]
>>  > and RFC3986 section 5.1.1 says: "Within certain media types, a base URI
>>  > for relative references can be embedded within the content itself".
>>  > Since the Turtle directive is called "@base" (or "BASE") and the Turtle
>>  > spec often uses the term "base URI", this would strongly suggest that
>>  > the @base directive is used to specify a base URI that is "embedded
>>  > within the content itself".  But if you and Andy are telling me that
>>  > @base may provide a relative URI, then the actual base URI is *not*
>>  > actually "embedded within the content itself".  Rather, it is
>>  > (recursively) determined by resolving that relative URI against some
>>  > other base URI.
>>  >
>>  > The rest of the second paragraph in Turtle section 6.3 says:
>>  > [[
>>  > Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the
>>  > In-Scope Base IRI may come from an encapsulating document, such as a
>>  > SOAP envelope with an xml:base directive or a mime multipart document
>>  > with a Content-Location header. The "Retrieval URI" identified in 5.1.3,
>>  > Base "URI from the Retrieval URI", is the URL from which a particular
>>  > Turtle document was retrieved. If none of the above specifies the Base
>>  > URI, the default Base URI (section 5.1.4, "Default Base URI") is used.
>>  > Each @base directive sets a new In-Scope Base URI, relative to the
>>  > previous one.
>>  > ]]
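
The precedence order that RFC 3986 section 5.1 sets out (embedded in content, then the encapsulating entity, then the retrieval URI, then an application default) can be sketched in Python as below. This is a simplification under stated assumptions: every candidate is taken to be an absolute URI already, the default value is a hypothetical placeholder, and the Turtle-specific question of how a relative @base is handled is deliberately left out. `establish_base` is a hypothetical name.

```python
from typing import Optional

# Hypothetical application default (RFC 3986 5.1.4 leaves this to the application).
DEFAULT_BASE = "http://example.invalid/default-base"


def establish_base(embedded: Optional[str] = None,
                   encapsulating: Optional[str] = None,
                   retrieval: Optional[str] = None,
                   default: str = DEFAULT_BASE) -> str:
    """Return the base URI using the RFC 3986 section 5.1 precedence order.

    Assumes every candidate is already an absolute URI; any fragment is
    stripped because a base URI must match <absolute-URI>.
    """
    # 5.1.1 embedded in content, 5.1.2 encapsulating entity, 5.1.3 retrieval URI
    for candidate in (embedded, encapsulating, retrieval):
        if candidate is not None:
            return candidate.split("#", 1)[0]
    return default  # 5.1.4 default base URI


print(establish_base(retrieval="http://example/location/file.ttl#top"))
# http://example/location/file.ttl
print(establish_base(encapsulating="http://example/envelope",
                     retrieval="http://example/location/file.ttl"))
# http://example/envelope
```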
>>  > Notice that it only references RFC3986 sections 5.1.2 and 5.1.3, which
>>  > only talk (vaguely) about where the base URI might come from.  Those
>>  > sections do not constrain the base URI to be an absolute-URI.  It is the
>>  > beginning of RFC3986 section 5.1 that constrains a base URI to be an
>>  > absolute-URI, and that portion is *not* referenced by the Turtle spec.
>>  >
>>  > The last sentence of that second paragraph in Turtle section 6.3 does
>>  > say "Each @base directive sets a new In-Scope Base URI, relative to the
>>  > previous one", and I guess that sentence is the justification for why
>>  > you and Andy are saying that @base can specify a relative URI.  But
>>  > knowing that RFC3986 requires a base URI to be an absolute-URI, I had
>>  > understood that sentence to mean "Each @base directive sets a new
>>  > In-Scope Base URI, [in relation to] the previous one", i.e., it is
>>  > new in relation to the previous one.  I had no idea it was suggesting
>>  > that @base could specify a relative URI.
>>  >
>>  > Bottom line:
>>  >
>>  >   - This stuff is not at all clear in the current wording.
>>  >
>>  >   - If @base is permitted to specify a relative IRI then: (a) an
>>  > explanation should be added to explain how that relative IRI is
>>  > converted into an absolute-IRI (including what happens to any fragment
>>  > identifier that the relative IRI contains); and (b) Turtle will not be
>>  > aligned with SPARQL in this regard.
>>  >
>>  >   - If @base is NOT permitted to specify a relative IRI then the Turtle
>>  > spec should make clear that @base must specify an absolute-IRI, in
>>  > alignment with SPARQL.
>>  >
>>  > I was not aware that HTML allowed base URIs to be relative, but it
>>  > seems more important to align Turtle with SPARQL than with HTML.  Plus
>>  > it would also be simpler.
>>  >
>>  > David
>>  >
>>
Received on Friday, 31 May 2013 01:24:37 UTC
