Re: Turtle syntax: Please align base URI with RFC 3986 & 3987

Hi Andy,

On 05/27/2013 06:53 AM, Andy Seaborne wrote:
> David,
>
> You seem to have a different processing model to the one I think it is.

Correct.   I don't really care which processing model is used.  I am 
just concerned about alignment and clarity.  Apparently I read the spec 
differently than you intended it.

>   You seem to believe the base is exactly the characters used for
> IRIREF; I understand it as URI resolution applies then the output is
> passed to whatever is doing base URI processing to be used.

Correct.

>
> For context: XML
>
> http://www.w3.org/TR/xmlbase/#syntax
>
> and the example of a relative URI "/hotpicks/" for xml:base for a
> element.

Yes, XML seems to follow HTML in this regard.

> Turtle (and SPARQL) are just doing what everything else does here.

But that isn't what the SPARQL spec says.  If SPARQL was intended to 
have an additional "Reference Resolution" step to transform the IRIREF 
string given in a BASE declaration into an absolute-IRI, that step is 
not written in the spec AFAICT.   It explicitly says:
http://www.w3.org/TR/sparql11-query/#iriRefs

   "Base IRIs declared with the BASE keyword must be absolute IRIs".

And as I pointed out to Markus, the SPARQL spec strongly suggests a 
processing model in which the IRIREF is taken directly as the base URI, 
as SPARQL section 4.1.1.4 says:
http://www.w3.org/TR/sparql11-query/#relIRIs

   'The BASE keyword defines the Base IRI used to resolve
   relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded
   in Content".'

and RFC3986 section 5.1.1 says:

   "Within certain media types, a base URI for relative references can be
    embedded within the content itself so that it can be readily obtained
    by a parser."

But the base URI would only be "embedded within the content itself" if 
the IRIREF were taken *directly* as the base URI.
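
To make the difference between the two readings concrete, here is a rough Python sketch. It is my own illustration, not text from either spec. The function names are hypothetical, the retrieval URI is borrowed from the examples quoted below, and urllib.parse.urljoin is only an approximation of RFC3986 section 5.2 resolution.

```python
from urllib.parse import urljoin, urlsplit

def is_absolute_iri(iri: str) -> bool:
    # Rough check only: an absolute-IRI has a scheme and no fragment part.
    return bool(urlsplit(iri).scheme) and "#" not in iri

def base_taken_directly(iriref: str) -> str:
    # Reading 1: the IRIREF characters are themselves the base.
    return iriref

def base_after_resolution(previous_base: str, iriref: str) -> str:
    # Reading 2: the IRIREF is first resolved against the base in effect.
    return urljoin(previous_base, iriref)

retrieval_uri = "http://example/location/file.ttl"  # from the quoted examples

# With an absolute IRIREF the two readings agree.
absolute = "http://example/base2"
same = base_taken_directly(absolute) == base_after_resolution(retrieval_uri, absolute)

# With a relative IRIREF only the second reading yields an absolute base.
direct = base_taken_directly("base2/")
resolved = base_after_resolution(retrieval_uri, "base2/")
print(same, is_absolute_iri(direct), is_absolute_iri(resolved), resolved)
```

Under the first reading a relative IRIREF simply cannot serve as a base, which is why the "must be absolute IRIs" sentence in SPARQL reads to me as a constraint on the IRIREF itself.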

>
> What triples do you expect from, and what sequence of process steps
> would you expect a process to take, for these Turtle documents: in each
> case they are obtained by
>
> GET http://example/location/file.ttl
>
>
> Document1::
> ----
> <s> <p> <#o> .
> ----
>
> Document2::
> ----
> @base        <http://example/base2> .
> <s> <p> <#o> .
> ----
>
> Document3::
> ----
> <s> <p> <#o> .
> @base        <http://example/base2> .
> <s> <p> <#o> .
> ----
>
> Document4::
> ----
> @base        <base2/> .
> <s> <p> <#o> .
> ----
>
> Document5:: corner case:
> ----
> @base          <base2/> .
> @prefix  ns1:  <ns#> .
> ns1:s <p> <#o> .
> ----
>
> After resolution, before used as the base, it is absolute - all URIs in
> RDF are absolute.

Yes, but this is not an RDF question.  It is a Turtle syntax question. 
Base URIs don't exist in RDF.

> This absolute URI - possible with fragment, is then
> given to what ever machinery is doing to further URI resolution.  That
> code is responsible for determining the right base URI given the inputs.
>
> Hence, I see that
> "If the base URI is obtained from a URI reference,  ..."
> applies.

But that quote comes from the beginning of RFC3986 section 5.1, which is 
not referenced from either the Turtle or SPARQL specs.  Turtle and 
SPARQL only reference later subsections of 5.1.

If SPARQL was intended to have the processing model that you suggest -- 
and that would make sense, given the HTML and XML precedents -- then 
Turtle should also use that processing model, as you suggest.  In which 
case an erratum needs to be issued for SPARQL explaining the omission of 
the additional "Reference Resolution" step, and the Turtle spec should 
add some verbiage explicitly explaining this step.

I would suggest adding something along the following lines to the second 
paragraph of Turtle section 6.3:
http://www.w3.org/TR/turtle/#sec-iri-references
[[
The @base directive indirectly specifies a new Base IRI that overrides 
the previous Base IRI that was in effect at that point in the Turtle 
document.  The new Base IRI is determined by resolving the IRIREF given 
in the @base directive against the previous Base IRI that was in effect 
at that point in the Turtle document, using "Reference Resolution" as 
defined in RFC3986 section 5.  This means that: (a) if the IRIREF 
specified in the @base directive was a relative IRI, it will be 
converted to an absolute-IRI using the process described in RFC3986 
section 5; (b) if the given IRIREF contained a fragment component, the 
fragment component will be stripped in that process; and (c) @base 
directives can be chained, such that the Base IRI specified by one @base 
directive is used in determining the Base IRI specified in a @base 
directive that appears later in the Turtle document.

Similarly, the @prefix directive indirectly associates a prefix label 
(specified in the PNAME_NS portion of the @prefix directive) with an IRI 
that is derived from the IRIREF specified in the @prefix directive, by 
resolving that IRIREF, as specified in RFC3986 section 5, against the 
Base IRI currently in effect at that point in the Turtle document.
]]
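
As a sanity check on the wording above, here is a minimal Python sketch of the processing model it describes. Treat it as an illustration under assumptions rather than a reference implementation: the function names are hypothetical, urllib.parse.urljoin stands in for RFC3986 "Reference Resolution", and the fragment is dropped explicitly with urldefrag because urljoin keeps the fragment of the reference it resolves. Since urljoin may not preserve a trailing empty fragment such as the "#" in <ns#>, the prefix example uses a hypothetical <ns/> instead.

```python
from urllib.parse import urldefrag, urljoin

def apply_base_directive(current_base: str, iriref: str) -> str:
    # (a) resolve the IRIREF against the base currently in effect,
    # (b) then strip any fragment before using the result as the new base.
    resolved = urljoin(current_base, iriref)
    return urldefrag(resolved).url

def apply_prefix_directive(current_base: str, iriref: str) -> str:
    # The IRI bound to a prefix label is the IRIREF resolved against the base.
    return urljoin(current_base, iriref)

retrieval_uri = "http://example/location/file.ttl"  # from Andy's examples

# @base <base2/> .
base = apply_base_directive(retrieval_uri, "base2/")

# @prefix ns1: <ns/> .   (hypothetical variant of Andy's <ns#>)
ns1 = apply_prefix_directive(base, "ns/")
subject = ns1 + "s"          # ns1:s
obj = urljoin(base, "#o")    # <#o>

# (c) chaining: a later @base <../sibling> builds on the previous base.
chained = apply_base_directive(base, "../sibling")

# (b) an @base IRIREF with a fragment loses it.
stripped = apply_base_directive(retrieval_uri, "http://example/base2#frag")

print(base, subject, obj, chained, stripped, sep="\n")
```

The point of keeping the @base and @prefix helpers separate is that only the base has its fragment removed; a prefix IRI may legitimately end in a fragment.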

And add comments to both the @prefix and @base syntax productions:
[[
[4]   prefixID      ::=  '@prefix' PNAME_NS IRIREF '.'   /* See sec 6.3 */
[5]   base          ::=  '@base' IRIREF '.'              /* See sec 6.3 */
[5s]  sparqlBase    ::=  "BASE" IRIREF                   /* See sec 6.3 */
[6s]  sparqlPrefix  ::=  "PREFIX" PNAME_NS IRIREF        /* See sec 6.3 */
]]

Thanks,
David

>
>      Andy
>
>
> On 27/05/13 04:36, David Booth wrote:
>  > Hi Markus,
>  >
>  > On 05/26/2013 06:37 PM, Markus Lanthaler wrote:
>  >> On Sunday, May 26, 2013 7:17 PM, David Booth wrote:
>  >>>> The syntax has
>  >>>>
>  >>>> @base IRIREF .
>  >>>>
>  >>>> and the @base is no different to other URIs - it is subject to URI
>  >>>> resolution.
>  >>>
>  >>> But I don't see anything there that explicitly requires IRIREF to be an
>  >>> absolute-IRI as defined in RFC3987.  Other parts of the Turtle syntax
>  >>> (such as the @prefix production) also use the IRIREF syntax production
>  >>> without requiring it to be an absolute-IRI.  That's why it isn't clear
>  >>> that in the case of @base it must be an absolute-IRI.
>  >>
>  >> It can be a relative IRI as well. In that case it gets resolved
>  >> against the
>  >> currently active base IRI.
>  >>
>  >>
>  >>>> @base <relURI> .
>  >>>>
>  >>>> is also legal as is
>  >>>>
>  >>>> @base <../sibling> .
>  >>>>
>  >>>> which might be occasionally useful.
>  >>>
>  >>> Huh?  Are you saying that @base can recursively specify the base URI
>  >>> using a *relative* URI?  Then there would have to be a base URI of the
>  >>> @base URI?
>  >>
>  >> Yes, not recursively though but sequentially.
>  >>
>  >>
>  >>> I'm very surprised to hear you say that a relative @base URI would be
>  >>> legal.  I don't think that should be allowed.  That seems too
>  >>> mysterious and error prone to me.
>  >>
>  >> HTML allows that as well e.g.
>  >>
>  >>
>  >>> That would require a relative URI specified in
>  >>> @base to be resolved using "Reference Resolution", which is specified
>  >>> in
>  >>> section 5 of RFC 3986.  But the result of "Reference Resolution" is "a
>  >>> string matching the <URI> syntax rule of Section 3", and the <URI>
>  >>> production *allows* a fragment identifier.
>  >>
>  >> And why should that be a problem?
>  >
>  > Because a base URI as defined in RFC 3986 does not permit a fragment
>  > identifier.  Therefore, if @base specified a relative URI which was
>  > resolved using RFC3986 "Reference Resolution" then the result could
>  > contain a fragment identifier.  Thus, a Turtle "base URI" could contain
>  > a fragment identifier, whereas an RFC 3986 "base URI" does not permit a
>  > fragment identifier.
>  >
>  >>
>  >>
>  >>> I think it would be better to align directly with SPARQL and RFC 3986
>  >>> and RFC 3987 by explicitly requiring @base to specify an absolute-IRI.
>  >>
>  >> It is aligned with the two RFCs. There might be a case where you can't
>  >> resolve a relative @base as the document itself has no IRI but that's the
>  >> same problem as not being able to resolve relative IRIs anywhere else in
>  >> such a document.
>  >
>  > If it is aligned with RFC 3986 and 3987 then the alignment certainly is
>  > not very visible.  I spent quite a lot of time trying to track it down,
>  > and finally concluded that nothing in the Turtle spec requires Turtle's
>  > notion of a base URI (which AFAICT is specified using @base) to be an
>  > absolute-IRI as defined in those RFCs.  Can you please point me to the
>  > exact wording that requires a Turtle base URI to be an absolute-IRI?
>  >
>  > The Turtle EBNF certainly does not require it.
>  >
>  > Turtle section 6.3 has two paragraphs.  The first says:
>  > http://www.w3.org/TR/turtle/#sec-iri-references
>  > [[
>  > Relative IRIs are resolved with base IRIs as per Uniform Resource
>  > Identifier (URI): Generic Syntax [RFC3986] using only the basic
>  > algorithm in section 5.2. Neither Syntax-Based Normalization nor
>  > Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of
>  > RFC3986) are performed. Characters additionally allowed in IRI
>  > references are treated in the same way that unreserved characters are
>  > treated in URI references, per section 6.5 of Internationalized Resource
>  > Identifiers (IRIs) [RFC3987].
>  > ]]
>  > That paragraph only talks about resolving relative URIs.  It does not
>  > specify the base URI.
>  >
>  > The first sentence of the second paragraph says:
>  > [[
>  > The @base directive defines the Base IRI used to resolve relative IRIs
>  > per RFC3986 section 5.1.1, "Base URI Embedded in Content".
>  > ]]
>  > and RFC3986 section 5.1.1 says: "Within certain media types, a base URI
>  > for relative references can be embedded within the content itself".
>  > Since the Turtle directive is called "@base" (or "BASE") and the Turtle
>  > spec often uses the term "base URI", this would strongly suggest that
>  > the @base directive is used to specify a base URI that is "embedded
>  > within the content itself".  But if you and Andy are telling me that
>  > @base may provide a relative URI, then the actual base URI is *not*
>  > actually "embedded within the content itself".  Rather, it is
>  > (recursively) determined by resolving that relative URI against some
>  > other base URI.
>  >
>  > The rest of the second paragraph in Turtle section 6.3 says:
>  > [[
>  > Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the
>  > In-Scope Base IRI may come from an encapsulating document, such as a
>  > SOAP envelope with an xml:base directive or a mime multipart document
>  > with a Content-Location header. The "Retrieval URI" identified in 5.1.3,
>  > Base "URI from the Retrieval URI", is the URL from which a particular
>  > Turtle document was retrieved. If none of the above specifies the Base
>  > URI, the default Base URI (section 5.1.4, "Default Base URI") is used.
>  > Each @base directive sets a new In-Scope Base URI, relative to the
>  > previous one.
>  > ]]
>  > Notice that it only references RFC3986 sections 5.1.2 and 5.1.3, which
>  > only talk (vaguely) about where the base URI might come from.  Those
>  > sections do not constrain the base URI to be an absolute-URI.  It is the
>  > beginning of RFC3986 section 5.1 that constrains a base URI to be an
>  > absolute-URI, and that portion is *not* referenced by the Turtle spec.
>  >
>  > The last sentence of that second paragraph in Turtle section 6.3 does
>  > say "Each @base directive sets a new In-Scope Base URI, relative to the
>  > previous one", and I guess that sentence is the justification for why
>  > you and Andy are saying that @base can specify a relative URI.  But
>  > knowing that RFC3986 requires a base URI to be an absolute-URI, I had
>  > understood that sentence to mean "Each @base directive sets a new
>  > In-Scope Base URI, [in relation to] the previous one", i.e., it is
>  > new in relation to the previous one.  I had no idea it was suggesting
>  > that @base could specify a relative URI.
>  >
>  > Bottom line:
>  >
>  >   - This stuff is not at all clear in the current wording.
>  >
>  >   - If @base is permitted to specify a relative IRI then: (a) an
>  > explanation should be added to explain how that relative IRI is
>  > converted into an absolute-IRI (including what happens to any fragment
>  > identifier that the relative IRI contains); and (b) Turtle will not be
>  > aligned with SPARQL in this regard.
>  >
>  >   - If @base is NOT permitted to specify a relative IRI then the Turtle
>  > spec should make clear that @base must specify an absolute-IRI, in
>  > alignment with SPARQL.
>  >
>  > I was not aware that HTML allowed base URIs to be relative, but, it
>  > seems more important to align Turtle with SPARQL than with HTML.  Plus
>  > it would also be simpler.
>  >
>  > David
>  >

Received on Monday, 27 May 2013 15:51:46 UTC