
Re: Turtle syntax: Please align base URI with RFC 3986 & 3987

From: Gavin Carothers <gavin@carothers.name>
Date: Thu, 30 May 2013 18:47:36 -0700
Message-ID: <CAPqY83z0ReS8orxjCwvEMpeVvfkKpD-LGF2-Qiu8ktQ10YsNjw@mail.gmail.com>
To: David Booth <david@dbooth.org>
Cc: Peter Occil <poccil14@gmail.com>, "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>, Andy Seaborne <andy.seaborne@epimorphics.com>
On Thu, May 30, 2013 at 6:24 PM, David Booth <david@dbooth.org> wrote:

> Hi Peter,
>
> I don't think that's correct, because RFC3986 says that an absolute-URI
> cannot have a fragment component:
> http://tools.ietf.org/html/rfc3986#section-4.3
> and section 5.1 explicitly says: "A base URI must conform to the
> <absolute-URI> syntax rule (Section 4.3).".
>

That is something an RDF system must make sure of; the syntax doesn't. The
syntax doesn't prevent any of the following base statements:

@base <//example.org/> .                # New base, http or https depending on the current document base
@base <http://example.org/pointless> .  # Relative IRIs now resolve under http://example.org/
@base <> .                              # Does nothing
@base <b/> .                            # Resolves b/ against the current base, i.e. appends b/ to its directory
@base <#> .                             # Still does nothing
@base <http://example.org/a/b#> .       # Relative IRIs resolve under http://example.org/a/, note the lack of a 'b'

Base URI does NOT work by simple prefixing or by simple assignment; it
requires URL resolution. For more on exactly how interesting this can get,
see http://url.spec.whatwg.org/#parsing, or, for fun with manipulating base
URLs in HTML, see https://www.w3.org/Bugs/Public/show_bug.cgi?id=18459#c11
and the linked test case (Turtle differs in that it does support more than
one in-document base statement, which HTML and SPARQL do not). Base URLs do
NOT have to be absolute URIs in the syntax.
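To make the "resolution, not prefixing" point concrete, here is a minimal Python sketch that applies the six @base values above one after another, using the reference-resolution algorithm of RFC 3986 sections 5.2.2 to 5.3 and then dropping the fragment, as RFC 3986 section 5.1 requires of a base URI. It is a sketch under assumptions: the starting base https://example.com/dir/doc.ttl and the function names (`resolve`, `apply_base_directives`) are hypothetical, and the IRIREFs are passed as bare strings rather than parsed from Turtle.

```python
import re

# URI-reference splitter: the regular expression from RFC 3986 Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref):
    """Return (scheme, authority, path, query, fragment); None marks an undefined part."""
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith(("./", "/./")):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """Strict RFC 3986 section 5.2.2 resolution plus 5.3 recomposition (base must be absolute)."""
    rs, ra, rp, rq, rf = split_uri(ref)
    bs, ba, bp, bq, _ = split_uri(base)  # the base's fragment is never used
    if rs is not None:
        ts, ta, tp, tq = rs, ra, remove_dot_segments(rp), rq
    elif ra is not None:
        ts, ta, tp, tq = bs, ra, remove_dot_segments(rp), rq
    elif rp == "":
        ts, ta, tp, tq = bs, ba, bp, (rq if rq is not None else bq)
    elif rp.startswith("/"):
        ts, ta, tp, tq = bs, ba, remove_dot_segments(rp), rq
    else:
        ts, ta, tp, tq = bs, ba, remove_dot_segments(merge(ba, bp, rp)), rq
    target = ts + ":"
    if ta is not None:
        target += "//" + ta
    target += tp
    if tq is not None:
        target += "?" + tq
    if rf is not None:
        target += "#" + rf  # the fragment always comes from the reference
    return target


def apply_base_directives(start_base, irirefs):
    """Hypothetical helper: the in-scope base after each @base, resolved against the previous one."""
    bases, base = [], start_base
    for iriref in irirefs:
        # Resolve to absolute form, then strip any fragment (RFC 3986 section 5.1).
        base = resolve(base, iriref).split("#", 1)[0]
        bases.append(base)
    return bases


GAVIN_EXAMPLES = ["//example.org/", "http://example.org/pointless", "", "b/", "#",
                  "http://example.org/a/b#"]
START = "https://example.com/dir/doc.ttl"  # hypothetical document base
for iriref, base in zip(GAVIN_EXAMPLES, apply_base_directives(START, GAVIN_EXAMPLES)):
    print(f"@base <{iriref}> .  ->  {base}   (<x> resolves to {resolve(base, 'x')})")
```

Run in this order, `<b/>` lands at http://example.org/b/ because it resolves against the directory of http://example.org/pointless, and the final base is http://example.org/a/b, against which `<x>` resolves to http://example.org/a/x.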

Cheers,
Gavin



> David
>
>
> On 05/29/2013 01:58 PM, Peter Occil wrote:
>
>> Your suggested text should be corrected as follows:
>>
>> [[
>> (b) if the previous base IRI contained a fragment component,
>> the fragment component will be replaced with the fragment
>> component that the given IRIREF has, or stripped if the given
>> IRIREF has none;
>> ]]
>>
>> That's because if the given IRIREF contains a fragment
>> component, that component will be inherited in the new
>> absolute URI.  Only the fragment component of the previous
>> base URI will be stripped.
>>
>> As I understand it, the given IRIREF represents R in section
>> 5.2.2 of RFC3986, and the previous base URI represents Base
>> in that section; correct me if I'm wrong. At the end of that
>> section you can see that the new absolute URI's fragment
>> changes to R's fragment.
>>
>> --Peter
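Peter's reading can be checked mechanically. The minimal Python sketch below implements the strict resolution algorithm of RFC 3986 sections 5.2.2 to 5.3 (the function names are illustrative, not taken from any spec) and shows that the target's fragment always comes from R, while the Base's fragment is never copied. The previous base http://example/doc.ttl#old is a hypothetical value.

```python
import re

# URI-reference splitter: the regular expression from RFC 3986 Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref):
    """Return (scheme, authority, path, query, fragment); None marks an undefined part."""
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith(("./", "/./")):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """Strict RFC 3986 section 5.2.2 resolution plus 5.3 recomposition (base must be absolute)."""
    rs, ra, rp, rq, rf = split_uri(ref)
    bs, ba, bp, bq, _ = split_uri(base)  # Base.fragment is discarded here
    if rs is not None:
        ts, ta, tp, tq = rs, ra, remove_dot_segments(rp), rq
    elif ra is not None:
        ts, ta, tp, tq = bs, ra, remove_dot_segments(rp), rq
    elif rp == "":
        ts, ta, tp, tq = bs, ba, bp, (rq if rq is not None else bq)
    elif rp.startswith("/"):
        ts, ta, tp, tq = bs, ba, remove_dot_segments(rp), rq
    else:
        ts, ta, tp, tq = bs, ba, remove_dot_segments(merge(ba, bp, rp)), rq
    target = ts + ":"
    if ta is not None:
        target += "//" + ta
    target += tp
    if tq is not None:
        target += "?" + tq
    if rf is not None:
        target += "#" + rf  # T.fragment = R.fragment
    return target


BASE = "http://example/doc.ttl#old"  # hypothetical previous base that carries a fragment
for r in ["", "#new", "other#f", "other"]:
    print(f"R = {r!r:10} -> {resolve(BASE, r)}")
```

So a fragment in the @base IRIREF does survive reference resolution, as Peter says; RFC 3986 section 5.1 only removes it afterwards, when the result is used as a base URI.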
>>
>> -----Original Message----- From: David Booth
>> Sent: Monday, May 27, 2013 11:51 AM
>> To: Andy Seaborne
>> Cc: public-rdf-comments@w3.org
>> Subject: Re: Turtle syntax: Please align base URI with RFC 3986 & 3987
>>
>> Hi Andy,
>>
>> On 05/27/2013 06:53 AM, Andy Seaborne wrote:
>>
>>> David,
>>>
>>> You seem to have a different processing model from the one I think applies.
>>>
>>
>> Correct.   I don't really care which processing model is used.  I am
>> just concerned about alignment and clarity.  Apparently I read the spec
>> differently than you intended it.
>>
>>> You seem to believe the base is exactly the characters used for
>>> IRIREF; I understand it as: URI resolution applies, then the output is
>>> passed to whatever is doing base URI processing to be used.
>>>
>>
>> Correct.
>>
>>
>>> For context: XML
>>>
>>> http://www.w3.org/TR/xmlbase/#syntax
>>>
>>> and the example of a relative URI "/hotpicks/" for xml:base for an
>>> element.
>>>
>>
>> Yes, XML seems to follow HTML in this regard.
>>
>>> Turtle (and SPARQL) are just doing what everything else does here.
>>>
>>
>> But that isn't what the SPARQL spec says.  If SPARQL was intended to
>> have an additional "Reference Resolution" step to transform the IRIREF
>> string given in a BASE declaration into an absolute-IRI, that step is
>> not written in the spec AFAICT.   It explicitly says:
>> http://www.w3.org/TR/sparql11-query/#iriRefs
>>
>>    "Base IRIs declared with the BASE keyword must be absolute IRIs".
>>
>> And as I pointed out to Markus, the SPARQL spec strongly suggests a
>> processing model in which the IRIREF is taken directly as the base URI,
>> as SPARQL section 4.1.1.4 says:
>> http://www.w3.org/TR/sparql11-query/#relIRIs
>>
>>    'The BASE keyword defines the Base IRI used to resolve
>>    relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded
>>    in Content".'
>>
>> and RFC3986 section 5.1.1 says:
>>
>>    "Within certain media types, a base URI for relative references can be
>>     embedded within the content itself so that it can be readily obtained
>>     by a parser."
>>
>> But the base URI would only be "embedded within the content itself" if
>> the IRIREF were taken *directly* as the base URI.
>>
>>
>>> What triples do you expect from, and what sequence of process steps
>>> would you expect a process to take, for these Turtle documents: in each
>>> case they are obtained by
>>>
>>> GET http://example/location/file.ttl
>>>
>>>
>>> Document1::
>>> ----
>>> <s> <p> <#o> .
>>> ----
>>>
>>> Document2::
>>> ----
>>> @base        <http://example/base2> .
>>> <s> <p> <#o> .
>>> ----
>>>
>>> Document3::
>>> ----
>>> <s> <p> <#o> .
>>> @base        <http://example/base2> .
>>> <s> <p> <#o> .
>>> ----
>>>
>>> Document4::
>>> ----
>>> @base        <base2/> .
>>> <s> <p> <#o> .
>>> ----
>>>
>>> Document5:: corner case:
>>> ----
>>> @base          <base2/> .
>>> @prefix  ns1:  <ns#> .
>>> ns1:s <p> <#o> .
>>> ----
>>>
>>> After resolution, before used as the base, it is absolute - all URIs in
>>> RDF are absolute.
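One way to answer Andy's question is to run his five documents through the resolve-then-use-as-base model he describes. The Python sketch below does that under explicit assumptions: documents are given as pre-split statement tuples (there is no Turtle parser), `evaluate`, `term_to_iri` and the tuple layout are hypothetical, a prefixed name is expanded by concatenating the prefix IRI and the local name, and each new base has its fragment stripped per RFC 3986 section 5.1.

```python
import re

# URI-reference splitter: the regular expression from RFC 3986 Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref):
    """Return (scheme, authority, path, query, fragment); None marks an undefined part."""
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith(("./", "/./")):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """Strict RFC 3986 section 5.2.2 resolution plus 5.3 recomposition (base must be absolute)."""
    rs, ra, rp, rq, rf = split_uri(ref)
    bs, ba, bp, bq, _ = split_uri(base)
    if rs is not None:
        ts, ta, tp, tq = rs, ra, remove_dot_segments(rp), rq
    elif ra is not None:
        ts, ta, tp, tq = bs, ra, remove_dot_segments(rp), rq
    elif rp == "":
        ts, ta, tp, tq = bs, ba, bp, (rq if rq is not None else bq)
    elif rp.startswith("/"):
        ts, ta, tp, tq = bs, ba, remove_dot_segments(rp), rq
    else:
        ts, ta, tp, tq = bs, ba, remove_dot_segments(merge(ba, bp, rp)), rq
    target = ts + ":" + ("//" + ta if ta is not None else "") + tp
    target += ("?" + tq if tq is not None else "") + ("#" + rf if rf is not None else "")
    return target


def term_to_iri(term, base, prefixes):
    """Hypothetical: turn a pre-split term (<IRIREF> or prefix:local) into an IRI."""
    if term.startswith("<") and term.endswith(">"):
        return resolve(base, term[1:-1])
    label, local = term.split(":", 1)
    return prefixes[label] + local  # concatenation, not resolution


def evaluate(retrieval_uri, statements):
    """Hypothetical: process statements in order and return triples of IRIs."""
    base, prefixes, triples = retrieval_uri, {}, []
    for stmt in statements:
        if stmt[0] == "@base":
            # Resolve against the previous base, then drop any fragment.
            base = resolve(base, stmt[1][1:-1]).split("#", 1)[0]
        elif stmt[0] == "@prefix":
            # The namespace IRI is fixed using the base in effect right now.
            prefixes[stmt[1]] = resolve(base, stmt[2][1:-1])
        else:
            triples.append(tuple(term_to_iri(t, base, prefixes) for t in stmt))
    return triples


RETRIEVAL = "http://example/location/file.ttl"
TRIPLE = ("<s>", "<p>", "<#o>")
DOCS = {
    "Document1": [TRIPLE],
    "Document2": [("@base", "<http://example/base2>"), TRIPLE],
    "Document3": [TRIPLE, ("@base", "<http://example/base2>"), TRIPLE],
    "Document4": [("@base", "<base2/>"), TRIPLE],
    "Document5": [("@base", "<base2/>"), ("@prefix", "ns1", "<ns#>"), ("ns1:s", "<p>", "<#o>")],
}

for name, statements in DOCS.items():
    for s, p, o in evaluate(RETRIEVAL, statements):
        print(f"{name}: <{s}> <{p}> <{o}> .")
```

Under this model Document5's `ns1:s` becomes http://example/location/base2/ns#s: the @prefix IRI is resolved once, against the base in effect when the directive is read, and the local name is then appended without further resolution.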
>>>
>>
>> Yes, but this is not an RDF question.  It is a Turtle syntax question.
>> Base URIs don't exist in RDF.
>>
>>> This absolute URI, possibly with a fragment, is then
>>> given to whatever machinery is doing further URI resolution.  That
>>> code is responsible for determining the right base URI given the inputs.
>>>
>>> Hence, I see that
>>> "If the base URI is obtained from a URI reference,  ..."
>>> applies.
>>>
>>
>> But that quote comes from the beginning of RFC3986 section 5.1, which is
>> not referenced from either the Turtle or SPARQL specs.  Turtle and
>> SPARQL only reference later subsections of 5.1.
>>
>> If SPARQL was intended to have the processing model that you suggest --
>> and that would make sense, given the HTML and XML precedents -- then
>> Turtle should also use that processing model, as you suggest.  In which
>> case an erratum needs to be issued for SPARQL explaining the omission of
>> the additional "Reference Resolution" step, and the Turtle spec should
>> add some verbiage explicitly explaining this step.
>>
>> I would suggest adding something along the following lines to the second
>> paragraph of Turtle section 6.3:
>> http://www.w3.org/TR/turtle/#sec-iri-references
>> [[
>> The @base directive indirectly specifies a new Base IRI that overrides
>> the previous Base IRI that was in effect at that point in the Turtle
>> document.  The new Base IRI is determined by resolving the IRIREF given
>> in the @base directive against the previous Base IRI that was in effect
>> at that point in the Turtle document, using "Reference Resolution" as
>> defined in RFC3986 section 5.  This means that: (a) if the IRIREF
>> specified in the @base directive was a relative IRI, it will be
>> converted to an absolute-IRI using the process described in RFC3986
>> section 5; (b) if the given IRIREF contained a fragment component, the
>> fragment component will be stripped in that process; and (c) @base
>> directives can be chained, such that the Base IRI specified by one @base
>> directive is used in determining the Base IRI specified in a @base
>> directive that appears later in the Turtle document.
>>
>> Similarly, the @prefix directive indirectly associates a prefix label
>> (specified in the PNAME_NS portion of the @prefix directive) with an IRI
>> that is derived from the IRIREF specified in the @prefix directive, by
>> resolving that IRIREF, as specified in RFC3986 section 5, against the
>> Base IRI currently in effect at that point in the Turtle document.
>> ]]
>>
>> And add comments to both the @prefix and @base syntax productions:
>> [[
>> [4] prefixID ::= '@prefix' PNAME_NS IRIREF '.'  /* See sec 6.3 */
>> [5] base ::= '@base' IRIREF '.'                     /* See sec 6.3 */
>> [5s] sparqlBase ::= "BASE" IRIREF                  /* See sec 6.3 */
>> [6s] sparqlPrefix ::= "PREFIX" PNAME_NS IRIREF       /* See sec 6.3 */
>> ]]
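The productions above constrain the IRIREF only by its character set, which is why the syntax accepts relative and fragment-bearing @base values. A minimal Python sketch of recognisers for the four productions, with stated simplifications: UCHAR escapes and non-ASCII PN_CHARS are omitted, whitespace handling is approximate, the pattern table and `classify` are hypothetical, and the double-quoted keywords "BASE" and "PREFIX" are treated as case-insensitive, which is my reading of the Turtle grammar's quoting convention.

```python
import re

# IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'   (UCHAR escapes omitted here)
IRIREF = r'<([^\x00-\x20<>"{}|^`\\]*)>'
# ASCII-only approximation of PNAME_NS ::= PN_PREFIX? ':'
PNAME_NS = r"([A-Za-z](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)?:"

PRODUCTIONS = {
    "prefixID": re.compile(r"@prefix\s+" + PNAME_NS + r"\s*" + IRIREF + r"\s*\."),
    "base": re.compile(r"@base\s*" + IRIREF + r"\s*\."),
    "sparqlBase": re.compile(r"BASE\s*" + IRIREF, re.IGNORECASE),
    "sparqlPrefix": re.compile(r"PREFIX\s+" + PNAME_NS + r"\s*" + IRIREF, re.IGNORECASE),
}


def classify(directive):
    """Return (production name, captured groups) for one directive, or None."""
    for name, pattern in PRODUCTIONS.items():
        match = pattern.fullmatch(directive.strip())
        if match:
            return name, match.groups()
    return None


for line in ["@base <#> .", "@base <b/> .", "@prefix ns1: <ns#> .",
             "BASE <../sibling>", "prefix : <http://example/>", "@base <has space> ."]:
    print(f"{line:30} -> {classify(line)}")
```

Every sample except the last is accepted, including `<#>` and `<../sibling>`; whether such a @base is meaningful is decided by the resolution rules in section 6.3, not by the grammar.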
>>
>> Thanks,
>> David
>>
>>
>>>      Andy
>>>
>>>
>>> On 27/05/13 04:36, David Booth wrote:
>>>  > Hi Markus,
>>>  >
>>>  > On 05/26/2013 06:37 PM, Markus Lanthaler wrote:
>>>  >> On Sunday, May 26, 2013 7:17 PM, David Booth wrote:
>>>  >>>> The syntax has
>>>  >>>>
>>>  >>>> @base IRIREF .
>>>  >>>>
>>>  >>>> and the @base is no different to other URIs - it is subject to URI
>>>  >>>> resolution.
>>>  >>>
>>>  >>> But I don't see anything there that explicitly requires IRIREF to be an
>>>  >>> absolute-IRI as defined in RFC3987.  Other parts of the Turtle syntax
>>>  >>> (such as the @prefix production) also use the IRIREF syntax production
>>>  >>> without requiring it to be an absolute-IRI.  That's why it isn't clear
>>>  >>> that in the case of @base it must be an absolute-IRI.
>>>  >>
>>>  >> It can be a relative IRI as well. In that case it gets resolved
>>>  >> against the currently active base IRI.
>>>  >>
>>>  >>
>>>  >>>> @base <relURI> .
>>>  >>>>
>>>  >>>> is also legal as is
>>>  >>>>
>>>  >>>> @base <../sibling> .
>>>  >>>>
>>>  >>>> which might be occasionally useful.
>>>  >>>
>>>  >>> Huh?  Are you saying that @base can recursively specify the base URI
>>>  >>> using a *relative* URI?  Then there would have to be a base URI of the
>>>  >>> @base URI?
>>>  >>
>>>  >> Yes, not recursively though but sequentially.
>>>  >>
>>>  >>
>>>  >>> I'm very surprised to hear you say that a relative @base URI would be
>>>  >>> legal.  I don't think that should be allowed.  That seems too
>>>  >>> mysterious and error prone to me.
>>>  >>
>>>  >> HTML allows that as well e.g.
>>>  >>
>>>  >>
>>>  >>> That would require a relative URI specified in
>>>  >>> @base to be resolved using "Reference Resolution", which is specified
>>>  >>> in section 5 of RFC 3986.  But the result of "Reference Resolution" is
>>>  >>> "a string matching the <URI> syntax rule of Section 3", and the <URI>
>>>  >>> production *allows* a fragment identifier.
>>>  >>
>>>  >> And why should that be a problem?
>>>  >
>>>  > Because a base URI as defined in RFC 3986 does not permit a fragment
>>>  > identifier.  Therefore, if @base specified a relative URI which was
>>>  > resolved using RFC3986 "Reference Resolution" then the result could
>>>  > contain a fragment identifier.  Thus, a Turtle "base URI" could contain
>>>  > a fragment identifier, whereas an RFC 3986 "base URI" does not permit a
>>>  > fragment identifier.
>>>  >
>>>  >>
>>>  >>
>>>  >>> I think it would be better to align directly with SPARQL and RFC 3986
>>>  >>> and RFC 3987 by explicitly requiring @base to specify an absolute-IRI.
>>>  >>
>>>  >> It is aligned with the two RFCs. There might be a case where you can't
>>>  >> resolve a relative @base as the document itself has no IRI but that's
>>>  >> the same problem as not being able to resolve relative IRIs anywhere
>>>  >> else in such a document.
>>>  >
>>>  > If it is aligned with RFC 3986 and 3987 then the alignment certainly is
>>>  > not very visible.  I spent quite a lot of time trying to track it down,
>>>  > and finally concluded that nothing in the Turtle spec requires Turtle's
>>>  > notion of a base URI (which AFAICT is specified using @base) to be an
>>>  > absolute-IRI as defined in those RFCs.  Can you please point me to the
>>>  > exact wording that requires a Turtle base URI to be an absolute-IRI?
>>>  >
>>>  > The Turtle EBNF certainly does not require it.
>>>  >
>>>  > Turtle section 6.3 has two paragraphs.  The first says:
>>>  > http://www.w3.org/TR/turtle/#sec-iri-references
>>>  > [[
>>>  > Relative IRIs are resolved with base IRIs as per Uniform Resource
>>>  > Identifier (URI): Generic Syntax [RFC3986] using only the basic
>>>  > algorithm in section 5.2. Neither Syntax-Based Normalization nor
>>>  > Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of
>>>  > RFC3986) are performed. Characters additionally allowed in IRI
>>>  > references are treated in the same way that unreserved characters are
>>>  > treated in URI references, per section 6.5 of Internationalized Resource
>>>  > Identifiers (IRIs) [RFC3987].
>>>  > ]]
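"Using only the basic algorithm in section 5.2" has a concrete consequence: dot segments are removed (that is part of section 5.2.4), but case, percent-encoding and default ports are left exactly as written, because sections 6.2.2 and 6.2.3 are not applied. A minimal Python sketch of that basic algorithm, with hypothetical inputs and illustrative function names, shows the difference.

```python
import re

# URI-reference splitter: the regular expression from RFC 3986 Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref):
    """Return (scheme, authority, path, query, fragment); None marks an undefined part."""
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4: the only 'cleanup' the basic algorithm performs."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith(("./", "/./")):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """RFC 3986 sections 5.2.2 and 5.3, with no syntax- or scheme-based normalization."""
    rs, ra, rp, rq, rf = split_uri(ref)
    bs, ba, bp, bq, _ = split_uri(base)
    if rs is not None:
        ts, ta, tp, tq = rs, ra, remove_dot_segments(rp), rq
    elif ra is not None:
        ts, ta, tp, tq = bs, ra, remove_dot_segments(rp), rq
    elif rp == "":
        ts, ta, tp, tq = bs, ba, bp, (rq if rq is not None else bq)
    elif rp.startswith("/"):
        ts, ta, tp, tq = bs, ba, remove_dot_segments(rp), rq
    else:
        ts, ta, tp, tq = bs, ba, remove_dot_segments(merge(ba, bp, rp)), rq
    target = ts + ":" + ("//" + ta if ta is not None else "") + tp
    target += ("?" + tq if tq is not None else "") + ("#" + rf if rf is not None else "")
    return target


print(resolve("HTTP://Example.ORG/a/b", "./%7Efoo/./c"))  # HTTP://Example.ORG/a/%7Efoo/c
print(resolve("http://example.org:80", ""))               # http://example.org:80
print(resolve("http://example.org:80", "x"))              # http://example.org:80/x
```

A normalizing implementation would instead produce http://example.org/a/~foo/c for the first call, which the paragraph above rules out.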
>>>  > That paragraph only talks about resolving relative URIs.  It does not
>>>  > specify the base URI.
>>>  >
>>>  > The first sentence of the second paragraph says:
>>>  > [[
>>>  > The @base directive defines the Base IRI used to resolve relative IRIs
>>>  > per RFC3986 section 5.1.1, "Base URI Embedded in Content".
>>>  > ]]
>>>  > and RFC3986 section 5.1.1 says: "Within certain media types, a base URI
>>>  > for relative references can be embedded within the content itself".
>>>  > Since the Turtle directive is called "@base" (or "BASE") and the Turtle
>>>  > spec often uses the term "base URI", this would strongly suggest that
>>>  > the @base directive is used to specify a base URI that is "embedded
>>>  > within the content itself".  But if you and Andy are telling me that
>>>  > @base may provide a relative URI, then the actual base URI is *not*
>>>  > actually "embedded within the content itself".  Rather, it is
>>>  > (recursively) determined by resolving that relative URI against some
>>>  > other base URI.
>>>  >
>>>  > The rest of the second paragraph in Turtle section 6.3 says:
>>>  > [[
>>>  > Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the
>>>  > In-Scope Base IRI may come from an encapsulating document, such as a
>>>  > SOAP envelope with an xml:base directive or a mime multipart document
>>>  > with a Content-Location header. The "Retrieval URI" identified in 5.1.3,
>>>  > "Base URI from the Retrieval URI", is the URL from which a particular
>>>  > Turtle document was retrieved. If none of the above specifies the Base
>>>  > URI, the default Base URI (section 5.1.4, "Default Base URI") is used.
>>>  > Each @base directive sets a new In-Scope Base URI, relative to the
>>>  > previous one.
>>>  > ]]
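The precedence described in this paragraph (encapsulating entity, then retrieval URI, then an application default) can be sketched as a small selection function that yields the base in effect before the first @base directive. This is an illustrative Python sketch, not spec text: `initial_base`, its parameters and the sample URIs are hypothetical, the default base is application-dependent per RFC 3986 section 5.1.4, and stripping a fragment follows the RFC 3986 section 5.1 rule that a base URI has none.

```python
from typing import Optional


def initial_base(encapsulating: Optional[str], retrieval: Optional[str], default: str) -> str:
    """Hypothetical: choose the In-Scope Base IRI in effect before any @base directive.

    Order follows RFC 3986 sections 5.1.2 (encapsulating entity),
    5.1.3 (retrieval URI) and 5.1.4 (application default).
    """
    for candidate in (encapsulating, retrieval):
        if candidate is not None:
            # A base URI must not carry a fragment (RFC 3986 section 5.1).
            return candidate.split("#", 1)[0]
    return default


DEFAULT = "http://default.invalid/"  # hypothetical application default
print(initial_base("http://example/part.ttl", "http://example/bundle", DEFAULT))
print(initial_base(None, "http://example/location/file.ttl#top", DEFAULT))
print(initial_base(None, None, DEFAULT))
```

Under the resolution reading discussed in this thread, any @base directives in the document are then resolved, in order, against this value.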
>>>  > Notice that it only references RFC3986 sections 5.1.2 and 5.1.3, which
>>>  > only talk (vaguely) about where the base URI might come from.  Those
>>>  > sections do not constrain the base URI to be an absolute-URI.  It is the
>>>  > beginning of RFC3986 section 5.1 that constrains a base URI to be an
>>>  > absolute-URI, and that portion is *not* referenced by the Turtle spec.
>>>  >
>>>  > The last sentence of that second paragraph in Turtle section 6.3 does
>>>  > say "Each @base directive sets a new In-Scope Base URI, relative to the
>>>  > previous one", and I guess that sentence is the justification for why
>>>  > you and Andy are saying that @base can specify a relative URI.  But
>>>  > knowing that RFC3986 requires a base URI to be an absolute-URI, I had
>>>  > understood that sentence to mean "Each @base directive sets a new
>>>  > In-Scope Base URI, [in relation to] the previous one", i.e., it is
>>>  > new in relation to the previous one.  I had no idea it was suggesting
>>>  > that @base could specify a relative URI.
>>>  >
>>>  > Bottom line:
>>>  >
>>>  >   - This stuff is not at all clear in the current wording.
>>>  >
>>>  >   - If @base is permitted to specify a relative IRI then: (a) an
>>>  > explanation should be added to explain how that relative IRI is
>>>  > converted into an absolute-IRI (including what happens to any fragment
>>>  > identifier that the relative IRI contains); and (b) Turtle will not be
>>>  > aligned with SPARQL in this regard.
>>>  >
>>>  >   - If @base is NOT permitted to specify a relative IRI then the Turtle
>>>  > spec should make clear that @base must specify an absolute-IRI, in
>>>  > alignment with SPARQL.
>>>  >
>>>  > I was not aware that HTML allowed base URIs to be relative, but, it
>>>  > seems more important to align Turtle with SPARQL than with HTML.  Plus
>>>  > it would also be simpler.
>>>  >
>>>  > David
Received on Friday, 31 May 2013 01:48:05 UTC
