- From: David Booth <david@dbooth.org>
- Date: Thu, 30 May 2013 21:24:08 -0400
- To: Peter Occil <poccil14@gmail.com>
- CC: public-rdf-comments@w3.org, Andy Seaborne <andy.seaborne@epimorphics.com>
Hi Peter,

I don't think that's correct, because RFC3986 says that an absolute-URI
cannot have a fragment component:
http://tools.ietf.org/html/rfc3986#section-4.3
and section 5.1 explicitly says: "A base URI must conform to the
<absolute-URI> syntax rule (Section 4.3).".

David

On 05/29/2013 01:58 PM, Peter Occil wrote:
> Your suggested text should be corrected as follows:
>
> [[
> (b) if the previous base IRI contained a fragment component,
> the fragment component will be replaced with the fragment
> component that the given IRIREF has, or stripped if the given
> IRIREF has none;
> ]]
>
> That's because if the given IRIREF contains a fragment
> component, that component will be inherited in the new
> absolute URI. Only the fragment component of the previous
> base URI will be stripped.
>
> As I understand it, the given IRIREF represents R in section
> 5.2.2 of RFC3986, and the previous base URI represents Base
> in that section; correct me if I'm wrong. At the end of that
> section you can see that the new absolute URI's fragment
> changes to R's fragment.
>
> --Peter
>
> -----Original Message----- From: David Booth
> Sent: Monday, May 27, 2013 11:51 AM
> To: Andy Seaborne
> Cc: public-rdf-comments@w3.org
> Subject: Re: Turtle syntax: Please align base URI with RFC 3986 & 3987
>
> Hi Andy,
>
> On 05/27/2013 06:53 AM, Andy Seaborne wrote:
>> David,
>>
>> You seem to have a different processing model to the one I think it is.
>
> Correct. I don't really care which processing model is used. I am
> just concerned about alignment and clarity. Apparently I read the spec
> differently than you intended it.
>
>> You seem to believe the base is exactly the characters used for
>> IRIREF; I understand it as URI resolution applies then the output is
>> passed to whatever is doing base URI processing to be used.
>
> Correct.
>
>>
>> For context: XML
>>
>> http://www.w3.org/TR/xmlbase/#syntax
>>
>> and the example of a relative URI "/hotpicks/" for xml:base for a
>> element.
>
> Yes, XML seems to follow HTML in this regard.
>
>> Turtle (and SPARQL) are just doing what everything else does here.
>
> But that isn't what the SPARQL spec says. If SPARQL was intended to
> have an additional "Reference Resolution" step to transform the IRIREF
> string given in a BASE declaration into an absolute-IRI, that step is
> not written in the spec AFAICT. It explicitly says:
> http://www.w3.org/TR/sparql11-query/#iriRefs
>
> "Base IRIs declared with the BASE keyword must be absolute IRIs".
>
> And as I pointed out to Markus, the SPARQL spec strongly suggests a
> processing model in which the IRIREF is taken directly as the base URI,
> as SPARQL section 4.1.1.4 says:
> http://www.w3.org/TR/sparql11-query/#relIRIs
>
> 'The BASE keyword defines the Base IRI used to resolve
> relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded
> in Content".'
>
> and RFC3986 section 5.1.1 says:
>
> "Within certain media types, a base URI for relative references can be
> embedded within the content itself so that it can be readily obtained
> by a parser."
>
> But the base URI would only be "embedded within the content itself" if
> the IRIREF were taken *directly* as the base URI.
>
>>
>> What triples do you expect from, and what sequence of process steps
>> would you expect a process to take, for these Turtle documents: in each
>> case they are obtained by
>>
>> GET http://example/location/file.ttl
>>
>>
>> Document1::
>> ----
>> <s> <p> <#o> .
>> ----
>>
>> Document2::
>> ----
>> @base <http://example/base2> .
>> <s> <p> <#o> .
>> ----
>>
>> Document3::
>> ----
>> <s> <p> <#o> .
>> @base <http://example/base2> .
>> <s> <p> <#o> .
>> ----
>>
>> Document4::
>> ----
>> @base <base2/> .
>> <s> <p> <#o> .
>> ----
>>
>> Document5:: corner case:
>> ----
>> @base <base2/> .
>> @prefix ns1: <ns#> .
>> ns1:s <p> <#o> .
>> ----
>>
>> After resolution, before used as the base, it is absolute - all URIs in
>> RDF are absolute.
>
> Yes, but this is not an RDF question.
> It is a Turtle syntax question.
> Base URIs don't exist in RDF.
>
>> This absolute URI - possible with fragment, is then
>> given to what ever machinery is doing to further URI resolution. That
>> code is responsible for determining the right base URI given the inputs.
>>
>> Hence, I see that
>> "If the base URI is obtained from a URI reference, ..."
>> applies.
>
> But that quote comes from the beginning of RFC3986 section 5.1, which is
> not referenced from either the Turtle or SPARQL specs. Turtle and
> SPARQL only reference later subsections of 5.1.
>
> If SPARQL was intended to have the processing model that you suggest --
> and that would make sense, given the HTML and XML precedents -- then
> Turtle should also use that processing model, as you suggest. In which
> case an erratum needs to be issued for SPARQL explaining the omission of
> the additional "Reference Resolution" step, and the Turtle spec should
> add some verbiage explicitly explaining this step.
>
> I would suggest adding something along the following lines to the second
> paragraph of Turtle section 6.3:
> http://www.w3.org/TR/turtle/#sec-iri-references
> [[
> The @base directive indirectly specifies a new Base IRI that overrides
> the previous Base IRI that was in effect at that point in the Turtle
> document. The new Base IRI is determined by resolving the IRIREF given
> in the @base directive against the previous Base IRI that was in effect
> at that point in the Turtle document, using "Reference Resolution" as
> defined in RFC3986 section 5.
> This means that: (a) if the IRIREF
> specified in the @base directive was a relative IRI, it will be
> converted to an absolute-IRI using the process described in RFC3986
> section 5; (b) if the given IRIREF contained a fragment component, the
> fragment component will be stripped in that process; and (c) @base
> directives can be chained, such that the Base IRI specified by one @base
> directive is used in determining the Base IRI specified in a @base
> directive that appears later in the Turtle document.
>
> Similarly, the @prefix directive indirectly associates a prefix label
> (specified in the PNAME_NS portion of the @prefix directive) with an IRI
> that is derived from the IRIREF specified in the @prefix directive, by
> resolving that IRIREF, as specified in RFC3986 section 5, against the
> Base IRI currently in effect at that point in the Turtle document.
> ]]
>
> And add comments to both the @prefix and @base syntax productions:
> [[
> [4] prefixID ::= '@prefix' PNAME_NS IRIREF '.' /* See sec 6.3 */
> [5] base ::= '@base' IRIREF '.' /* See sec 6.3 */
> [5s] sparqlBase ::= "BASE" IRIREF /* See sec 6.3 */
> [6s] sparqlPrefix ::= "PREFIX" PNAME_NS IRIREF /* See sec 6.3 */
> ]]
>
> Thanks,
> David
>
>>
>> Andy
>>
>>
>> On 27/05/13 04:36, David Booth wrote:
>> > Hi Markus,
>> >
>> > On 05/26/2013 06:37 PM, Markus Lanthaler wrote:
>> >> On Sunday, May 26, 2013 7:17 PM, David Booth wrote:
>> >>>> The syntax has
>> >>>>
>> >>>> @base IRIREF .
>> >>>>
>> >>>> and the @base is no different to other URIs - it is subject to URI
>> >>>> resolution.
>> >>>
>> >>> But I don't see anything there that explicitly requires IRIREF to be an
>> >>> absolute-IRI as defined in RFC3987. Other parts of the Turtle syntax
>> >>> (such as the @prefix production) also use the IRIREF syntax production
>> >>> without requiring it to be an absolute-IRI. That's why it isn't clear
>> >>> that in the case of @base it must be an absolute-IRI.
>> >>
>> >> It can be a relative IRI as well. In that case it gets resolved
>> >> against the currently active base IRI.
>> >>
>> >>
>> >>>> @base <relURI> .
>> >>>>
>> >>>> is also legal as is
>> >>>>
>> >>>> @base <../sibling> .
>> >>>>
>> >>>> which might be occasionally useful.
>> >>>
>> >>> Huh? Are you saying that @base can recursively specify the base URI
>> >>> using a *relative* URI? Then there would have to be a base URI of the
>> >>> @base URI?
>> >>
>> >> Yes, not recursively though but sequentially.
>> >>
>> >>
>> >>> I'm very surprised to hear you say that a relative @base URI would be
>> >>> legal. I don't think that should be allowed. That seems too
>> >>> mysterious and error prone to me.
>> >>
>> >> HTML allows that as well e.g.
>> >>
>> >>
>> >>> That would require a relative URI specified in
>> >>> @base to be resolved using "Reference Resolution", which is specified
>> >>> in section 5 of RFC 3986. But the result of "Reference Resolution" is
>> >>> "a string matching the <URI> syntax rule of Section 3", and the <URI>
>> >>> production *allows* a fragment identifier.
>> >>
>> >> And why should that be a problem?
>> >
>> > Because a base URI as defined in RFC 3986 does not permit a fragment
>> > identifier. Therefore, if @base specified a relative URI which was
>> > resolved using RFC3986 "Reference Resolution" then the result could
>> > contain a fragment identifier. Thus, a Turtle "base URI" could contain
>> > a fragment identifier, whereas an RFC 3986 "base URI" does not permit a
>> > fragment identifier.
>> >
>> >>
>> >>
>> >>> I think it would be better to align directly with SPARQL and RFC 3986
>> >>> and RFC 3987 by explicitly requiring @base to specify an absolute-IRI.
>> >>
>> >> It is aligned with the two RFCs.
>> >> There might be a case where you can't
>> >> resolve a relative @base as the document itself has no IRI but that's the
>> >> same problem as not being able to resolve relative IRIs anywhere else in
>> >> such a document.
>> >
>> > If it is aligned with RFC 3986 and 3987 then the alignment certainly is
>> > not very visible. I spent quite a lot of time trying to track it down,
>> > and finally concluded that nothing in the Turtle spec requires Turtle's
>> > notion of a base URI (which AFAICT is specified using @base) to be an
>> > absolute-IRI as defined in those RFCs. Can you please point me to the
>> > exact wording that requires a Turtle base URI to be an absolute-IRI?
>> >
>> > The Turtle EBNF certainly does not require it.
>> >
>> > Turtle section 6.3 has two paragraphs. The first says:
>> > http://www.w3.org/TR/turtle/#sec-iri-references
>> > [[
>> > Relative IRIs are resolved with base IRIs as per Uniform Resource
>> > Identifier (URI): Generic Syntax [RFC3986] using only the basic
>> > algorithm in section 5.2. Neither Syntax-Based Normalization nor
>> > Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of
>> > RFC3986) are performed. Characters additionally allowed in IRI
>> > references are treated in the same way that unreserved characters are
>> > treated in URI references, per section 6.5 of Internationalized Resource
>> > Identifiers (IRIs) [RFC3987].
>> > ]]
>> > That paragraph only talks about resolving relative URIs. It does not
>> > specify the base URI.
>> >
>> > The first sentence of the second paragraph says:
>> > [[
>> > The @base directive defines the Base IRI used to resolve relative IRIs
>> > per RFC3986 section 5.1.1, "Base URI Embedded in Content".
>> > ]]
>> > and RFC3986 section 5.1.1 says: "Within certain media types, a base URI
>> > for relative references can be embedded within the content itself".
>> > Since the Turtle directive is called "@base" (or "BASE") and the Turtle
>> > spec often uses the term "base URI", this would strongly suggest that
>> > the @base directive is used to specify a base URI that is "embedded
>> > within the content itself". But if you and Andy are telling me that
>> > @base may provide a relative URI, then the actual base URI is *not*
>> > actually "embedded within the content itself". Rather, it is
>> > (recursively) determined by resolving that relative URI against some
>> > other base URI.
>> >
>> > The rest of the second paragraph in Turtle section 6.3 says:
>> > [[
>> > Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the
>> > In-Scope Base IRI may come from an encapsulating document, such as a
>> > SOAP envelope with an xml:base directive or a mime multipart document
>> > with a Content-Location header. The "Retrieval URI" identified in 5.1.3,
>> > Base "URI from the Retrieval URI", is the URL from which a particular
>> > Turtle document was retrieved. If none of the above specifies the Base
>> > URI, the default Base URI (section 5.1.4, "Default Base URI") is used.
>> > Each @base directive sets a new In-Scope Base URI, relative to the
>> > previous one.
>> > ]]
>> > Notice that it only references RFC3986 sections 5.1.2 and 5.1.3, which
>> > only talk (vaguely) about where the base URI might come from. Those
>> > sections do not constrain the base URI to be an absolute-URI. It is the
>> > beginning of RFC3986 section 5.1 that constrains a base URI to be an
>> > absolute-URI, and that portion is *not* referenced by the Turtle spec.
>> >
>> > The last sentence of that second paragraph in Turtle section 6.3 does
>> > say "Each @base directive sets a new In-Scope Base URI, relative to the
>> > previous one", and I guess that sentence is the justification for why
>> > you and Andy are saying that @base can specify a relative URI.
>> > But knowing that RFC3986 requires a base URI to be an absolute-URI, I had
>> > understood that sentence to mean "Each @base directive sets a new
>> > In-Scope Base URI, [in relation to] to the previous one", i.e., it is
>> > new in relation to the previous one. I had no idea it was suggesting
>> > that @base could specify a relative URI.
>> >
>> > Bottom line:
>> >
>> > - This stuff is not at all clear in the current wording.
>> >
>> > - If @base is permitted to specify a relative IRI then: (a) an
>> > explanation should be added to explain how that relative IRI is
>> > converted into an absolute-IRI (including what happens to any fragment
>> > identifier that the relative IRI contains); and (b) Turtle will not be
>> > aligned with SPARQL in this regard.
>> >
>> > - If @base is NOT permitted to specify a relative IRI then the Turtle
>> > spec should make clear that @base must specify an absolute-IRI, in
>> > alignment with SPARQL.
>> >
>> > I was not aware that HTML allowed base URIs to be relative, but, it
>> > seems more important to align Turtle with SPARQL than with HTML. Plus
>> > it would also be simpler.
>> >
>> > David
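
For readers following the thread, here is a minimal sketch of the two readings discussed above. It uses Python's standard urllib.parse.urljoin as a stand-in for RFC 3986 section 5.2 "Reference Resolution"; it is not taken from the Turtle or SPARQL specs. The helper names resolve_base and resolve_base_stripped are hypothetical, and the "http://example/a#old" and "b#new" inputs are made up for illustration. The first case is Document4 from Andy's message.

```python
from urllib.parse import urljoin, urldefrag


def resolve_base(previous_base: str, iriref: str) -> str:
    """Resolve the IRIREF of a @base directive against the previous base.

    Hypothetical helper: urljoin stands in for RFC 3986 section 5.2.
    """
    return urljoin(previous_base, iriref)


def resolve_base_stripped(previous_base: str, iriref: str) -> str:
    """Like resolve_base, but drop any fragment from the result so that it
    fits the <absolute-URI> rule RFC 3986 section 5.1 requires of a base."""
    return urldefrag(resolve_base(previous_base, iriref)).url


# Document4 from Andy's message, assumed to be retrieved from this URL.
retrieval_uri = "http://example/location/file.ttl"
base = resolve_base(retrieval_uri, "base2/")  # @base <base2/> .
subject = urljoin(base, "s")                  # <s>
obj = urljoin(base, "#o")                     # <#o>

# Peter's point: the previous base's fragment is dropped, and the
# fragment of the given IRIREF is carried into the resolved IRI.
carried = resolve_base("http://example/a#old", "b#new")

# David's point: a base URI may not carry a fragment, so strip it
# before the result is used as the new base.
stripped = resolve_base_stripped("http://example/a#old", "b#new")

for value in (base, subject, obj, carried, stripped):
    print(value)
```

Under these assumptions the relative @base in Document4 resolves to http://example/location/base2/, and the last two values differ only in whether the "#new" fragment survives, which is the point on which Peter's and David's readings diverge.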
Received on Friday, 31 May 2013 01:24:37 UTC