- From: David Booth <david@dbooth.org>
- Date: Sun, 26 May 2013 23:36:20 -0400
- To: Markus Lanthaler <markus.lanthaler@gmx.net>
- CC: public-rdf-comments@w3.org
Hi Markus,

On 05/26/2013 06:37 PM, Markus Lanthaler wrote:
> On Sunday, May 26, 2013 7:17 PM, David Booth wrote:
>>> The syntax has
>>>
>>> @base IRIREF .
>>>
>>> and the @base is no different to other URIs - it is subject to URI
>>> resolution.
>>
>> But I don't see anything there that explicitly requires IRIREF to be an
>> absolute-IRI as defined in RFC3987. Other parts of the Turtle syntax
>> (such as the @prefix production) also use the IRIREF syntax production
>> without requiring it to be an absolute-IRI. That's why it isn't clear
>> that in the case of @base it must be an absolute-IRI.
>
> It can be a relative IRI as well. In that case it gets resolved against the
> currently active base IRI.
>
>
>>> @base <relURI> .
>>>
>>> is also legal as is
>>>
>>> @base <../sibling> .
>>>
>>> which might be occasionally useful.
>>
>> Huh? Are you saying that @base can recursively specify the base URI
>> using a *relative* URI? Then there would have to be a base URI of the
>> @base URI?
>
> Yes, not recursively though but sequentially.
>
>
>> I'm very surprised to hear you say that a relative @base URI would be
>> legal. I don't think that should be allowed. That seems too
>> mysterious and error prone to me.
>
> HTML allows that as well e.g.
>
>
>> That would require a relative URI specified in
>> @base to be resolved using "Reference Resolution", which is specified in
>> section 5 of RFC 3986. But the result of "Reference Resolution" is "a
>> string matching the <URI> syntax rule of Section 3", and the <URI>
>> production *allows* a fragment identifier.
>
> And why should that be a problem?

Because a base URI as defined in RFC 3986 does not permit a fragment
identifier. Therefore, if @base specified a relative URI which was
resolved using RFC3986 "Reference Resolution" then the result could
contain a fragment identifier. Thus, a Turtle "base URI" could contain
a fragment identifier, whereas an RFC 3986 "base URI" does not permit a
fragment identifier.
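
To make the fragment concern concrete, here is a minimal sketch in Python. It is only an illustration: it assumes that Python's `urllib.parse.urljoin` is an acceptable stand-in for RFC 3986 reference resolution, and the `example.org` URIs and the helper name `apply_base_directives` are hypothetical. Nothing here is taken from the Turtle spec or from any Turtle parser.

```python
from urllib.parse import urljoin, urldefrag


def apply_base_directives(initial_base, directives):
    """Resolve each @base value against the base in scope just before it."""
    base = initial_base
    for ref in directives:
        # A relative reference is resolved against the current base;
        # an absolute one simply replaces it.
        base = urljoin(base, ref)
    return base


# Hypothetical retrieval URI of a Turtle document (illustration only).
retrieval_uri = "http://example.org/data/doc.ttl"

# Two successive directives: @base <../sibling> . then @base <other#frag> .
result = apply_base_directives(retrieval_uri, ["../sibling", "other#frag"])
print(result)  # http://example.org/other#frag

# Stripping the fragment is one possible way to get back to something
# that fits the RFC 3986 notion of a base URI (an assumption, not spec text).
print(urldefrag(result).url)  # http://example.org/other
```

Under these assumptions the resolved value still carries `#frag`, which is exactly the case the spec wording would need to address if a relative @base is allowed.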

>
>
>> I think it would be better to align directly with SPARQL and RFC 3986
>> and RFC 3987 by explicitly requiring @base to specify an absolute-IRI.
>
> It is aligned with the two RFCs. There might be a case where you can't
> resolve a relative @base as the document itself has no IRI but that's the
> same problem as not being able to resolve relative IRIs anywhere else in
> such a document.

If it is aligned with RFC 3986 and 3987 then the alignment certainly is
not very visible. I spent quite a lot of time trying to track it down,
and finally concluded that nothing in the Turtle spec requires Turtle's
notion of a base URI (which AFAICT is specified using @base) to be an
absolute-IRI as defined in those RFCs. Can you please point me to the
exact wording that requires a Turtle base URI to be an absolute-IRI?
The Turtle EBNF certainly does not require it.

Turtle section 6.3 has two paragraphs. The first says:
http://www.w3.org/TR/turtle/#sec-iri-references
[[
Relative IRIs are resolved with base IRIs as per Uniform Resource
Identifier (URI): Generic Syntax [RFC3986] using only the basic
algorithm in section 5.2. Neither Syntax-Based Normalization nor
Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of
RFC3986) are performed. Characters additionally allowed in IRI
references are treated in the same way that unreserved characters are
treated in URI references, per section 6.5 of Internationalized Resource
Identifiers (IRIs) [RFC3987].
]]

That paragraph only talks about resolving relative URIs. It does not
specify the base URI. The first sentence of the second paragraph says:
[[
The @base directive defines the Base IRI used to resolve relative IRIs
per RFC3986 section 5.1.1, "Base URI Embedded in Content".
]]
and RFC3986 section 5.1.1 says: "Within certain media types, a base URI
for relative references can be embedded within the content itself".
Since the Turtle directive is called "@base" (or "BASE") and the Turtle
spec often uses the term "base URI", this would strongly suggest that
the @base directive is used to specify a base URI that is "embedded
within the content itself". But if you and Andy are telling me that
@base may provide a relative URI, then the actual base URI is *not*
actually "embedded within the content itself". Rather, it is
(recursively) determined by resolving that relative URI against some
other base URI.

The rest of the second paragraph in Turtle section 6.3 says:
[[
Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the
In-Scope Base IRI may come from an encapsulating document, such as a
SOAP envelope with an xml:base directive or a mime multipart document
with a Content-Location header. The "Retrieval URI" identified in 5.1.3,
Base "URI from the Retrieval URI", is the URL from which a particular
Turtle document was retrieved. If none of the above specifies the Base
URI, the default Base URI (section 5.1.4, "Default Base URI") is used.
Each @base directive sets a new In-Scope Base URI, relative to the
previous one.
]]

Notice that it only references RFC3986 sections 5.1.2 and 5.1.3, which
only talk (vaguely) about where the base URI might come from. Those
sections do not constrain the base URI to be an absolute-URI. It is the
beginning of RFC3986 section 5.1 that constrains a base URI to be an
absolute-URI, and that portion is *not* referenced by the Turtle spec.

The last sentence of that second paragraph in Turtle section 6.3 does
say "Each @base directive sets a new In-Scope Base URI, relative to the
previous one", and I guess that sentence is the justification for why
you and Andy are saying that @base can specify a relative URI.
But knowing that RFC3986 requires a base URI to be an absolute-URI, I
had understood that sentence to mean "Each @base directive sets a new
In-Scope Base URI, [in relation to] the previous one", i.e., it is new
in relation to the previous one. I had no idea it was suggesting that
@base could specify a relative URI.

Bottom line:

 - This stuff is not at all clear in the current wording.

 - If @base is permitted to specify a relative IRI then: (a) an
   explanation should be added to explain how that relative IRI is
   converted into an absolute-IRI (including what happens to any
   fragment identifier that the relative IRI contains); and (b) Turtle
   will not be aligned with SPARQL in this regard.

 - If @base is NOT permitted to specify a relative IRI then the Turtle
   spec should make clear that @base must specify an absolute-IRI, in
   alignment with SPARQL.

I was not aware that HTML allowed base URIs to be relative, but it
seems more important to align Turtle with SPARQL than with HTML. Plus
it would also be simpler.

David

Received on Monday, 27 May 2013 03:36:50 UTC