- From: Ron Daniel <rdaniel@interwoven.com>
- Date: Fri, 1 Jun 2001 13:57:24 -0700
- To: "Lee Jonas" <lee.jonas@cakehouse.co.uk>
- Cc: "RDF Interest" <www-rdf-interest@w3.org>
- Message-ID: <EMEKICCGFEKJFGKMFLEPKEDNCLAA.rdaniel@interwoven.com>
Hi Lee,
Thanks for the painstaking work on the examples.
My error. You are correct that the base URI (which can be
set and changed by the xml:base attribute) and the URI of
the 'current document' (which cannot be changed from within
the document) are not the same thing. I'll double-check to
make sure that my conceptual error is not embodied in the
XPointer doc.
Now, where were things in this thread before I jumped in
with both left feet...
Ron Daniel Jr.
Standards Architect
Tel: +1 415 778 3113
Fax: +1 415 778 3131
Email: rdaniel@interwoven.com
> -----Original Message-----
> From: Lee Jonas [mailto:lee.jonas@cakehouse.co.uk]
> Sent: Friday, June 01, 2001 4:38 AM
> To: 'Ron Daniel'
> Cc: RDF Interest
> Subject: RE: RDF and xml:base
>
>
> Ron Daniel [mailto:rdaniel@interwoven.com] wrote:
>
> >Um, if what I was describing DOES go against RFC 2396, it would be an error
> >and would need to be fixed. But I don't think it does.
> >
> >
> >> Again, from RFC 2396, section 5.2:
> >> [[
> >>
> >> For each URI reference, the following steps are performed in order:
> >>
> >> 1) The URI reference is parsed into the potential four components and
> >> fragment identifier, as described in Section 4.3.
> >> ]]
> >>
> >> i.e. scheme, authority, path and query parts of the URI, plus
> >> possibly a fragment identifier.
> >>
> >> [[
> >> 2) If the path component is empty and the scheme, authority, and
> >> query components are undefined, then it is a reference to the
> >> current document and we are done. Otherwise, the reference URI's
> >> query and fragment components are defined as found (or not found)
> >> within the URI reference and not inherited from the base URI.
> >> ]]
> >>
> >> i.e. a fragment on its own => "it is a reference to the current
> >> document and we are done".
> >
> >Right, but Section 5.2 actually starts by saying:
> >
> >> 5.2. Resolving Relative References to Absolute Form
> >>
> >> This section describes an example algorithm for resolving URI
> >> references that might be relative to a given base URI.
> >>
> >> The base URI is established ACCORDING TO THE RULES OF SECTION 5.1 and
> >> parsed into the four main components as described in Section 3.
> >
> >(emphasis added)
> >
> >Section 5.1 says that the highest priority way of determining the
> >base URI is:
> >
> >> 5.1.1. Base URI within Document Content
> >>
> >> Within certain document media types, the base URI of the document can
> >> be embedded within the content itself such that it can be readily
> >> obtained by a parser.
> >
> >which will be xml:base for XML documents (once it becomes a REC).
> >
> >So, if xml:base is specified, it is what is parsed into the components.
> >
> >Ron
> >
>
> Yes, the value of xml:base is parsed into the components for the
> "base URI", but it is not the same thing as the URI of the
> "current document".
>
> In C++ code:
>
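> //----------------------------------------------------------
> // The document's own URI is fixed; it is never reassigned below.
> const std::string docURI = getCurrentDocURI();
>
> // Base URI, in the priority order of RFC 2396 section 5.1.
> std::string baseURI = getContentSpecifiedBaseURI();  // 5.1.1, e.g. xml:base
> if (baseURI.empty()) {
>     baseURI = getEnclosingDocURI();                  // 5.1.2, encapsulating entity
>     if (baseURI.empty()) {
>         baseURI = docURI;                            // 5.1.3, retrieval URI
>         if (baseURI.empty()) {
>             // 5.1.4: set base URI to some application specific URI
>         }
>     }
> }
> //----------------------------------------------------------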
>
>
> Note that whilst processing the "current document", its URI is
> *fixed* (i.e. constant) for the purposes of resolving relative
> URI-references within it. Whether or not the base URI is specified
> in a document's content, you can never change the URI of the
> "current document".
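A minimal sketch of that distinction, for illustration only. UriContext and its member names are hypothetical and do not come from any XML library. It assumes every xml:base value is already absolute, and it leaves out the enclosing-document and application-default fallbacks. The in-scope base URI changes as elements open and close, while the document URI has no setter at all.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: tracks the in-scope base URI while one document is walked.
class UriContext {
public:
    explicit UriContext(std::string docURI) : docURI_(std::move(docURI)) {}

    // Call on every start tag; pass "" when the element carries no xml:base.
    // Assumption: xml:base values are already absolute URIs.
    void enterElement(const std::string& xmlBase) {
        std::string next = xmlBase.empty() ? baseURI() : xmlBase;
        bases_.push_back(std::move(next));
    }

    // Call on every end tag; restores the base URI of the parent element.
    void leaveElement() {
        assert(!bases_.empty());
        bases_.pop_back();
    }

    // Base URI in scope: the innermost xml:base, else the document URI.
    const std::string& baseURI() const {
        return bases_.empty() ? docURI_ : bases_.back();
    }

    // URI of the "current document"; deliberately read-only.
    const std::string& docURI() const { return docURI_; }

private:
    const std::string docURI_;
    std::vector<std::string> bases_;
};
```

With the document at 'http://other.com/doc.rdf', entering an element with xml:base='http://example.org/Base/' changes what baseURI() returns but not what docURI() returns.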
>
> Let's consider a couple of resolution examples to clarify:
>
> Example 1
> =========
> A document with a URI of 'http://other.com/doc.rdf' has content:
>
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf='...' xmlns:ex='http://example.org/'
> xml:base='http://example.org/Base/' >
> <rdf:Description rdf:ID='localID'>
> <ex:property>PropVal</ex:property>
> </rdf:Description>
> </rdf:RDF>
>
> Now, whilst processing this document, using RFC2396 to resolve
> localID's absolute URI-reference:
>
> [[
> 5.2. Resolving Relative References to Absolute Form
>
> This section describes an example algorithm for resolving URI
> references that might be relative to a given base URI.
>
> The base URI is established according to the rules of Section 5.1 and
> parsed into the four main components as described in Section 3. Note
> that only the scheme component is required to be present in the base
> URI; the other components may be empty or undefined. A component is
> undefined if its preceding separator does not appear in the URI
> reference; the path component is never undefined, though it may be
> empty. The base URI's query component is not used by the resolution
> algorithm and may be discarded.
> ]]
>
>
> According to 5.1 the base URI has been specified by xml:base in
> the content, which takes highest priority, so:
> base URI => scheme='http'; authority='example.org'; path='/Base/';
> query=<undefined>
>
>
> [[
> For each URI reference, the following steps are performed in order:
>
> 1) The URI reference is parsed into the potential four components and
> fragment identifier, as described in Section 4.3.
> ]]
>
>
> The 'localID' URI reference is:
> URI reference => scheme=<undefined>; authority=<undefined>;
> path=<empty>; query=<undefined>; fragment='localID'
>
>
> [[
> 2) If the path component is empty and the scheme, authority, and
> query components are undefined, then it is a reference to the
> current document and we are done. Otherwise, the reference URI's
> query and fragment components are defined as found (or not found)
> within the URI reference and not inherited from the base URI.
> ]]
>
> It is a reference to the current document and we are done - i.e.
> it is a reference to 'http://other.com/doc.rdf'. 'localID' is a
> fragment identifier within that resource, so the absolute URI
> reference is 'http://other.com/doc.rdf#localID'.
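Steps 1 and 2 as used in Example 1 can be sketched roughly as follows. The splitting pattern is the regular expression given in RFC 2396 Appendix B. The names UriParts, parseUriReference, isSameDocumentReference and resolveSameDocument are made up for this sketch, and std::nullopt stands in for "undefined". It needs C++17.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Components of a URI reference; std::nullopt means "undefined" in RFC 2396 terms.
struct UriParts {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;  // never undefined, but may be empty
};

// Step 1: split the reference with the regular expression from RFC 2396 Appendix B.
UriParts parseUriReference(const std::string& ref) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    std::regex_match(ref, m, re);  // matches any single-line string
    UriParts p;
    if (m[1].matched) p.scheme = m[2].str();
    if (m[3].matched) p.authority = m[4].str();
    p.path = m[5].str();
    if (m[6].matched) p.query = m[7].str();
    if (m[8].matched) p.fragment = m[9].str();
    return p;
}

// Step 2 test: empty path, and scheme, authority and query all undefined.
bool isSameDocumentReference(const UriParts& p) {
    return p.path.empty() && !p.scheme && !p.authority && !p.query;
}

// For a same-document reference the base URI plays no part: the fragment
// is attached to the URI of the current document.
std::string resolveSameDocument(const UriParts& p, const std::string& docURI) {
    assert(isSameDocumentReference(p));
    return p.fragment ? docURI + "#" + *p.fragment : docURI;
}
```

Feeding it '#localID' with a document URI of 'http://other.com/doc.rdf' takes the same-document branch, whereas './#localID' does not, because its path is './' rather than empty.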
>
>
>
> Example 2
> =========
> A document with a URI of 'http://other.com/doc2.rdf' has content:
>
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf='...' xmlns:ex='http://example.org/'
> xml:base='http://example.org/Base/' >
> <rdf:Description rdf:about='./#localID'>
> <ex:property>PropVal</ex:property>
> </rdf:Description>
> </rdf:RDF>
>
> Again using RFC2396 to resolve localID's absolute URI reference:
>
> as before, work out the base URI:
> base URI => scheme='http'; authority='example.org'; path='/Base/';
> query=<undefined>
>
>
> [[
> For each URI reference, the following steps are performed in order:
>
> 1) The URI reference is parsed into the potential four components and
> fragment identifier, as described in Section 4.3.
> ]]
>
>
> The 'localID' URI-reference is:
> URI reference => scheme=<undefined>; authority=<undefined>;
> path='./'; query=<undefined>; fragment='localID'
>
>
> [[
> 2) If the path component is empty and the scheme, authority, and
> query components are undefined, then it is a reference to the
> current document and we are done. Otherwise, the reference URI's
> query and fragment components are defined as found (or not found)
> within the URI reference and not inherited from the base URI.
> ]]
>
>
> The path component is now not empty, so the query and fragment
> components are defined as found (i.e. any base URI query and/or
> fragment are always ignored):
> reference URI => scheme=<TBD>; authority=<TBD>; path=<TBD>;
> query=<undefined>; fragment='localID'
>
> (Note the reference URI is a third variable which holds the
> result of resolving the localID URI reference against the base URI).
>
>
> [[
> 3) If the scheme component is defined, indicating that the reference
> starts with a scheme name, then the reference is interpreted as an
> absolute URI and we are done. Otherwise, the reference URI's
> scheme is inherited from the base URI's scheme component.
>
> Due to a loophole in prior specifications [RFC1630], some parsers
> allow the scheme name to be present in a relative URI if it is the
> same as the base URI scheme. Unfortunately, this can conflict
> with the correct parsing of non-hierarchical URI. For backwards
> compatibility, an implementation may work around such references
> by removing the scheme if it matches that of the base URI and the
> scheme is known to always use the <hier_part> syntax. The parser
> can then continue with the steps below for the remainder of the
> reference components. Validating parsers should mark such a
> misformed relative reference as an error.
> ]]
>
>
> The reference URI's scheme is inherited from the base URI's
> scheme component:
> reference URI => scheme='http'; authority=<TBD>; path=<TBD>;
> query=<undefined>; fragment='localID'
>
>
> [[
> 4) If the authority component is defined, then the reference is a
> network-path and we skip to step 7. Otherwise, the reference
> URI's authority is inherited from the base URI's authority
> component, which will also be undefined if the URI scheme does not
> use an authority component.
> ]]
>
>
> The reference URI's authority is inherited from the base URI's
> authority component:
> reference URI => scheme='http'; authority='example.org';
> path=<TBD>; query=<undefined>; fragment='localID'
>
>
> [[
> 5) If the path component begins with a slash character ("/"), then
> the reference is an absolute-path and we skip to step 7.
> ]]
>
>
> The path component begins with a '.' so we go on to step 6.
>
>
> [[
> 6) If this step is reached, then we are resolving a relative-path
> reference. The relative path needs to be merged with the base
> URI's path. Although there are many ways to do this, we will
> describe a simple method using a separate string buffer.
>
> a) All but the last segment of the base URI's path component is
> copied to the buffer. In other words, any characters after the
> last (right-most) slash character, if any, are excluded.
>
> b) The reference's path component is appended to the buffer
> string.
>
> c) All occurrences of "./", where "." is a complete path segment,
> are removed from the buffer string.
>
> d) If the buffer string ends with "." as a complete path segment,
> that "." is removed.
>
> e) All occurrences of "<segment>/../", where <segment> is a
> complete path segment not equal to "..", are removed from the
> buffer string. Removal of these path segments is performed
> iteratively, removing the leftmost matching pattern on each
> iteration, until no matching pattern remains.
>
> f) If the buffer string ends with "<segment>/..", where <segment>
> is a complete path segment not equal to "..", that
> "<segment>/.." is removed.
>
> g) If the resulting buffer string still begins with one or more
> complete path segments of "..", then the reference is
> considered to be in error. Implementations may handle this
> error by retaining these components in the resolved path (i.e.,
> treating them as part of the final URI), by removing them from
> the resolved path (i.e., discarding relative levels above the
> root), or by avoiding traversal of the reference.
>
> h) The remaining buffer string is the reference URI's new path
> component.
> ]]
>
>
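> Steps 6a and 6b give a buffer of '/Base/' + './' = '/Base/./';
> step 6c removes the './', leaving '/Base/':
> reference URI => scheme='http'; authority='example.org';
> path='/Base/'; query=<undefined>; fragment='localID'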
>
>
> [[
> 7) The resulting URI components, including any inherited from the
> base URI, are recombined to give the absolute form of the URI
> reference. [snip]
> ]]
>
>
> absolute URI reference = 'http://example.org/Base/#localID'
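The path merge in step 6 is the only fiddly part of Example 2, so here is a rough sketch of it. It works on path segments instead of the string buffer the RFC describes, which should give the same results for ordinary paths. mergePaths is a made-up name, and the sketch assumes the base path comes from an absolute hierarchical URI, so it starts with '/' (i.e. '/Base/' for the base URI above).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Step 6 of RFC 2396 section 5.2, done per segment rather than on a string buffer.
// Assumption: basePath belongs to an absolute hierarchical URI and starts with '/'.
std::string mergePaths(const std::string& basePath, const std::string& refPath) {
    assert(!basePath.empty() && basePath.front() == '/');

    // 6a + 6b: base path up to and including its last '/', then the reference path.
    const std::string buffer = basePath.substr(0, basePath.rfind('/') + 1) + refPath;

    // Split on '/'; the first segment is always empty and stands for the root.
    std::vector<std::string> in;
    std::size_t start = 0;
    for (;;) {
        const std::size_t slash = buffer.find('/', start);
        in.push_back(buffer.substr(start, slash - start));
        if (slash == std::string::npos) break;
        start = slash + 1;
    }

    std::vector<std::string> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool last = (i + 1 == in.size());
        if (in[i] == ".") {
            // 6c, 6d: drop "."; a trailing "." leaves a trailing slash.
            if (last) out.push_back("");
        } else if (in[i] == ".." && out.size() > 1 && out.back() != "..") {
            // 6e, 6f: ".." cancels the preceding segment, but never the root.
            out.pop_back();
            if (last) out.push_back("");
        } else {
            // Ordinary segment, or a ".." that cannot be cancelled (6g: retained).
            out.push_back(in[i]);
        }
    }

    // 6h: join the surviving segments into the new path.
    std::string path;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (i > 0) path += '/';
        path += out[i];
    }
    return path;
}
```

For Example 2, mergePaths("/Base/", "./") yields "/Base/", which step 7 recombines with the inherited scheme and authority and the reference's own fragment. For the 6g error case the sketch picks the "retain the components" option, so '../../../g' against '/b/c/d;p' comes out as '/../g'.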
>
>
>
> In summary, xml:base is a way of specifying the base URI within
> an XML document's content, never the document's URI itself. A
> base URI is used to resolve relative URI references into absolute
> URI references. A URI reference consisting of a fragment
> identifier only (i.e. starting with a '#') is a reference to the
> current document and is not affected by the base URI.
>
>
> regards
>
> Lee
>
Received on Friday, 1 June 2001 16:59:22 UTC