- From: Ron Daniel <rdaniel@interwoven.com>
- Date: Fri, 1 Jun 2001 13:57:24 -0700
- To: "Lee Jonas" <lee.jonas@cakehouse.co.uk>
- Cc: "RDF Interest" <www-rdf-interest@w3.org>
- Message-ID: <EMEKICCGFEKJFGKMFLEPKEDNCLAA.rdaniel@interwoven.com>
Hi Lee,

Thanks for the painstaking work on the examples. My error: you are correct that the base URI (which can be set and changed by the xml:base attribute) and the URI of the 'current document' (which cannot be changed from within the document) are not the same thing. I'll double-check to make sure that my conceptual error is not embodied in the XPointer doc.

Now, where were things in this thread before I jumped in with both left feet...

Ron Daniel Jr.
Standards Architect
Tel: +1 415 778 3113
Fax: +1 415 778 3131
Email: rdaniel@interwoven.com
Visit www.interwoven.com
Moving Business to the Web

> -----Original Message-----
> From: Lee Jonas [mailto:lee.jonas@cakehouse.co.uk]
> Sent: Friday, June 01, 2001 4:38 AM
> To: 'Ron Daniel'
> Cc: RDF Interest
> Subject: RE: RDF and xml:base
>
>
> Ron Daniel [mailto:rdaniel@interwoven.com] wrote:
>
> >Um, if what I was describing DOES go against RFC 2396, it would be an error
> >and would need to be fixed. But I don't think it does.
> >
> >
> >> Again, from RFC 2396, section 5.2:
> >> [[
> >>
> >> For each URI reference, the following steps are performed in order:
> >>
> >> 1) The URI reference is parsed into the potential four components and
> >>    fragment identifier, as described in Section 4.3.
> >> ]]
> >>
> >> i.e. scheme, authority, path and query parts of the URI, plus
> >> possibly a fragment identifier.
> >>
> >> [[
> >> 2) If the path component is empty and the scheme, authority, and
> >>    query components are undefined, then it is a reference to the
> >>    current document and we are done. Otherwise, the reference URI's
> >>    query and fragment components are defined as found (or not found)
> >>    within the URI reference and not inherited from the base URI.
> >> ]]
> >>
> >> i.e. a fragment on its own => "it is a reference to the current
> >> document and we are done".
> >
> >Right, but Section 5.2 actually starts by saying:
> >
> >> 5.2. Resolving Relative References to Absolute Form
> >>
> >>    This section describes an example algorithm for resolving URI
> >>    references that might be relative to a given base URI.
> >>
> >>    The base URI is established ACCORDING TO THE RULES OF SECTION 5.1 and
> >>    parsed into the four main components as described in Section 3.
> >
> >(emphasis added)
> >
> >Section 5.1 says that the highest priority way of determining the
> >base URI is:
> >
> >> 5.1.1. Base URI within Document Content
> >>
> >>    Within certain document media types, the base URI of the document can
> >>    be embedded within the content itself such that it can be readily
> >>    obtained by a parser.
> >
> >which will be xml:base for XML documents (once it becomes a REC).
> >
> >So, if xml:base is specified, it is what is parsed into the components.
> >
> >Ron
> >
>
> Yes, the value of xml:base is parsed into the components for the
> "base URI", but it is not the same thing as the URI of the
> "current document".
>
> In C++ code:
>
> //----------------------------------------------------------
> const std::string docURI = getCurrentDocURI();
>
> std::string baseURI = getContentSpecifiedBaseURI();
> if(baseURI.empty()) {
>
>     baseURI = getEnclosingDocURI();
>     if(baseURI.empty()) {
>
>         baseURI = docURI;
>         if(baseURI.empty()) {
>
>             // set base URI to some application specific URI
>         }
>     }
> }
> //----------------------------------------------------------
>
>
> Note that whilst processing the "current document", its URI is
> *fixed* (i.e. constant) for the purposes of resolving relative
> URI references within it. Whether the base URI is specified in a
> document's content or not, you could never change the URI for the
> "current document".
>
> Let's consider a couple of resolution examples to clarify:
>
> Example 1
> =========
> A document with a URI of 'http://other.com/doc.rdf' has content:
>
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf='...' xmlns:ex='http://example.org/'
>          xml:base='http://example.org/Base/' >
>   <rdf:Description rdf:ID='localID'>
>     <ex:property>PropVal</ex:property>
>   </rdf:Description>
> </rdf:RDF>
>
> Now, whilst processing this document, using RFC 2396 to resolve
> localID's absolute URI reference:
>
> [[
> 5.2. Resolving Relative References to Absolute Form
>
>    This section describes an example algorithm for resolving URI
>    references that might be relative to a given base URI.
>
>    The base URI is established according to the rules of Section 5.1 and
>    parsed into the four main components as described in Section 3. Note
>    that only the scheme component is required to be present in the base
>    URI; the other components may be empty or undefined. A component is
>    undefined if its preceding separator does not appear in the URI
>    reference; the path component is never undefined, though it may be
>    empty. The base URI's query component is not used by the resolution
>    algorithm and may be discarded.
> ]]
>
>
> According to 5.1 the base URI has been specified by xml:base in
> the content, which takes highest priority, so:
> base URI => scheme='http'; authority='example.org'; path='/Base/';
>             query=<undefined>
>
>
> [[
>    For each URI reference, the following steps are performed in order:
>
>    1) The URI reference is parsed into the potential four components and
>       fragment identifier, as described in Section 4.3.
> ]]
>
>
> The 'localID' URI reference is:
> URI reference => scheme=<undefined>; authority=<undefined>;
>                  path=<empty>; query=<undefined>; fragment='localID'
>
>
> [[
>    2) If the path component is empty and the scheme, authority, and
>       query components are undefined, then it is a reference to the
>       current document and we are done. Otherwise, the reference URI's
>       query and fragment components are defined as found (or not found)
>       within the URI reference and not inherited from the base URI.
> ]]
>
> It is a reference to the current document and we are done, i.e.
> it is a reference to 'http://other.com/doc.rdf'. 'localID' is a
> fragment identifier within that resource, so the absolute URI
> reference is 'http://other.com/doc.rdf#localID'.
>
>
>
> Example 2
> =========
> A document with a URI of 'http://other.com/doc2.rdf' has content:
>
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf='...' xmlns:ex='http://example.org/'
>          xml:base='http://example.org/Base/' >
>   <rdf:Description rdf:about='./#localID'>
>     <ex:property>PropVal</ex:property>
>   </rdf:Description>
> </rdf:RDF>
>
> Again using RFC 2396 to resolve localID's absolute URI reference:
>
> As before, work out the base URI:
> base URI => scheme='http'; authority='example.org'; path='/Base/';
>             query=<undefined>
>
>
> [[
>    For each URI reference, the following steps are performed in order:
>
>    1) The URI reference is parsed into the potential four components and
>       fragment identifier, as described in Section 4.3.
> ]]
>
>
> The './#localID' URI reference is:
> URI reference => scheme=<undefined>; authority=<undefined>;
>                  path='./'; query=<undefined>; fragment='localID'
>
>
> [[
>    2) If the path component is empty and the scheme, authority, and
>       query components are undefined, then it is a reference to the
>       current document and we are done. Otherwise, the reference URI's
>       query and fragment components are defined as found (or not found)
>       within the URI reference and not inherited from the base URI.
> ]]
>
>
> The path component is now not empty, so the query and fragment
> components are defined as found (i.e. any base URI query and/or
> fragment are always ignored):
> reference URI => scheme=<TBD>; authority=<TBD>; path=<TBD>;
>                  query=<undefined>; fragment='localID'
>
> (Note: the reference URI is a third variable, which holds the
> result of resolving the localID URI reference against the base URI.)
>
>
> [[
>    3) If the scheme component is defined, indicating that the reference
>       starts with a scheme name, then the reference is interpreted as an
>       absolute URI and we are done. Otherwise, the reference URI's
>       scheme is inherited from the base URI's scheme component.
>
>       Due to a loophole in prior specifications [RFC1630], some parsers
>       allow the scheme name to be present in a relative URI if it is the
>       same as the base URI scheme. Unfortunately, this can conflict
>       with the correct parsing of non-hierarchical URI. For backwards
>       compatibility, an implementation may work around such references
>       by removing the scheme if it matches that of the base URI and the
>       scheme is known to always use the <hier_part> syntax. The parser
>       can then continue with the steps below for the remainder of the
>       reference components. Validating parsers should mark such a
>       misformed relative reference as an error.
> ]]
>
>
> The reference URI's scheme is inherited from the base URI's
> scheme component:
> reference URI => scheme='http'; authority=<TBD>; path=<TBD>;
>                  query=<undefined>; fragment='localID'
>
>
> [[
>    4) If the authority component is defined, then the reference is a
>       network-path and we skip to step 7. Otherwise, the reference
>       URI's authority is inherited from the base URI's authority
>       component, which will also be undefined if the URI scheme does not
>       use an authority component.
> ]]
>
>
> The reference URI's authority is inherited from the base URI's
> authority component:
> reference URI => scheme='http'; authority='example.org';
>                  path=<TBD>; query=<undefined>; fragment='localID'
>
>
> [[
>    5) If the path component begins with a slash character ("/"), then
>       the reference is an absolute-path and we skip to step 7.
> ]]
>
>
> The path component begins with a '.' so we go on to step 6.
>
>
> [[
>    6) If this step is reached, then we are resolving a relative-path
>       reference. The relative path needs to be merged with the base
>       URI's path. Although there are many ways to do this, we will
>       describe a simple method using a separate string buffer.
>
>       a) All but the last segment of the base URI's path component is
>          copied to the buffer. In other words, any characters after the
>          last (right-most) slash character, if any, are excluded.
>
>       b) The reference's path component is appended to the buffer
>          string.
>
>       c) All occurrences of "./", where "." is a complete path segment,
>          are removed from the buffer string.
>
>       d) If the buffer string ends with "." as a complete path segment,
>          that "." is removed.
>
>       e) All occurrences of "<segment>/../", where <segment> is a
>          complete path segment not equal to "..", are removed from the
>          buffer string. Removal of these path segments is performed
>          iteratively, removing the leftmost matching pattern on each
>          iteration, until no matching pattern remains.
>
>       f) If the buffer string ends with "<segment>/..", where <segment>
>          is a complete path segment not equal to "..", that
>          "<segment>/.." is removed.
>
>       g) If the resulting buffer string still begins with one or more
>          complete path segments of "..", then the reference is
>          considered to be in error. Implementations may handle this
>          error by retaining these components in the resolved path (i.e.,
>          treating them as part of the final URI), by removing them from
>          the resolved path (i.e., discarding relative levels above the
>          root), or by avoiding traversal of the reference.
>
>       h) The remaining buffer string is the reference URI's new path
>          component.
> ]]
>
>
> reference URI => scheme='http'; authority='example.org';
>                  path='/Base/'; query=<undefined>; fragment='localID'
>
>
> [[
>    7) The resulting URI components, including any inherited from the
>       base URI, are recombined to give the absolute form of the URI
>       reference.
>       [snip]
> ]]
>
>
> absolute URI reference = 'http://example.org/Base/#localID'
>
>
>
> In summary, xml:base is a way of specifying the base URI within
> an XML document's content, never the document's URI itself. A
> base URI is used to resolve relative URI references into absolute
> URI references. A URI reference consisting of a fragment
> identifier only (i.e. starting with a '#') is a reference to the
> current document and is not affected by the base URI.
>
>
> regards
>
> Lee
>
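The C++ fragment in Lee's message relies on helper functions that the thread never defines. Below is a minimal, self-contained sketch of the same RFC 2396 Section 5.1 precedence. It assumes each candidate source has already been extracted as a string, with an empty string meaning "not available" as in the fragment. `BaseURISources` and `selectBaseURI` are hypothetical names, not part of any parser API.

```cpp
#include <string>

// Hypothetical holder for the candidate base URIs that RFC 2396, Section 5.1
// lists in decreasing priority. An empty string means "not available".
struct BaseURISources {
    std::string contentBase;    // 5.1.1: embedded in the content, e.g. xml:base
    std::string enclosingBase;  // 5.1.2: base URI of an encapsulating entity
    std::string retrievalURI;   // 5.1.3: URI used to retrieve the document
};

// Picks the base URI by the 5.1 precedence and falls back to an
// application-supplied default (5.1.4). The document's own URI is only read
// here, never replaced: xml:base changes the base URI, not the URI of the
// current document.
std::string selectBaseURI(const BaseURISources& src, const std::string& appDefault) {
    if (!src.contentBase.empty()) return src.contentBase;
    if (!src.enclosingBase.empty()) return src.enclosingBase;
    if (!src.retrievalURI.empty()) return src.retrievalURI;
    return appDefault;
}
```

For Example 1, passing `{"http://example.org/Base/", "", "http://other.com/doc.rdf"}` returns the xml:base value. The document's own URI, 'http://other.com/doc.rdf', stays a separate value, which is the distinction the thread settles on.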
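The two worked examples can also be checked mechanically. The following sketch implements the RFC 2396 Section 5.2 steps as quoted in the thread, under a few assumptions. Step 1 uses the regular expression from RFC 2396 Appendix B. Step 6(g) keeps any leftover leading ".." segments. Step 7, which is snipped in the quote, is treated as a plain concatenation of the defined components. `URIParts`, `parse`, `mergePaths`, `recombine`, and `resolve` are illustrative names only.

```cpp
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Components of a URI reference; std::nullopt marks an undefined component.
// The path is never undefined, though it may be empty.
struct URIParts {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;
};

// Step 1: split a reference with the regular expression of RFC 2396, Appendix B.
URIParts parse(const std::string& ref) {
    static const std::regex re(R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    if (!std::regex_match(ref, m, re)) throw std::invalid_argument("cannot parse: " + ref);
    auto opt = [&m](int i) {
        return m[i].matched ? std::optional<std::string>(m[i].str()) : std::nullopt;
    };
    return URIParts{opt(2), opt(4), opt(7), opt(9), m[5].str()};
}

// Step 6 (a)-(h): merge a relative path with the base path. Leftover leading
// ".." segments (step 6g) are retained, one of the options the RFC allows.
std::string mergePaths(const std::string& basePath, const std::string& refPath) {
    std::string buf;
    const auto slash = basePath.rfind('/');
    if (slash != std::string::npos) buf = basePath.substr(0, slash + 1);  // (a)
    buf += refPath;                                                        // (b)
    // Split on '/'; a leading '/' is tracked separately so that the empty
    // string before it is not treated as a path segment.
    const bool absolute = !buf.empty() && buf[0] == '/';
    std::vector<std::string> segs;
    for (std::size_t start = absolute ? 1 : 0;;) {
        const auto pos = buf.find('/', start);
        segs.push_back(buf.substr(start, pos - start));  // npos - start clamps to the end
        if (pos == std::string::npos) break;
        start = pos + 1;
    }
    std::vector<std::string> out;
    for (std::size_t i = 0; i < segs.size(); ++i) {
        if (segs[i] != ".") out.push_back(segs[i]);        // (c) drop "./"
        else if (i + 1 == segs.size()) out.push_back("");  // (d) trailing "." removed
    }
    // (e) remove the leftmost "<segment>/../" until none remains.
    for (bool changed = true; changed;) {
        changed = false;
        for (std::size_t i = 0; i + 2 < out.size(); ++i) {
            if (out[i] != ".." && out[i + 1] == "..") {
                out.erase(out.begin() + i, out.begin() + i + 2);
                changed = true;
                break;
            }
        }
    }
    // (f) remove a trailing "<segment>/..", keeping the slash before it.
    if (out.size() >= 2 && out.back() == ".." && out[out.size() - 2] != "..") {
        out.erase(out.end() - 2, out.end());
        out.push_back("");
    }
    std::string result = absolute ? "/" : "";  // (h) the buffer is the new path
    for (std::size_t i = 0; i < out.size(); ++i) result += (i ? "/" : "") + out[i];
    return result;
}

// Step 7 (snipped in the quote): assumed here to be the concatenation
// scheme ":" "//" authority path "?" query "#" fragment of defined parts.
std::string recombine(const URIParts& p) {
    std::string s;
    if (p.scheme) s += *p.scheme + ":";
    if (p.authority) s += "//" + *p.authority;
    s += p.path;
    if (p.query) s += "?" + *p.query;
    if (p.fragment) s += "#" + *p.fragment;
    return s;
}

// Resolves `ref` against `baseURI`. `docURI` is the URI of the current
// document: a same-document reference (step 2) yields it, whatever the base.
std::string resolve(const std::string& ref, const std::string& baseURI,
                    const std::string& docURI) {
    const URIParts r = parse(ref);                                // step 1
    if (r.path.empty() && !r.scheme && !r.authority && !r.query)  // step 2
        return docURI + (r.fragment ? "#" + *r.fragment : "");
    if (r.scheme) return recombine(r);                            // step 3
    const URIParts base = parse(baseURI);
    URIParts t = r;  // query and fragment stay as found in the reference
    t.scheme = base.scheme;
    if (!r.authority) {                                           // step 4
        t.authority = base.authority;
        if (r.path.empty() || r.path[0] != '/')                   // step 5
            t.path = mergePaths(base.path, r.path);               // step 6
    }
    return recombine(t);                                          // step 7
}
```

With a base of 'http://example.org/Base/', `resolve("#localID", base, "http://other.com/doc.rdf")` returns 'http://other.com/doc.rdf#localID' (Example 1). Likewise, `resolve("./#localID", base, "http://other.com/doc2.rdf")` returns 'http://example.org/Base/#localID' (Example 2). Keeping the current document's URI as a separate argument mirrors Lee's point: xml:base feeds `baseURI`, never `docURI`.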
Received on Friday, 1 June 2001 16:59:22 UTC