RE: RDF and xml:base

Hi Lee,

Thanks for the painstaking work on the examples.
My error: you are correct that the base URI (which can be
set and changed by the xml:base attribute) and the URI of
the 'current document' (which cannot be changed from within
the document) are not the same thing. I'll double-check to
make sure that my conceptual error is not embodied in the
XPointer doc.

Now, where were things in this thread before I jumped in
with both left feet... 

Ron Daniel Jr.
Standards Architect
Tel: +1 415 778 3113
Fax: +1 415 778 3131
Email: rdaniel@interwoven.com 

Visit www.interwoven.com
Moving Business to the Web 

> -----Original Message-----
> From: Lee Jonas [mailto:lee.jonas@cakehouse.co.uk]
> Sent: Friday, June 01, 2001 4:38 AM
> To: 'Ron Daniel'
> Cc: RDF Interest
> Subject: RE: RDF and xml:base
> 
> 
> Ron Daniel [mailto:rdaniel@interwoven.com] wrote:
> 
> >Um, if what I was describing DOES go against RFC 2396, it would be an
> >error and would need to be fixed. But I don't think it does.
> >
> >
> >> Again, from RFC 2396, section 5.2:
> >> [[
> >> 
> >>    For each URI reference, the following steps are performed in order:
> >> 
> >>    1) The URI reference is parsed into the potential four components and
> >>       fragment identifier, as described in Section 4.3.
> >> ]]
> >> 
> >> i.e. scheme, authority, path and query parts of the URI, plus 
> >> possibly a fragment identifier.
> >> 
> >> [[
> >>    2) If the path component is empty and the scheme, authority, and
> >>       query components are undefined, then it is a reference to the
> >>       current document and we are done.  Otherwise, the reference URI's
> >>       query and fragment components are defined as found (or not found)
> >>       within the URI reference and not inherited from the base URI.
> >> ]]
> >> 
> >> i.e. a fragment on its own => "it is a reference to the current 
> >> document and we are done".
> >
> >Right, but Section 5.2 actually starts by saying:
> >
> >> 5.2. Resolving Relative References to Absolute Form
> >> 
> >>   This section describes an example algorithm for resolving URI
> >>   references that might be relative to a given base URI.
> >>
> >>   The base URI is established ACCORDING TO THE RULES OF SECTION 5.1 and
> >>   parsed into the four main components as described in Section 3.  
> >
> >(emphasis added)
> >
> >Section 5.1 says that the highest priority way of determining the
> >base URI is:
> >
> >> 5.1.1. Base URI within Document Content
> >> 
> >>    Within certain document media types, the base URI of the document can
> >>    be embedded within the content itself such that it can be readily
> >>    obtained by a parser.  
> >
> >which will be xml:base for XML documents (once it becomes a REC).
> >
> >So, if xml:base is specified, it is what is parsed into the components.
> >
> >Ron
> >
> 
> Yes, the value of xml:base is parsed into the components for the 
> "base URI", but it is not the same thing as the URI of the 
> "current document".
> 
> In C++ code:
> 
> //----------------------------------------------------------
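> // URI of the document being processed; fixed for the whole parse.
> // (The get*() functions are assumed helpers, not shown here.)
> const std::string docURI = getCurrentDocURI();
> 
> // 5.1.1: base URI embedded in the content (e.g. xml:base)
> std::string baseURI = getContentSpecifiedBaseURI();
> if(baseURI.empty()) {
> 
>   // 5.1.2: base URI of the encapsulating entity
>   baseURI = getEnclosingDocURI();
>   if(baseURI.empty()) {
> 
>     // 5.1.3: URI used to retrieve the document
>     baseURI = docURI;
>     if(baseURI.empty()) {
> 
>       // 5.1.4: set base URI to some application specific URI
>     }
>   }
> }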
> //----------------------------------------------------------
> 
> 
> Note that whilst processing the "current document", its URI is 
> *fixed* (i.e. constant) for the purposes of resolving relative 
> URI-references within it.  Whether or not the base URI is specified 
> in a document's content, you can never change the URI of the 
> "current document".
> 
> Let's consider a couple of resolution examples to clarify:
> 
> Example 1
> =========
> A document with a URI of 'http://other.com/doc.rdf' has content:
> 
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf='...' xmlns:ex='http://example.org/'
>          xml:base='http://example.org/Base/' >
>   <rdf:Description rdf:ID='localID'>
>     <ex:property>PropVal</ex:property>
>   </rdf:Description>
> </rdf:RDF>
> 
> Now, whilst processing this document, using RFC2396 to resolve 
> localID's absolute URI-reference:
> 
> [[
> 5.2. Resolving Relative References to Absolute Form
> 
>    This section describes an example algorithm for resolving URI
>    references that might be relative to a given base URI.
> 
>    The base URI is established according to the rules of Section 5.1 and
>    parsed into the four main components as described in Section 3.  Note
>    that only the scheme component is required to be present in the base
>    URI; the other components may be empty or undefined.  A component is
>    undefined if its preceding separator does not appear in the URI
>    reference; the path component is never undefined, though it may be
>    empty.  The base URI's query component is not used by the resolution
>    algorithm and may be discarded.
> ]]
> 
> 
> According to 5.1 the base URI has been specified by xml:base in 
> the content, which takes highest priority, so:
> base URI => scheme='http'; authority='example.org'; path='/Base/'; 
> query=<undefined>
> 
> 
> [[
>    For each URI reference, the following steps are performed in order:
> 
>    1) The URI reference is parsed into the potential four components and
>       fragment identifier, as described in Section 4.3.
> ]]
> 
> 
> The 'localID' URI reference is:
> URI reference => scheme=<undefined>; authority=<undefined>; 
> path=<empty>; query=<undefined>; fragment='localID'
> 
> 
> [[
>    2) If the path component is empty and the scheme, authority, and
>       query components are undefined, then it is a reference to the
>       current document and we are done.  Otherwise, the reference URI's
>       query and fragment components are defined as found (or not found)
>       within the URI reference and not inherited from the base URI.
> ]]
> 
> It is a reference to the current document and we are done - i.e. 
> it is a reference to 'http://other.com/doc.rdf'.  'localID' is a 
> fragment identifier within that resource, so the absolute URI 
> reference is 'http://other.com/doc.rdf#localID'.
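Step 2, taken on its own, might be sketched in C++ roughly as follows. This is only an illustration of the reading given in Example 1, not code from any RDF parser: the function names isCurrentDocumentRef and resolveSameDocument are made up here, and rdf:ID='localID' is assumed to be treated as the URI reference '#localID'. Note that the base URI is deliberately not a parameter.

```cpp
#include <cassert>
#include <string>

// RFC 2396 section 5.2, step 2: a reference with an empty path and with
// undefined scheme, authority and query is either the empty string or
// starts with '#'. (Hypothetical helper name.)
bool isCurrentDocumentRef(const std::string& ref) {
    return ref.empty() || ref[0] == '#';
}

// Resolve a current-document reference against the document's own URI.
// Any fragment already on docUri is dropped before the reference's
// fragment (if any) is appended. (Hypothetical helper name.)
std::string resolveSameDocument(const std::string& docUri,
                                const std::string& ref) {
    // find() returns npos when there is no '#', so substr keeps everything.
    const std::string doc = docUri.substr(0, docUri.find('#'));
    return doc + ref;
}
```

With the values from Example 1, resolveSameDocument("http://other.com/doc.rdf", "#localID") yields 'http://other.com/doc.rdf#localID' whatever xml:base says, while './#localID' fails the isCurrentDocumentRef test and has to go through steps 3 to 7.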
> 
> 
> 
> Example 2
> =========
> A document with a URI of 'http://other.com/doc2.rdf' has content:
> 
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf='...' xmlns:ex='http://example.org/'
>          xml:base='http://example.org/Base/' >
>   <rdf:Description rdf:about='./#localID'>
>     <ex:property>PropVal</ex:property>
>   </rdf:Description>
> </rdf:RDF>
> 
> Again using RFC2396 to resolve localID's absolute URI reference:
> 
> as before, work out the base URI:
> base URI => scheme='http'; authority='example.org'; path='/Base/'; 
> query=<undefined>
> 
> 
> [[
>    For each URI reference, the following steps are performed in order:
> 
>    1) The URI reference is parsed into the potential four components and
>       fragment identifier, as described in Section 4.3.
> ]]
> 
> 
> The 'localID' URI-reference is:
> URI reference => scheme=<undefined>; authority=<undefined>; 
> path='./'; query=<undefined>; fragment='localID'
> 
> 
> [[
>    2) If the path component is empty and the scheme, authority, and
>       query components are undefined, then it is a reference to the
>       current document and we are done.  Otherwise, the reference URI's
>       query and fragment components are defined as found (or not found)
>       within the URI reference and not inherited from the base URI.
> ]]
> 
> 
> The path component is now not empty, so the query and fragment 
> components are defined as found (i.e. any base URI query and/or 
> fragment are always ignored):
> reference URI => scheme=<TBD>; authority=<TBD>; path=<TBD>; 
> query=<undefined>; fragment='localID'
> 
> (Note the reference URI is a third variable which holds the 
> result of resolving the localID URI reference against the base URI).
> 
> 
> [[
>    3) If the scheme component is defined, indicating that the reference
>       starts with a scheme name, then the reference is interpreted as an
>       absolute URI and we are done.  Otherwise, the reference URI's
>       scheme is inherited from the base URI's scheme component.
> 
>       Due to a loophole in prior specifications [RFC1630], some parsers
>       allow the scheme name to be present in a relative URI if it is the
>       same as the base URI scheme.  Unfortunately, this can conflict
>       with the correct parsing of non-hierarchical URI.  For backwards
>       compatibility, an implementation may work around such references
>       by removing the scheme if it matches that of the base URI and the
>       scheme is known to always use the <hier_part> syntax.  The parser
>       can then continue with the steps below for the remainder of the
>       reference components.  Validating parsers should mark such a
>       misformed relative reference as an error.
> ]]
> 
> 
> The reference URI's scheme is inherited from the base URI's 
> scheme component:
> reference URI => scheme='http'; authority=<TBD>; path=<TBD>; 
> query=<undefined>; fragment='localID'
> 
> 
> [[
>    4) If the authority component is defined, then the reference is a
>       network-path and we skip to step 7.  Otherwise, the reference
>       URI's authority is inherited from the base URI's authority
>       component, which will also be undefined if the URI scheme does not
>       use an authority component.
> ]]
> 
> 
> The reference URI's authority is inherited from the base URI's 
> authority component:
> reference URI => scheme='http'; authority='example.org'; 
> path=<TBD>; query=<undefined>; fragment='localID'
> 
> 
> [[
>    5) If the path component begins with a slash character ("/"), then
>       the reference is an absolute-path and we skip to step 7.
> ]]
> 
> 
> The path component begins with a '.' so we go on to step 6.
> 
> 
> [[
>    6) If this step is reached, then we are resolving a relative-path
>       reference.  The relative path needs to be merged with the base
>       URI's path.  Although there are many ways to do this, we will
>       describe a simple method using a separate string buffer.
> 
>       a) All but the last segment of the base URI's path component is
>          copied to the buffer.  In other words, any characters after the
>          last (right-most) slash character, if any, are excluded.
> 
>       b) The reference's path component is appended to the buffer
>          string.
> 
>       c) All occurrences of "./", where "." is a complete path segment,
>          are removed from the buffer string.
> 
>       d) If the buffer string ends with "." as a complete path segment,
>          that "." is removed.
> 
>       e) All occurrences of "<segment>/../", where <segment> is a
>          complete path segment not equal to "..", are removed from the
>          buffer string.  Removal of these path segments is performed
>          iteratively, removing the leftmost matching pattern on each
>          iteration, until no matching pattern remains.
> 
>       f) If the buffer string ends with "<segment>/..", where <segment>
>          is a complete path segment not equal to "..", that
>          "<segment>/.." is removed.
> 
>       g) If the resulting buffer string still begins with one or more
>          complete path segments of "..", then the reference is
>          considered to be in error.  Implementations may handle this
>          error by retaining these components in the resolved path (i.e.,
>          treating them as part of the final URI), by removing them from
>          the resolved path (i.e., discarding relative levels above the
>          root), or by avoiding traversal of the reference.
> 
>       h) The remaining buffer string is the reference URI's new path
>          component.
> ]]
> 
> 
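> Step (a) copies '/Base/' to the buffer, step (b) appends './' to 
> give '/Base/./', and step (c) removes the './' segment, so:
> reference URI => scheme='http'; authority='example.org'; 
> path='/Base/'; query=<undefined>; fragment='localID'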
> 
> 
> [[
>    7) The resulting URI components, including any inherited from the
>       base URI, are recombined to give the absolute form of the URI
>       reference.  [snip]
> ]]
> 
> 
> absolute URI reference = 'http://example.org/Base/#localID'
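The path merge in step 6 can be sketched in C++ along these lines. It is a rough sketch under some assumptions: the name mergePaths is made up, the base path is taken as '/Base/' (leading slash included, since step 7 appends the path directly after the authority), and instead of the RFC's string-buffer edits it splits the buffer into segments and walks them, which is intended to give the same result for ordinary cases. Leading '..' segments that cannot be removed are simply retained, one of the options step (g) allows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Merge a relative path with a base path (RFC 2396 section 5.2, step 6).
// Segment-based variant of the RFC's string-buffer description.
std::string mergePaths(const std::string& basePath,
                       const std::string& refPath) {
    const std::string::size_type npos = std::string::npos;

    // a) keep the base path up to and including its last slash
    const std::string::size_type lastSlash = basePath.rfind('/');
    std::string buffer =
        (lastSlash == npos) ? std::string() : basePath.substr(0, lastSlash + 1);
    // b) append the reference's path
    buffer += refPath;

    // c)-f) walk the segments, dropping "." and resolving ".."
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type slash = buffer.find('/', start);
        const bool isLast = (slash == npos);
        const std::string seg =
            buffer.substr(start, isLast ? npos : slash - start);
        // "At root" means nothing, or only the leading empty segment, is kept.
        const bool atRoot =
            out.empty() || (out.size() == 1 && out[0].empty());
        if (seg == ".") {
            if (isLast) out.push_back("");   // keep the trailing slash
        } else if (seg == ".." && !atRoot && out.back() != "..") {
            out.pop_back();                  // remove "<segment>/.."
            if (isLast) out.push_back("");
        } else {
            out.push_back(seg);              // ordinary (or unresolvable) segment
        }
        if (isLast) break;
        start = slash + 1;
    }

    // h) join the surviving segments back into a path
    std::string result;
    for (std::vector<std::string>::size_type k = 0; k < out.size(); ++k) {
        if (k > 0) result += '/';
        result += out[k];
    }
    return result;
}
```

For Example 2, mergePaths("/Base/", "./") gives '/Base/', and recombining scheme, authority, path and fragment as in step 7 gives 'http://example.org/Base/#localID'.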
> 
> 
> 
> In summary, xml:base is a way of specifying the base URI within 
> an XML document's content, never the document's URI itself.  A 
> base URI is used to resolve relative URI references into absolute 
> URI references.  A URI reference consisting of a fragment 
> identifier only (i.e. starting with a '#') is a reference to the 
> current document and is not affected by the base URI.
> 
> 
> regards
> 
> Lee
> 

Received on Friday, 1 June 2001 16:59:22 UTC