- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Thu, 18 May 2006 10:39:35 -0400
- To: <public-xml-core-wg@w3.org>
- Message-ID: <CF83BAA719FD2C439D25CBB1C9D1D3020357AD4F@HQ-MAIL4.ptcnet.ptc.com>
I'll leave it to others to comment on the substance, but I note your wording mentions 2396, which is already superseded. Can we avoid referencing an old RFC?

paul

________________________________

From: Konrad Lanz [mailto:Konrad.Lanz@iaik.tugraz.at]
Sent: Thursday, 2006 May 18 06:07
To: Richard Tobin
Cc: Grosso, Paul; public-xml-core-wg@w3.org
Subject: Re: Canonicalization xml:base processing

Dear Richard,

This email is about the xml:base fix-up.

First of all I'd like to give some examples to make sure we agree on how the algorithm for xml:base fix-up shall behave. These examples could potentially go into the final document as well. After the examples follows a suggestion for a new section 2.4.

By the way, when I tried to merge the algorithms I found that a lot of what you wrote was already there in the text, albeit in a very "xpathified" manner and with minor errors, which is at least consistent with most of the text in the document. I tried to decipher this bit a little and I hope it is more readable now.

Up front I'd like to mention that, after talking to Jose Kahan and thinking about the issue for a little longer, we'd still prefer to also perform "dot and dot-dot canonicalization" (aka remove_dot_segments). It will allow the reuse of existing implementations for relative URI resolution. More important from my point of view, however, is that "dot and dot-dot canonicalization" maps more equivalent documents onto the same serialized output and helps to avoid false negatives in XMLDSig.
--- Examples for xml:base fixup ---

For the given input

<a xml:base="one/two">
  <b xml:base="//three/four/./five/./../file.xsd">
    <c xml:base="a.file"/>
    <d>
      <e xml:base="#bare-name">
        <f xml:base=""/>
        <f1/>
      </e>
      <g xml:base="//six/"/>
    </d>
    <h xml:base="http://www.iaik.tugraz.at">
      <i xml:base="/aboutus/people/index.php">
        <j xml:base="lanz/index.php"/>
      </i>
    </h>
  </b>
</a>

with <a> being clipped out, c14n shall output:

<b xml:base="//three/four/./five/./../file.xsd">
  <c xml:base="a.file"/>
  <d>
    <e xml:base="#bare-name">
      <f xml:base=""/>
      <f1/>
    </e>
    <g xml:base="//six/"/>
  </d>
  <h xml:base="http://www.iaik.tugraz.at">
    <i xml:base="/aboutus/people/index.php">
      <j xml:base="lanz/index.php"/>
    </i>
  </h>
</b>

with <b> being clipped out, c14n gives:

<a xml:base="one/two">
  <c xml:base="//three/four/./five/./../a.file"/>     ("//three/four/a.file")
  <d xml:base="//three/four/./five/./../file.xsd">    ("//three/four/file.xsd")
    <e xml:base="#bare-name">
      <f xml:base=""/>
      <f1/>
    </e>
    <g xml:base="//six/"/>
  </d>
  <h xml:base="http://www.iaik.tugraz.at">
    <i xml:base="/aboutus/people/index.php">
      <j xml:base="lanz/index.php"/>
    </i>
  </h>
</a>

with <b> and <d> being clipped out:

<a xml:base="one/two">
  <c xml:base="//three/four/./five/./../a.file"/>
  <e xml:base="//three/four/./five/./../file.xsd#bare-name">    ("//three/four/file.xsd#bare-name")
    <f xml:base=""/>
    <f1/>
  </e>
  <g xml:base="//six/"/>
  <h xml:base="http://www.iaik.tugraz.at">
    <i xml:base="/aboutus/people/index.php">
      <j xml:base="lanz/index.php"/>
    </i>
  </h>
</a>

with <b>, <d> and <e> being clipped out:

<a xml:base="one/two">
  <c xml:base="//three/four/./five/./../a.file"/>
  <f xml:base="//three/four/./five/./../file.xsd"/>    ("//three/four/file.xsd")
  <f1 xml:base="//three/four/./five/./../file.xsd#bare-name"/>
  <g xml:base="//six/"/>
  <h xml:base="http://www.iaik.tugraz.at">
    <i xml:base="/aboutus/people/index.php">
      <j xml:base="lanz/index.php"/>
    </i>
  </h>
</a>

--- Section 2.4 reworded ---

2.4 Document Subsets

Some applications require the ability to create a physical representation for an XML document subset (other than the one generated by default, which can be a proper subset of the document if the comments are omitted). Implementations of XML canonicalization that are based on XPath can provide this functionality with little additional overhead by accepting a node-set as input rather than an octet stream.

The processing of an element node E MUST be modified slightly when an XPath node-set is given as input and the element's parent (direct ancestor) is omitted from the node-set. This is necessary because omitted nodes SHALL NOT break the inheritance rules of inheritable attributes defined in the xml namespace.

[Definition:] Simple inheritable attributes are attributes whose value requires at most a simple redeclaration. This redeclaration is done by supplying a new value in the child axis. The redeclaration of a simple inheritable attribute A contained in an element E is done by supplying a new value to an attribute with the same name contained in a descendant element of E. Simple inheritable attributes are xml:lang and xml:space.

The method for processing the attribute axis of an element E in the node-set is enhanced. All element nodes along E's ancestor axis are examined for the nearest occurrences of simple inheritable attributes in the xml namespace, such as xml:lang and xml:space (whether or not they are in the node-set). From this list of attributes, remove any simple inheritable attributes that are in E's attribute axis (whether or not they are in the node-set). Then, lexicographically merge this attribute list with the nodes of E's attribute axis that are in the node-set. The result of visiting the attribute axis is computed by processing the attribute nodes in this merged attribute list.
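
The handling of simple inheritable attributes described above can be illustrated with a minimal Python sketch. It is not taken from the draft: the Element class, its in_node_set flag and the function name attributes_to_render are hypothetical, all of E's own attributes are assumed to be in the node-set, and sorting by qualified name stands in for the full canonical attribute ordering.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Per the text above, xml:lang and xml:space are the simple inheritable attributes.
SIMPLE_INHERITABLE = ("xml:lang", "xml:space")


@dataclass
class Element:
    """Hypothetical stand-in for an XPath element node."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Element"] = None
    in_node_set: bool = True


def attributes_to_render(e: Element) -> List[Tuple[str, str]]:
    """Return the (name, value) pairs to serialize for element e.

    Assumption: every attribute of e itself is in the node-set.
    """
    merged = dict(e.attrs)
    # The modified processing applies only when e's parent is omitted.
    if e.parent is not None and not e.parent.in_node_set:
        ancestor = e.parent
        while ancestor is not None:
            for name in SIMPLE_INHERITABLE:
                # Walking from nearest to farthest ancestor, setdefault keeps
                # the nearest occurrence and never overrides e's own attribute.
                if name in ancestor.attrs:
                    merged.setdefault(name, ancestor.attrs[name])
            ancestor = ancestor.parent
    # Simplification: sort by qualified name only.
    return sorted(merged.items())


if __name__ == "__main__":
    a = Element("a", {"xml:lang": "en", "xml:space": "preserve"})
    b = Element("b", {"xml:lang": "de"}, parent=a, in_node_set=False)
    c = Element("c", {"id": "x"}, parent=b)
    print(attributes_to_render(c))
```

In this sketch, c picks up xml:lang from the omitted b (the nearest occurrence) and xml:space from a, even though a itself is in the node-set, because the text examines all ancestors whether or not they are in the node-set.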

The xml:base attribute is not a simple inheritable attribute and requires special processing beyond a simple redeclaration. A "join URI" function is used, which takes a URI (uri-1) from an ancestor, joins a relative URI of E (rel-uri-2) onto it (in most cases after the last slash of uri-1), and then normalizes the result. We describe here a simple method for providing this function. This method uses a separate string buffer in a manner similar to that found in section 5.2 of RFC 2396. Please refer to this source for the terms and definitions used in the following algorithm.

1. If the first URI (uri-1) is null, continue with step 2; otherwise copy uri-1 to the buffer. In other words, all characters of uri-1 are copied to the buffer.
2. If the relative URI (rel-uri-2) is null, continue with step 5. Otherwise, if rel-uri-2 starts with a '#' (hash) or is the empty string "", continue with step 4. Otherwise, remove the last segment of the path component of uri-1: anything after the last (right-most) slash character, if any, is removed from the buffer.
3. If rel-uri-2 starts with a '/' (slash), delete all "<segment>/" from the buffer, where <segment> is a complete path segment. If rel-uri-2 starts with '//' (two slashes), additionally delete the "//<authority>/".
4. The relative URI is appended to the buffer string.
5. All occurrences of "./", where "." is a complete path segment, are removed from the buffer string.
6. If the buffer string ends with "." as a complete path segment, that "." is removed.
7. All occurrences of "<segment>/../", where <segment> is a complete path segment not equal to "..", are removed from the buffer string. Removal of these path segments is performed iteratively, removing the leftmost matching pattern on each iteration, until no matching pattern remains.
8. If the buffer string ends with "<segment>/..", where <segment> is a complete path segment not equal to "..", that "<segment>/.." is removed.
9.
   If the resulting buffer string begins with "<scheme>://<authority>/" followed by one or more complete path segments of "..", then the resulting URI is considered to be in error, and implementations SHOULD indicate this error and fail. If, however, there are no "<scheme>://<authority>/" components, the result is a relative URI starting with a relative path. If processing continues, implementations MUST handle this by retaining the leading ".." complete path segments in the resulting path (i.e., treating them as part of the final URI, e.g. ../../<segment>/<segment>).

This function may also be called with a null URI, i.e. when no xml:base attribute exists in E (not to be confused with xml:base="").

The method for processing the attribute axis of an element E in the node-set hence needs to be enhanced further. The element nodes along E's ancestor axis are examined for all occurrences of omitted non-simple inheritable attributes in the xml namespace (i.e. attributes that are not in the node-set), such as xml:base, up to but excluding their first rendered occurrence (i.e. the first one that is in the node-set). Only if such attributes exist will E's xml:base attribute be changed (i.e. added or fixed up). The selected xml:base attributes are joined by calling the "join URI" function described previously, iteratively, beginning with the two xml:base attributes closest to the document root, until the new value for E's xml:base attribute remains (it may also be null). Then, lexicographically merge this fixed-up attribute with the nodes of E's attribute axis that are in the node-set. The result of visiting the attribute axis is computed by processing the attribute nodes in this merged attribute list.

best regards
Konrad

Richard Tobin wrote:

To fix up the xml:base attribute of an element E:

If the base URI of the immediate container of E is known (and is therefore by definition absolute), determine the base URI of E according to xml:base. Set the xml:base attribute to this value.

If the base URI of E's container is not known (which can only be the case if the base URI of the document is unknown, and there is no ancestor element with an absolute xml:base attribute), proceed as follows:

- if there is no ancestor with an xml:base attribute, leave E's xml:base attribute (if any) unchanged;
- if the nearest ancestor with an xml:base is not being omitted, leave E's xml:base attribute (if any) unchanged;
- otherwise we must construct an xml:base attribute giving E's base URI relative to the nearest non-omitted ancestor with an xml:base attribute (call this ancestor A).

  Find the xml:base attributes of the omitted ancestor elements between A and E. Take these in outer-to-inner order, followed by E's xml:base attribute if it has one. This is a sequence of relative URIs. Discard the last segment - the characters after the last slash - of all but the last of these. If any of these URIs has no slash character, discard it completely. Concatenate the resulting strings, and use this as the xml:base attribute of E.

-- Richard

--
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
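
The concatenation rule in Richard Tobin's quoted text above (for the case where the container's base URI is unknown) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name concatenate_bases is hypothetical, the input is assumed to be the already collected xml:base values in outer-to-inner order, and only plain relative paths are handled (no fragments, absolute paths or authorities).

```python
from typing import List, Optional


def concatenate_bases(uris: List[str]) -> Optional[str]:
    """Build E's xml:base from a sequence of relative URIs.

    uris: xml:base values of the omitted ancestors between A and E,
    outer to inner, followed by E's own xml:base if it has one.
    Returns None when the sequence is empty (nothing to fix up).
    """
    if not uris:
        return None
    *outer, last = uris
    kept = []
    for uri in outer:
        cut = uri.rfind("/")
        if cut == -1:
            # No slash at all: the whole URI is its last segment, discard it.
            continue
        # Keep everything up to and including the last slash.
        kept.append(uri[: cut + 1])
    # The last URI in the sequence is kept whole.
    return "".join(kept) + last


if __name__ == "__main__":
    print(concatenate_bases(["one/two", "three/four.xml", "a.file"]))
```

Unlike the "join URI" function in the reworded section 2.4, this sketch performs no dot or dot-dot normalization; it only concatenates.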
Received on Thursday, 18 May 2006 14:41:49 UTC