- From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
- Date: Wed, 07 Jun 2006 23:29:11 +0200
- To: public-xml-core-wg@w3.org
- Message-ID: <44874527.50901@iaik.tugraz.at>
Dear all,

please find below a first draft of the new wording for C14N 1.1.

best regards
Konrad

2.4 Document Subsets

Some applications require the ability to create a physical representation for an XML document subset (other than the one generated by default, which can be a proper subset of the document if the comments are omitted). Implementations of XML canonicalization that are based on XPath can provide this functionality with little additional overhead by accepting a node-set as input rather than an octet stream.

The processing of an element node E MUST be modified slightly when an XPath node-set is given as input and some of the element's ancestors are omitted from the node-set. This is necessary because omitted nodes SHALL NOT break the inheritance rules of inheritable attributes defined in the xml namespace.

[Definition:] Simple inheritable attributes are attributes whose value requires at most a simple redeclaration. This redeclaration is done by supplying a new value in the child axis: a simple inheritable attribute A contained in one of E's ancestors is redeclared by supplying a value to an attribute Ae inside E with the same name. The simple inheritable attributes are xml:lang and xml:space.

The method for processing the attribute axis of an element E in the node-set is hence enhanced. All element nodes along E's ancestor axis are examined for the nearest occurrences of simple inheritable attributes in the xml namespace, such as xml:lang and xml:space (whether or not they are in the node-set). From this list of attributes, any simple inheritable attributes that are already in E's attribute axis (whether or not they are in the node-set) are removed. Then, lexicographically merge this attribute list with the nodes of E's attribute axis that are in the node-set. The result of visiting the attribute axis is computed by processing the attribute nodes in this merged attribute list.
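The attribute-axis processing for simple inheritable attributes described above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the Element class, its attrs_in_node_set field and the function name attribute_axis are hypothetical and not part of the draft, the caller is assumed to have already decided that E needs this treatment, and sorting by qualified name merely stands in for the canonical attribute order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

SIMPLE_INHERITABLE = ("xml:lang", "xml:space")


@dataclass
class Element:
    # Hypothetical minimal element model, not an API from the draft.
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Element"] = None
    # Names of this element's attributes that are in the node-set.
    attrs_in_node_set: Set[str] = field(default_factory=set)


def attribute_axis(e: Element) -> List[Tuple[str, str]]:
    # Collect the nearest occurrence of each simple inheritable attribute
    # along the ancestor axis, whether or not it is in the node-set.
    inherited: Dict[str, str] = {}
    ancestor = e.parent
    while ancestor is not None:
        for name in SIMPLE_INHERITABLE:
            if name in ancestor.attrs and name not in inherited:
                inherited[name] = ancestor.attrs[name]
        ancestor = ancestor.parent

    # Remove those already present in E's attribute axis,
    # whether or not E's own attribute is in the node-set.
    for name in e.attrs:
        inherited.pop(name, None)

    # Merge with E's attributes that are in the node-set.
    # Assumption: sorting by qualified name approximates the canonical order.
    own = {n: v for n, v in e.attrs.items() if n in e.attrs_in_node_set}
    return sorted({**inherited, **own}.items())


root = Element("root", {"xml:lang": "en", "xml:space": "preserve"})
mid = Element("mid", {"xml:lang": "de"}, parent=root)
e = Element("e", {"id": "x", "xml:space": "default"}, parent=mid,
            attrs_in_node_set={"id"})
print(attribute_axis(e))
```

In this example E inherits xml:lang="de" from its nearest ancestor, while the inherited xml:space is dropped because E carries its own xml:space attribute, even though that attribute is not in the node-set.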
The xml:base attribute is not a simple inheritable attribute and requires special processing beyond a simple redeclaration. Hence the processing of E's attribute axis needs to be enhanced further. A "join URI" function is used for xml:base fix-up. It takes a URI (Base) from an ancestor, joins a relative URI of E (R) to it (in most cases after the last slash of Base), and then normalizes the result. We describe here a simple method for providing this functionality, similar to that found in sections 5.2.1, 5.2.2 and 5.2.4 of RFC 3986, with the following modifications:

--- Join URI Begin ---

* Perform RFC 3986 section 5.2.1 "Pre-parse the Base URI", modified as follows:
  - The scheme component is not required in the base URI (Base), i.e. Base.scheme may be null.

* Perform RFC 3986 section 5.2.2 "Transform References", modified as follows to ignore the fragment part of R:
  - After parsing R, set R.fragment = null.

* Section 5.2.4 "Remove Dot Segments" is modified to keep leading "../" segments and to prevent the erroneous creation of an output that looks like a net path (e.g. for seg/.././/pseudo-netpath/seg/file.ext):
  - several changes as in "Remove Dot Segments" ... (see Appendix)

This function may also be called with the URI to be fixed up (R) being null (i.e. when no xml:base attribute exists in E) or empty "" (xml:base=""). The base URI (Base) may also be unknown, in which case the algorithm is performed with Base.scheme = null, Base.authority = null, Base.path = "" and Base.query = null.

--- Join URI End ---

Using the "join URI" function for xml:base fix-up, the processing of the attribute axis of an element E in the node-set can be enhanced further. The element nodes along E's ancestor axis are now examined for all occurrences of non-simple inheritable attributes in the xml namespace, such as xml:base, that have been omitted (i.e. that are not in the node-set). This examination proceeds up to, but not including, the first rendered occurrence (i.e. the first one that is in the node-set).
Only if such attributes exist will E's xml:base attribute be changed (i.e. E's xml:base value is fixed up, or E receives an xml:base attribute). The selected xml:base attributes are fixed up or added by calling the "join URI" function described previously iteratively, beginning with the two omitted xml:base attributes closest to the document root, until the new value for E's xml:base attribute remains. The result may also be null or empty "", in which case xml:base MUST NOT be rendered. Then, lexicographically merge this fixed-up attribute with the nodes of E's attribute axis that are in the node-set. The result of visiting the attribute axis is computed by processing the attribute nodes in this merged attribute list.

best regards
Konrad

P.S.: Please also review the following modified "Remove Dot Segments".

Appendix

* Section 5.2.4 "Remove Dot Segments" is modified to keep leading "../" segments and to prevent the erroneous creation of an output that looks like a net path.

//Editorial-Note: The modified remove_dot_segments algorithm could go into the Appendix

1. The input buffer is initialized with the now-appended path components and the output buffer is initialized to the empty string. Replace occurrences of "//" in the input buffer with "/" until no more occurrences of "//" are in the input buffer.

2. While the input buffer is not empty, loop as follows:

   A. If the input buffer begins with a prefix of "./", then remove that prefix from the input buffer, else if the input buffer begins with a prefix of "../", move this prefix to the end of the output buffer; otherwise,

   B. if the input buffer begins with a prefix of "/./" or "/.", where "." is a complete path segment, then replace that prefix with "/" in the input buffer; otherwise,

   C. if the input buffer begins with a prefix of "/../" or "/..", where ".." is a complete path segment, then replace that prefix with "/" in the input buffer and also, if the last segment in the output buffer equals "../", append "../" to the output buffer, else remove the last segment and its preceding "/" (if any) from the output buffer; otherwise,

   D. if the input buffer consists only of ".", then remove that from the input buffer, else if the input buffer consists only of "..", then if the last segment in the output buffer equals "../", append "../" to the output buffer, else remove the last segment and its preceding "/" (if any) from the output buffer; otherwise,

   E. move the first path segment (if any) in the input buffer to the end of the output buffer, including the initial "/" character (if any) and any subsequent characters up to, but not including, the next "/" character or the end of the input buffer.

3. Finally, the output buffer is returned as the result of remove_dot_segments.

--
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
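As a rough illustration of what the modified "Remove Dot Segments" in the appendix is meant to achieve, here is a minimal Python sketch. It is not a literal transcription of steps 1 to 3: it works on a list of segments instead of two string buffers, and it reflects one possible reading of the intent (collapse "//", resolve "." and "..", keep leading ".." segments of a relative path). The function name remove_dot_segments_keep_leading is hypothetical, and the handling of edge cases such as trailing slashes and ".." above the root of an absolute path is an assumption.

```python
def remove_dot_segments_keep_leading(path: str) -> str:
    # Step 1 of the draft: collapse "//" so the result cannot look
    # like a net path.
    while "//" in path:
        path = path.replace("//", "/")

    absolute = path.startswith("/")
    # Assumption: a path ending in "/", "." or ".." names a directory,
    # so a non-empty result gets a trailing "/".
    trailing_slash = path.rsplit("/", 1)[-1] in ("", ".", "..")

    out = []  # resolved segments; may begin with a run of ".."
    for seg in path.split("/"):
        if seg in ("", "."):
            continue  # empty and "." segments contribute nothing
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()  # ".." cancels the preceding ordinary segment
            elif not absolute:
                out.append("..")  # nothing left to cancel: keep leading ".."
            # Assumption: for an absolute path, ".." above the root is
            # dropped, as in RFC 3986.
        else:
            out.append(seg)

    result = ("/" if absolute else "") + "/".join(out)
    if trailing_slash and out:
        result += "/"
    return result


print(remove_dot_segments_keep_leading("seg/.././/pseudo-netpath/seg/file.ext"))
print(remove_dot_segments_keep_leading("../../a/./b/../c"))
```

For the example path from the draft, this reading yields a relative path rather than something that starts with "//", and leading "../" segments survive normalization.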
Received on Wednesday, 7 June 2006 21:29:20 UTC