c14n xml:base, draft for a solution from Konrad Lanz on 2006-06-07 (public-xml-core-wg@w3.org from June 2006)

From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
Date: Wed, 07 Jun 2006 23:29:11 +0200
To: public-xml-core-wg@w3.org
Message-ID: <44874527.50901@iaik.tugraz.at>
Dear all,

please find below a first draft for the new wording of c14n 1.1.

best regards
Konrad

2.4 Document Subsets

    Some applications require the ability to create a physical 
representation for an XML document subset (other than the one generated 
by default, which can be a proper subset of the document if the comments 
are omitted). Implementations of XML canonicalization that are based on 
XPath can provide this functionality with little additional overhead by 
accepting a node-set as input rather than an octet stream. The 
processing of an element node E MUST be modified slightly when an XPath 
node-set is given as input and some of the element's ancestors are 
omitted from the node-set. This is necessary because omitted nodes SHALL 
NOT break the inheritance rules of inheritable attributes defined in the 
xml namespace.

[Definition:] Simple inheritable attributes are attributes that have a 
value that requires at most a simple redeclaration. This redeclaration 
is done by supplying a new value in the child axis. The redeclaration of 
a simple inheritable attribute A contained in one of E's ancestors is 
done by supplying a value to an attribute Ae inside E with the same 
name. Simple inheritable attributes are xml:lang and xml:space.

    The method for processing the attribute axis of an element E in the 
node-set is hence enhanced. All element nodes along E's ancestor axis 
are examined for the nearest occurrences of simple inheritable 
attributes in the xml namespace, such as xml:lang and xml:space (whether 
or not they are in the node-set). From this list of attributes, any 
simple inheritable attributes that are already in E's attribute axis 
(whether or not they are in the node-set) are removed. Then, 
lexicographically merge this attribute list with the nodes of E's 
attribute axis that are in the node-set. The result of visiting the 
attribute axis is computed by processing the attribute nodes in this 
merged attribute list.
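
A minimal sketch of the attribute-axis processing described above, for
illustration only. The Element class, its fields and the function name
attribute_axis are hypothetical, and sorting by qualified name is a
simplification of the canonical attribute ordering (which sorts by
namespace URI first).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

SIMPLE_INHERITABLE = ("xml:lang", "xml:space")


@dataclass
class Element:
    """Hypothetical stand-in for an element node E."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Element"] = None
    # Names of this element's attributes that are in the node-set.
    attrs_in_node_set: Set[str] = field(default_factory=set)


def attribute_axis(e: Element) -> List[Tuple[str, str]]:
    # Nearest occurrence of each simple inheritable attribute along
    # E's ancestor axis, whether or not it is in the node-set.
    inherited: Dict[str, str] = {}
    anc = e.parent
    while anc is not None:
        for name in SIMPLE_INHERITABLE:
            if name in anc.attrs and name not in inherited:
                inherited[name] = anc.attrs[name]
        anc = anc.parent

    # Remove those already in E's attribute axis, whether or not
    # E's own attribute is in the node-set.
    for name in list(inherited):
        if name in e.attrs:
            del inherited[name]

    # Merge with E's attributes that are in the node-set and sort.
    own = {n: v for n, v in e.attrs.items() if n in e.attrs_in_node_set}
    return sorted({**inherited, **own}.items())


root = Element("root", {"xml:lang": "en", "xml:space": "preserve"})
mid = Element("mid", {"xml:lang": "de"}, parent=root)
e = Element("e", {"id": "x", "xml:space": "default"}, parent=mid,
            attrs_in_node_set={"id"})
print(attribute_axis(e))
```

In this example E inherits xml:lang from its nearest ancestor, while the
inherited xml:space is dropped because E carries its own xml:space (which
is itself not in the node-set and therefore not rendered).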

    The xml:base attribute is not a simple inheritable attribute and 
requires special processing beyond a simple redeclaration. Hence the 
processing of E's attribute axis needs to be enhanced further. A "join 
URI" function is used for xml:base fix up. It takes a URI (Base) from an 
ancestor, joins the relative URI (R) of E to it (in most cases after the 
last slash of Base) and then normalizes the result. We describe here a 
simple method for providing this functionality, similar to that found in 
sections 5.2.1, 5.2.2 and 5.2.4 of RFC 3986, with the following 
modifications:

--- Join URI Begin---

    * Perform RFC 3986 section 5.2.1. "Pre-parse the Base URI" modified 
as follows.

      - The scheme component is not required in the base URI (Base). 
(i.e. Base.scheme may be null)

    * Perform RFC 3986 section 5.2.2. "Transform References" modified as 
follows to ignore the fragment part of R

        - After parsing R set R.fragment = null

    * 5.2.4. "Remove Dot Segments" is modified to keep leading "../" 
segments and to prevent the erroneous creation of an output that looks 
like a net path (e.g. for seg/.././/pseudo-netpath/seg/file.ext).

        - several changes as in "Remove Dot Segments" ... (see Appendix)

This function may also be called with the URI to be fixed up (R) being 
null (i.e. when no xml:base attribute exists in E) or empty "" 
(xml:base="").
The base URI (Base) may also be unknown, in which case the algorithm is 
performed with Base.scheme = null, Base.authority = null, Base.path = "" 
and Base.query = null.

--- Join URI End---
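
A rough sketch of the first two modifications, assuming the component
regular expression from RFC 3986 Appendix B, the merge routine of section
5.2.3 and the recomposition of section 5.3. The names Parts, parse, merge,
recompose and join_uri are hypothetical. Dot-segment removal is passed in
as the hypothetical parameter remove_dots, which defaults to the identity
function so that the block runs on its own. Treating a null R like an
empty R is an assumption.

```python
import re
from typing import Callable, NamedTuple, Optional

# Component regex from RFC 3986 Appendix B (non-capturing outer groups).
_URI = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?")


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]


def parse(uri: Optional[str]) -> Parts:
    # An unknown Base (None) parses to scheme/authority/query = None and
    # path = "". The fragment is parsed but dropped (R.fragment = null).
    scheme, authority, path, query, _fragment = _URI.match(uri or "").groups()
    return Parts(scheme, authority, path, query)


def merge(base: Parts, rel_path: str) -> str:
    # RFC 3986 section 5.2.3: append after the last "/" of the base path.
    if base.authority is not None and base.path == "":
        return "/" + rel_path
    return base.path[: base.path.rfind("/") + 1] + rel_path


def recompose(t: Parts) -> str:
    # RFC 3986 section 5.3, without a fragment.
    out = ""
    if t.scheme is not None:
        out += t.scheme + ":"
    if t.authority is not None:
        out += "//" + t.authority
    out += t.path
    if t.query is not None:
        out += "?" + t.query
    return out


def join_uri(base: Optional[str], r: Optional[str],
             remove_dots: Callable[[str], str] = lambda p: p) -> str:
    # Transform References (5.2.2); Base.scheme is allowed to be None.
    b, ref = parse(base), parse(r)
    if ref.scheme is not None:
        t = Parts(ref.scheme, ref.authority, remove_dots(ref.path), ref.query)
    elif ref.authority is not None:
        t = Parts(b.scheme, ref.authority, remove_dots(ref.path), ref.query)
    elif ref.path == "":
        query = ref.query if ref.query is not None else b.query
        t = Parts(b.scheme, b.authority, b.path, query)
    elif ref.path.startswith("/"):
        t = Parts(b.scheme, b.authority, remove_dots(ref.path), ref.query)
    else:
        t = Parts(b.scheme, b.authority,
                  remove_dots(merge(b, ref.path)), ref.query)
    return recompose(t)


print(join_uri("http://example.org/a/b", "c/d#frag"))
print(join_uri("x/y/", "z"))
```

A caller would supply the modified "Remove Dot Segments" routine as
remove_dots; with the identity default, dot segments are left untouched.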
 
Using the "join URI" function for xml:base fix up, the processing of the 
attribute axis of an element E in the node-set can hence be enhanced 
further.
The element nodes along E's ancestor axis are now examined for all 
occurrences of non-simple inheritable attributes in the xml namespace, 
such as xml:base, that have been omitted (i.e. they are not in the 
node-set). This examination is performed up to, but excluding, the first 
rendered occurrence (i.e. one that is in the node-set). Only if such 
attributes exist will E's xml:base attribute be changed (i.e. E's 
xml:base value is fixed up or E receives an xml:base attribute). The 
xml:base attributes selected will be fixed up or added by calling the 
"join URI" function described previously iteratively, beginning with the 
two omitted xml:base attributes closest to the document root, until the 
new value for E's xml:base attribute remains. The result may also be null 
or empty "", in which case xml:base MUST NOT be rendered.

Then, lexicographically merge this fixed up attribute with the nodes of 
E's attribute axis that are in the node-set. The result of visiting the 
attribute axis is computed by processing the attribute nodes in this 
merged attribute list.
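
The iterative fix up could be sketched roughly as follows. The Node class,
the function fixed_up_xml_base and the helper toy_join are hypothetical.
toy_join is only a stand-in for the "join URI" function (it appends after
the last slash and does no normalization), and using E's own xml:base
value as the last R is an assumption.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical element node carrying only xml:base information."""
    xml_base: Optional[str] = None   # value of xml:base, None if absent
    base_in_node_set: bool = False   # is that attribute in the node-set?
    parent: Optional["Node"] = None


def fixed_up_xml_base(
        e: Node,
        join: Callable[[Optional[str], Optional[str]], Optional[str]],
) -> Optional[str]:
    """Return the xml:base value for E, or None if none is to be rendered."""
    # Collect omitted xml:base attributes along the ancestor axis, stopping
    # (exclusively) at the first occurrence that is in the node-set.
    omitted: List[str] = []
    anc = e.parent
    while anc is not None:
        if anc.xml_base is not None:
            if anc.base_in_node_set:
                break
            omitted.append(anc.xml_base)
        anc = anc.parent

    if not omitted:
        return e.xml_base            # nothing omitted: E is left unchanged

    # Join pairwise, starting with the values closest to the document
    # root and ending with E's own xml:base (which may be None).
    chain = list(reversed(omitted)) + [e.xml_base]
    result = reduce(join, chain)
    return result or None            # null or "" must not be rendered


def toy_join(base: Optional[str], r: Optional[str]) -> Optional[str]:
    # Stand-in only: append R after the last "/" of Base.
    if not r:
        return base
    base = base or ""
    return base[: base.rfind("/") + 1] + r


root = Node("http://example.org/a/")
mid = Node("b/", parent=root)
e = Node("c.xml", parent=mid)
print(fixed_up_xml_base(e, toy_join))
```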


best regards

Konrad

P.S.: please review also the following modified "Remove Dot Segments".

Appendix

    * 5.2.4. "Remove Dot Segments" is modified to keep leading "../" 
segments and to prevent the erroneous creation of an output that looks 
like a net path.
       //Editorial-Note: The modified remove_dot_segments Algorithm 
could go into the Appendix

       1.  The input buffer is initialized with the now-appended path
           components and the output buffer is initialized to the empty
           string. Replace occurrences of "//" in the input buffer with "/"
           until no more occurrences of "//" are in the input buffer.

       2.  While the input buffer is not empty, loop as follows:

           A.  If the input buffer begins with a prefix of "./", then remove
               that prefix from the input buffer, else if the input buffer
               begins with a prefix of "../" move this prefix to the end of
               the output buffer; otherwise,

           B.  if the input buffer begins with a prefix of "/./" or "/.",
               where "." is a complete path segment, then replace that
               prefix with "/" in the input buffer; otherwise,

           C.  if the input buffer begins with a prefix of "/../" or "/..",
               where ".." is a complete path segment, then replace that
               prefix with "/" in the input buffer and also, if the last
               segment in the output buffer equals "../", append "../" to
               the output buffer, else remove the last segment and its
               preceding "/" (if any) from the output buffer; otherwise,

           D.  if the input buffer consists only of ".", then remove
               that from the input buffer, else if the input buffer consists
               only of "..", then if the last segment in the output buffer
               equals "../", append "../" to the output buffer, else remove
               the last segment and its preceding "/" (if any) from the
               output buffer; otherwise,

           E.  move the first path segment (if any) in the input buffer to
               the end of the output buffer, including the initial "/"
               character (if any) and any subsequent characters up to, but
               not including, the next "/" character or the end of the
               input buffer.


       3.  Finally, the output buffer is returned as the result of
           remove_dot_segments.
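
The steps above can be transcribed roughly as follows. This is one literal
reading, not a reference implementation. The function name
remove_dot_segments_modified and the helper drop_or_climb are
hypothetical. Two points are assumptions: "the last segment equals ../"
is read as "the output buffer ends with ../", and a lone ".." in step D
is also removed from the input buffer (as in RFC 3986).

```python
def remove_dot_segments_modified(path: str) -> str:
    # Step 1: replace "//" with "/" until no occurrence remains.
    while "//" in path:
        path = path.replace("//", "/")
    inp, out = path, ""

    def drop_or_climb(buf: str) -> str:
        # Shared by steps C and D: keep climbing if the output already
        # ends with "../", otherwise remove the last segment and its
        # preceding "/" (if any).
        if buf.endswith("../"):
            return buf + "../"
        return buf[: buf.rfind("/")] if "/" in buf else ""

    # Step 2: consume the input buffer.
    while inp:
        if inp.startswith("./"):            # A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith("../"):         # A: keep a leading "../"
            out += "../"
            inp = inp[3:]
        elif inp.startswith("/./"):         # B: "/./" becomes "/"
            inp = inp[2:]
        elif inp == "/.":                   # B: trailing "/." becomes "/"
            inp = "/"
        elif inp.startswith("/../"):        # C: "/../" becomes "/"
            inp = inp[3:]
            out = drop_or_climb(out)
        elif inp == "/..":                  # C: trailing "/.." becomes "/"
            inp = "/"
            out = drop_or_climb(out)
        elif inp == ".":                    # D: lone "."
            inp = ""
        elif inp == "..":                   # D: lone ".." (assumed consumed)
            inp = ""
            out = drop_or_climb(out)
        else:                               # E: move the first segment
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out += inp[:end]
            inp = inp[end:]

    # Step 3: the output buffer is the result.
    return out


print(remove_dot_segments_modified("../../a"))
print(remove_dot_segments_modified("seg/.././/pseudo-netpath/seg/file.ext"))
```

Inputs that climb above the start of the path after a segment has already
been emitted depend on how the "last segment" wording is read, so they are
a good place to check an implementation against the intended behaviour.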

-- 
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at

Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
Received on Wednesday, 7 June 2006 21:29:20 UTC