- From: Konrad Lanz <Konrad.Lanz@iaik.tugraz.at>
- Date: Wed, 07 Jun 2006 23:29:11 +0200
- To: public-xml-core-wg@w3.org
- Message-ID: <44874527.50901@iaik.tugraz.at>
Dear all,
please find below a first draft for the new wording of c14n 1.1.
best regards
Konrad
2.4 Document Subsets
Some applications require the ability to create a physical
representation for an XML document subset (other than the one generated
by default, which can be a proper subset of the document if the comments
are omitted). Implementations of XML canonicalization that are based on
XPath can provide this functionality with little additional overhead by
accepting a node-set as input rather than an octet stream. The
processing of an element node E MUST be modified slightly when an XPath
node-set is given as input and some of the element's ancestors are
omitted from the node-set. This is necessary because omitted nodes SHALL
NOT break the inheritance rules of inheritable attributes defined in the
xml namespace.
[Definition:] Simple inheritable attributes are attributes that have a
value that requires at most a simple redeclaration. This redeclaration
is done by supplying a new value in the child axis. The redeclaration of
a simple inheritable attribute A contained in one of E's ancestors is
done by supplying a value to an attribute Ae inside E with the same
name. Simple inheritable attributes are xml:lang and xml:space.
The method for processing the attribute axis of an element E in the
node-set is hence enhanced. All element nodes along E's ancestor axis
are examined for the nearest occurrences of simple inheritable
attributes in the xml namespace, such as xml:lang and xml:space (whether
or not they are in the node-set). From this list of attributes, any
simple inheritable attributes that are already in E's attribute axis
(whether or not they are in the node-set) are removed. Then,
lexicographically merge this attribute list with the nodes of E's
attribute axis that are in the node-set. The result of visiting the
attribute axis is computed by processing the attribute nodes in this
merged attribute list.
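As an illustration only, the processing of simple inheritable attributes
described above might be sketched in Python as follows. The Element class,
its fields and the function name attribute_axis are hypothetical and not
part of the draft. Sorting by qualified name is a simplification of the
canonical attribute order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# The two simple inheritable attributes named in the definition above.
SIMPLE_INHERITABLE = ("xml:lang", "xml:space")


@dataclass
class Element:
    """Hypothetical, minimal element model used only for this sketch."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Element"] = None
    # Names of this element's attributes that are in the node-set.
    attrs_in_node_set: Set[str] = field(default_factory=set)


def attribute_axis(e: Element) -> List[Tuple[str, str]]:
    """Return the (name, value) pairs to process for E's attribute axis."""
    # Nearest occurrence of each simple inheritable attribute on the
    # ancestor axis, whether or not it is in the node-set.
    inherited: Dict[str, str] = {}
    ancestor = e.parent
    while ancestor is not None:
        for name in SIMPLE_INHERITABLE:
            if name in ancestor.attrs and name not in inherited:
                inherited[name] = ancestor.attrs[name]
        ancestor = ancestor.parent

    # Remove those already present on E, whether or not in the node-set.
    for name in list(inherited):
        if name in e.attrs:
            del inherited[name]

    # Merge with E's own attributes that are in the node-set and sort.
    merged = dict(inherited)
    for name in e.attrs_in_node_set:
        merged[name] = e.attrs[name]
    return sorted(merged.items())
```

In this sketch an xml:space attribute that is present on E but not in the
node-set suppresses the inherited value and is itself not rendered, which
follows the wording of the paragraph above.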
The xml:base attribute is not a simple inheritable attribute and
requires special processing beyond a simple redeclaration. Hence the
processing of E's attribute axis needs to be enhanced further. A "join
URI" function is used for xml:base fix up. It takes a URI (Base) from an
ancestor, joins the relative URI of E (R) to it (in most cases after the
last slash of Base), and then normalizes the result. We describe here a
simple method for providing this functionality, similar to that found in
sections 5.2.1, 5.2.2 and 5.2.4 of RFC 3986, with the following
modifications:
--- Join URI Begin ---
* Perform RFC 3986 section 5.2.1 "Pre-parse the Base URI", modified
as follows:
- The scheme component is not required in the base URI (Base),
i.e. Base.scheme may be null.
* Perform RFC 3986 section 5.2.2 "Transform References", modified as
follows to ignore the fragment part of R:
- After parsing R, set R.fragment = null.
* Perform RFC 3986 section 5.2.4 "Remove Dot Segments", modified to
keep leading "../" segments and to prevent the erroneous creation of an
output that looks like a net path
(e.g. for seg/.././/pseudo-netpath/seg/file.ext):
- several changes as in "Remove Dot Segments" ... (see Appendix)
This function may also be called with the URI to be fixed up (R) being
null (i.e. when no xml:base attribute exists in E) or empty ""
(xml:base="").
The base URI (Base) may also be unknown, in which case the algorithm is
performed with Base.scheme = null, Base.authority = null, Base.path = ""
and Base.query = null.
--- Join URI End ---
Using the "join URI" function for xml:base fix up, the processing of the
attribute axis of an element E in the node-set can hence be enhanced
further.
The element nodes along E's ancestor axis are now examined for all
occurrences of non-simple inheritable attributes in the xml namespace,
such as xml:base, that have been omitted (i.e. they are not in the
node-set). This examination proceeds up to, but not including, the first
rendered occurrence (i.e. the first one that is in the node-set). Only if
such omitted attributes exist will E's xml:base attribute be changed
(i.e. E's xml:base value is fixed up or E receives an xml:base
attribute). The new value is computed by calling the "join URI" function
described previously iteratively, beginning with the two omitted xml:base
attributes closest to the document root and continuing until only the new
value for E's xml:base attribute remains. The result may also be null or
empty "", in which case xml:base MUST NOT be rendered.
Then, lexicographically merge this fixed up attribute with the nodes of
E's attribute axis that are in the node-set. The result of visiting the
attribute axis is computed by processing the attribute nodes in this
merged attribute list.
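The iteration described above can be sketched roughly as follows. This
is only one reading of the draft text: the Element class, its fields and
the function name fixed_up_xml_base are hypothetical, and the "join URI"
function is passed in as a parameter (join_uri) rather than implemented
here. It is assumed to accept a null (None) second argument.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Element:
    """Hypothetical element model: only xml:base is tracked."""
    xml_base: Optional[str] = None
    # True if this element's xml:base attribute is in the node-set.
    xml_base_in_node_set: bool = False
    parent: Optional["Element"] = None


def fixed_up_xml_base(
    e: Element,
    join_uri: Callable[[str, Optional[str]], str],
) -> Optional[str]:
    """Return the xml:base value for E, or None if none is to be rendered."""
    # Collect omitted xml:base values on the ancestor axis, stopping at
    # (and excluding) the first occurrence that is in the node-set.
    omitted: List[str] = []
    ancestor = e.parent
    while ancestor is not None:
        if ancestor.xml_base is not None:
            if ancestor.xml_base_in_node_set:
                break
            omitted.append(ancestor.xml_base)
        ancestor = ancestor.parent

    # No omitted xml:base attributes: E's own value is left unchanged.
    if not omitted:
        return e.xml_base

    # Join pairwise, starting with the values closest to the document
    # root and finishing with E's own xml:base (which may be None).
    omitted.reverse()
    value = omitted[0]
    for nxt in omitted[1:] + [e.xml_base]:
        value = join_uri(value, nxt)

    # A null or empty result means xml:base is not rendered.
    return value or None
```

Whether E's own unchanged xml:base is rendered still depends on its
node-set membership, which this sketch leaves to the caller.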
best regards
Konrad
P.S.: please review also the following modified "Remove Dot Segments".
Appendix
* 5.2.4. "Remove Dot Segments" is modified to keep leading "../"
segments and to prevent the erroneous creation of an output that looks
like a net path.
//Editorial-Note: The modified remove_dot_segments Algorithm
could go into the Appendix
1. The input buffer is initialized with the now-appended path
components and the output buffer is initialized to the empty
string. Replace occurrences of "//" in the input buffer with "/"
until no more occurrences of "//" are in the input buffer.
2. While the input buffer is not empty, loop as follows:
A. If the input buffer begins with a prefix of "./", then remove
that prefix from the input buffer, else if the input buffer
begins with a prefix of "../", then move this prefix to the end
of the output buffer; otherwise,
B. if the input buffer begins with a prefix of "/./" or "/.",
where "." is a complete path segment, then replace that
prefix with "/" in the input buffer; otherwise,
C. if the input buffer begins with a prefix of "/../" or "/..",
where ".." is a complete path segment, then replace that
prefix with "/" in the input buffer and also, if the last
segment in the output buffer equals "../", append "../" to
the output buffer, else remove the last segment and its
preceding "/" (if any) from the output buffer; otherwise,
D. if the input buffer consists only of ".", then remove
that from the input buffer, else if the input buffer consists
only of "..", then if the last segment in the output buffer
equals "../", append "../" to the output buffer, else remove
the last segment and its preceding "/" (if any) from the
output buffer; otherwise,
E. move the first path segment (if any) in the input buffer to the
end of the output buffer, including the initial "/" character
(if any) and any subsequent characters up to, but not including,
the next "/" character or the end of the input buffer.
3. Finally, the output buffer is returned as the result of
remove_dot_segments.
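
To make the steps above easier to review, here is a minimal Python
sketch of one possible reading of them. The function and helper names
are hypothetical. Two interpretation choices are assumptions: ".." is
consumed from the input buffer in step D, and "the last segment equals
'../'" is tested as "the output buffer ends with '../'".

```python
def remove_dot_segments_modified(path: str) -> str:
    """One reading of the modified "Remove Dot Segments" steps above."""
    # Step 1: collapse every "//" to "/" and start with empty output.
    inp = path
    while "//" in inp:
        inp = inp.replace("//", "/")
    out = ""

    def step_back(buf: str) -> str:
        # Keep climbing if the output already ends in "../"; otherwise
        # drop the last segment together with its preceding "/".
        if buf.endswith("../"):
            return buf + "../"
        cut = buf.rfind("/")
        return buf[:cut] if cut >= 0 else ""

    # Step 2: consume the input buffer.
    while inp:
        if inp.startswith("./"):                    # A, first half
            inp = inp[2:]
        elif inp.startswith("../"):                 # A, second half
            out += "../"
            inp = inp[3:]
        elif inp.startswith("/./"):                 # B
            inp = "/" + inp[3:]
        elif inp == "/.":                           # B, complete segment
            inp = "/"
        elif inp.startswith("/../"):                # C
            inp = "/" + inp[4:]
            out = step_back(out)
        elif inp == "/..":                          # C, complete segment
            inp = "/"
            out = step_back(out)
        elif inp == ".":                            # D
            inp = ""
        elif inp == "..":                           # D (assumed consumed)
            inp = ""
            out = step_back(out)
        else:                                       # E
            nxt = inp.find("/", 1)
            if nxt == -1:
                nxt = len(inp)
            out += inp[:nxt]
            inp = inp[nxt:]

    # Step 3: the output buffer is the result.
    return out
```

Under this reading, "../../a" is returned unchanged and "a/b/../c"
becomes "a/c".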
--
Konrad Lanz, IAIK/SIC - Graz University of Technology
Inffeldgasse 16a, 8010 Graz, Austria
Tel: +43 316 873 5547
Fax: +43 316 873 5520
https://www.iaik.tugraz.at/aboutus/people/lanz
http://jce.iaik.tugraz.at
Certificate chain (including the EuroPKI root certificate):
https://europki.iaik.at/ca/europki-at/cert_download.htm
Received on Wednesday, 7 June 2006 21:29:20 UTC