- From: John Boyer <jboyer@PureEdge.com>
- Date: Thu, 23 Mar 2000 16:15:29 -0800
- To: "IETF/W3C XML-DSig WG \(E-mail\)" <w3c-ietf-xmldsig@w3.org>
- Cc: "Martin J. Duerst" <duerst@w3.org>, "James Clark" <jjc@jclark.com>, "Joseph Reagle" <reagle@w3.org>, "Eastlake Donald-LDE008" <Donald.Eastlake@motorola.com>, "TAMURA Kent" <kent@trl.ibm.co.jp>, "Christopher R. Maden" <crism@exemplary.net>, "Jonathan Marsh" <jmarsh@microsoft.com>, "Ed Simon" <ed.simon@entrust.com>
- Message-ID: <BFEDKCINEPLBDLODCODKAEBGCCAA.jboyer@PureEdge.com>
Attached and pasted below is the HTML for a new version of the XPath transform for your consideration. If you are on the cc line, it is because you expressed a special interest and/or have provided constructive and very helpful feedback on the XPath transform in the recent past.

Although I'm sure it's not the final draft, I am excited by the possibility that we as a group may be close to a version of the XPath transform that is sufficient and easy to understand and implement. I am therefore asking you to please take some time to review the new specification, as it is a very important part of meeting our partial XML document signing requirement.

Thanks,
John Boyer
Software Development Manager
PureEdge Solutions, Inc. (formerly UWI.Com)
jboyer@PureEdge.com

Executive overview
==================

In accordance with group feedback, the following issues have been addressed:

1) The parse() function and $input variable binding have been eliminated. Instead, the root of the input XML document is provided as the context node of the initial evaluation context. Certain assumptions about what information the parser must retain have been expressed, but all of these assumptions seem to be necessary to support other functionality of XPath. Specifically, I assume that the QName of an element, attribute or namespace node can be created using the information available in the parse tree of any processor that is bundled with an XPath engine.

2) Exact order is eliminated; lex order on input is eliminated; lex order of attribute and namespace nodes in the output has been specified in accordance with group feedback.

3) The namespace declarations are initialized to those available to the XPath element containing the XPath expression, as is done in XPointer.

4) Variable bindings for expression byte order mark and encoding have been eliminated.
Instead, the assumption is that the implementation translates to the character domain before evaluating the XPath expression, which is in accordance with the XPath recommendation.

5) The serialize() function has been retained in part to simplify specification and in part because it needs access to the internal representation of a node-set. However, note that it is automatically applied to the XPath transform result, so a) it will almost never need to be called explicitly, and b) XPath transform expressions need not start with a function call, which seemed to be the source of some concern.

6) The output encoding has been standardized to UTF-8. There does not appear to be a better option.

7) Someone mentioned a problem with namespace nodes. There was something that we were not addressing, but I did not understand the comment. If you are reading, and it was your comment, could you please reiterate and elaborate?

================================================

<h4>6.6.3 <a name="sec-XPath">XPath</a> Filtering </h4>
<dl> <dt>Identifier: </dt> <dd>http://www.w3.org/TR/1999/REC-xpath-19991116 </dd> </dl>
<p>The <a href="#ref-XPath">XPath</a> transform output is the result of applying an XPath expression to an input string. The XPath expression appears in a parameter element named <code>XPath</code>. The input string is equivalent to the result of dereferencing the URI attribute of the <code>Reference</code> element containing the XPath transform, then, in sequence, applying all transforms that appear before the XPath transform in the <code>Reference</code> element's <code>Transforms</code>.</p>
<p>The primary purpose of this transform is to ensure that only specifically defined changes to the input XML document are permitted after the signature is affixed. The XPath expression can be created such that it includes all elements except those meeting specific criteria.
It is the responsibility of the XPath expression author to ensure that all necessary information has been included in the output such that modification of the excluded information does not affect the interpretation of the output in the application context. One simple example of this is the omission of an enveloped signature's <code>SignatureValue</code> element.</p> <h4>6.6.3.1 Evaluation Context Initialization</h4> <p>The XPath transform establishes the following evaluation context for the XPath expression given in the <code>XPath</code> parameter element:</p> <ul> <li>A <b>context node</b>, initialized to the input XML document's root node.</LI> <li>A <b>context position</b>, initialized to 1.</LI> <li>A <b>context size</b>, initialized to 1.</LI> <li>A <b>library of functions</b> equal to the function set defined in <a href="#ref-XPath">XPath</a> plus the function <a href="#function-serialize">serialize()</a>.</li> <li>A set of variable bindings. No means for initializing these is defined. Thus, the set of variable bindings used when evaluating the XPath expression is empty, and use of a variable reference in the XPath expression results in an error.</li> <li>The set of namespace declarations in scope for the XPath expression.</li> </ul> <h4>6.6.3.2 Parsing Requirements for XPath Evaluation</h4> <p>An XML processor is used to read the input XML document and produce a parse tree capable of being used as the initial context node for the XPath evaluation, as described in the previous section. If the input is not a well-formed XML document, then the XPath transform must throw an exception.</p> <p>Validating and non-validating XML processors only behave in the same way (e.g. with respect to attribute value normalization and entity reference definition) until an external reference is encountered. 
If the XPath transform implementation uses a non-validating processor, and it encounters an external reference in the input document, then an exception must be thrown to indicate that the necessary algorithm is unavailable (The XPath transform cannot simply generate incorrect output since many applications distinguish an unverifiable signature from an invalid signature).</p> <p>As a result of reading the input with an XML processor, linefeeds are normalized, attribute values are normalized, CDATA sections are replaced by their content, and entity references are recursively replaced by substitution text. In addition, consecutive characters are grouped into a single text node.</p> <p>The XPath implementation is expected to convert the information in the input XML document and the XPath expression string to the character domain prior to making any comparisons such that the result of evaluating the expression is equivalent regardless of the initial encoding of the input XML document and XPath expression.</p> <p>Based on the namespace processing rules of XPath, the namespace prefix of namespace-qualified nodes must be available in the parse tree.</p> <p>Based on the expression evaluation requirements of the XPath function library, the <b>document order</b> position of each node must be available in the parse tree, except for the attribute and namespace axes. The XPath transform imposes no order on attribute and namespace nodes during XPath expression evaluation, and expressions based on attribute or namespace node position are not interoperable. The XPath transform does define an order for namespace and attribute nodes during <a href="#function-serialize">serialization</a>.</p> <h4>6.6.3.3 XPath Transform Functions</h4> <p>The function library of the XPath transform includes all functions defined by the XPath specification plus the serialize() function defined below. 
For most XPath transforms, serialize() need not be called explicitly since it is called automatically if the expression result is a node-set. However, serialization must be represented as an XPath function since it requires access to the internal representation of a node-set (see parsing requirements).</p>
<p> <a name="function-serialize"><b>Function: </b><i>string</i> <b>serialize</b>(<i>node-set</i>)</a> </p>
<p>This function converts a node-set into a string by generating the representative text for each node in the node-set. The nodes of a node-set are processed in ascending order of the nodes' <b>document order</b> positions except for attribute and namespace nodes, which do not have document order positions.</p>
<p>The nodes in the attribute and namespace axes will each be processed in lexicographic order, with the namespace axis preceding the attribute axis. Lexicographic comparison is performed using namespace URI as the primary key and local name as secondary key (nodes with no namespace qualification have an empty namespace URI, which is defined to be lexicographically least). Lexicographic comparison is based on the UCS codepoint values, which is equivalent to lexical ordering based on UTF-8.</p>
<p>The method of text generation is dependent on the node type and given in the following list:</p>
<ul>
<li><b>Root Node-</b> Nothing (no byte order mark, no XML declaration, no document type declaration).</li>
<li><b>Element Nodes-</b> An open angle bracket (&lt;), the element QName, the nodes of the namespace axis, the nodes of the attribute axis, a close angle bracket (&gt;), the descendant nodes of the element that are in the node-set (in document order), an open angle bracket, a forward slash (/), the element QName, and a close angle bracket.
The element <a href="http://www.w3.org/TR/REC-xml-names/#NT-QName">QName</a> is the local name alone if the namespace prefix string is empty; otherwise it is the namespace prefix, a colon, and then the local name of the element.</li>
<li><b>Namespace and Attribute Nodes-</b> A space, the node's QName, an equals sign, an open double quote, the modified string value, and a close double quote. The string value of the node is modified by replacing all ampersands (&amp;) with <code>&amp;amp;</code>, all double quote characters with <code>&amp;quot;</code>, and all illegal characters for UTF-8 encoding with hexadecimal character references (e.g. <code>&amp;#x0D;</code>).</li>
<li><b>Text Nodes-</b> The string value, except that all ampersands are replaced by <code>&amp;amp;</code>, all open angle brackets (&lt;) are replaced by <code>&amp;lt;</code>, and all illegal characters for UTF-8 encoding are replaced by hexadecimal character references (e.g. <code>&amp;#x0D;</code>).</li>
<li><b>Processing Instruction Nodes-</b> An open angle bracket, a question mark, the PI target name of the node, a space, the string value, a question mark, and a close angle bracket.</li>
<li><b>Comment Nodes-</b> The open comment sequence (&lt;!--), the string value of the node, and the close comment sequence (--&gt;).</li>
</ul>
<h4 name="sec-XPathTransformOutput">6.6.3.4 XPath Transform Output</h4>
<p>The result of the XPath expression is a string, boolean, number, or node-set. If the result of the XPath expression is a string, then the string converted to UTF-8 is the output of the XPath transform. If the result is a boolean or number, then the XPath transform output is computed by calling the XPath string() function on the boolean or number, then converting the result to UTF-8.
If the result of the XPath expression is a node-set, then the XPath transform result is computed by applying the serialize() function to the node-set, then converting the resulting string to UTF-8.</p>
<p>For example, consider creating an enveloped signature S1 (a <code>Signature</code> element with an <code>id</code> attribute equal to "S1"). The signature S1 is enveloped because its <code>Reference</code> URI indicates some ancestor element of S1. Since the <code>DigestValue</code> in the <code>Reference</code> is calculated before S1's <code>SignatureValue</code>, the <code>SignatureValue</code> must be omitted from the <code>DigestValue</code> calculation. This can be done with an XPath transform containing the following XPath expression in its <code>XPath</code> parameter element:</p>
<p> <code> /descendant-or-self::node()[<br/> not(self::SignatureValue and parent::Signature[@id="S1"]) and<br/> not(self::KeyInfo and parent::Signature[@id="S1"]) and<br/> not(self::DigestValue and ancestor::*[3 and @id="S1"])] </code> </p>
<p>The '/descendant-or-self::node()' means that all nodes in the entire parse tree starting at the root node are candidates for the result node-set. For each node candidate, the node is included in the resultant node-set if and only if the predicate (the boolean expression in the square brackets) evaluates to "true" for that node. The predicate returns true for all nodes except the <code>SignatureValue</code> and <code>KeyInfo</code> child elements and the <code>DigestValue</code> descendants of <code>Signature</code> S1.
Thus, serialize() returns a string containing the entire input except the parts of S1 that must change during core processing of S1, so these changes will not invalidate a <code>DigestValue</code> computed over the serialize() result.</p>
<p>Note that this expression works even if the XPath transform is implemented with a non-validating processor because S1 is identified by comparison to the value of an attribute named 'id' rather than by using the XPath id() function. Although the id() function is useful when the 'id' attribute is not named 'id', the XPath expression author will know the 'id' attribute's name when writing the expression.</p>
<p>It is RECOMMENDED that the XPath be constructed such that the result of this operation is a well-formed XML document. This should be the case if the root element of the input resource is included by the XPath (even if a number of its descendant nodes are omitted by the XPath expression). It is also RECOMMENDED that nodes not be omitted from the input if they affect the interpretation of the output nodes in the application context. The XPath expression author is responsible for this since the XPath expression author knows the application context.</p>
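<p>As an illustration only, the attribute ordering and escaping rules described for serialize() in section 6.6.3.3 might be sketched in Python as follows. The <code>Attr</code> record and all function names are hypothetical stand-ins for whatever the parse tree provides; they are not part of the draft. The replacement of "illegal characters for UTF-8 encoding" by character references is left out, since the draft does not enumerate that character set.</p>

```python
from typing import Iterable, List, NamedTuple


class Attr(NamedTuple):
    """Hypothetical stand-in for an attribute node in the parse tree."""
    ns_uri: str   # empty string when the attribute is not namespace-qualified
    prefix: str   # empty string when there is no namespace prefix
    local: str    # local name
    value: str    # string value of the node


def qname(prefix: str, local: str) -> str:
    # QName rule: the local name alone, or prefix, colon, local name.
    return f"{prefix}:{local}" if prefix else local


def escape_attr(value: str) -> str:
    # Ampersands are replaced first so that the entities produced by the
    # second replacement are not escaped again.
    return value.replace("&", "&amp;").replace('"', "&quot;")


def escape_text(value: str) -> str:
    # Text nodes: escape ampersands, then open angle brackets.
    return value.replace("&", "&amp;").replace("<", "&lt;")


def sort_attrs(attrs: Iterable[Attr]) -> List[Attr]:
    # Primary key: namespace URI; secondary key: local name.
    # Python compares str values by code point, and the empty string
    # sorts before every non-empty namespace URI.
    return sorted(attrs, key=lambda a: (a.ns_uri, a.local))


def serialize_attrs(attrs: Iterable[Attr]) -> str:
    # Each attribute: space, QName, equals sign, quoted escaped value.
    return "".join(
        f' {qname(a.prefix, a.local)}="{escape_attr(a.value)}"'
        for a in sort_attrs(attrs)
    )
```

<p>Under these assumptions, unqualified attributes come out first (ordered by local name), followed by qualified attributes ordered by namespace URI rather than by prefix. Namespace nodes would be written before the attributes, but they are not modelled in this sketch.</p>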
Attachments
- text/html attachment: transforms2.htm
Received on Thursday, 23 March 2000 19:13:50 UTC