- From: Jonathan Marsh <jmarsh@microsoft.com>
- Date: Mon, 27 Mar 2000 10:27:08 -0800
- To: "'John Boyer'" <jboyer@PureEdge.com>, "IETF/W3C XML-DSig WG (E-mail)" <w3c-ietf-xmldsig@w3.org>
- Cc: "Martin J. Duerst" <duerst@w3.org>, James Clark <jjc@jclark.com>, Joseph Reagle <reagle@w3.org>, Eastlake Donald-LDE008 <Donald.Eastlake@motorola.com>, TAMURA Kent <kent@trl.ibm.co.jp>, "Christopher R. Maden" <crism@exemplary.net>, Ed Simon <ed.simon@entrust.com>
"However, serialization must be represented as an XPath function since it requires access to the internal representation of a node-set (see parsing requirements)."

What does this mean?

> -----Original Message-----
> From: John Boyer [mailto:jboyer@PureEdge.com]
> Sent: Thursday, March 23, 2000 4:15 PM
> To: IETF/W3C XML-DSig WG (E-mail)
> Cc: Martin J. Duerst; James Clark; Joseph Reagle; Eastlake Donald-LDE008;
> TAMURA Kent; Christopher R. Maden; Jonathan Marsh; Ed Simon
> Subject: RE: Enveloped signatures and XPath
>
> Attached and pasted below is the HTML for a new version of the XPath
> transform for your consideration. If you are on the cc line, it is
> because you expressed a special interest and/or have provided
> constructive and very helpful feedback on the XPath transform in the
> recent past.
>
> Although I'm sure it's not the final draft, I am excited by the
> possibility that we as a group may be close to a sufficient,
> easy-to-understand and easy-to-implement version of the XPath transform,
> so I am asking you to please take some time to review the new
> specification, as it is a very important part of meeting our partial XML
> document signing requirement.
>
> Thanks,
> John Boyer
> Software Development Manager
> PureEdge Solutions, Inc. (formerly UWI.Com)
> jboyer@PureEdge.com
>
> Executive overview
> ==================
>
> In accordance with group feedback, the following issues have been
> addressed:
>
> 1) The parse() function and $input variable binding have been
> eliminated. Instead, the root of the input XML document is provided as
> the context node of the initial evaluation context. Certain assumptions
> about what information the parser must retain have been expressed, but
> all of these assumptions seem to be necessary to support other
> functionality of XPath.
> Specifically, I assume that the QName of an element, attribute or
> namespace node can be created using the information available in the
> parse tree of any processor that is bundled with an XPath engine.
>
> 2) Exact order is eliminated; lex order on input is eliminated; lex
> order of attribute and namespace nodes in the output has been specified
> in accordance with group feedback.
>
> 3) The namespace declarations are initialized to those available to the
> XPath element containing the XPath expression, as is done in XPointer.
>
> 4) Variable bindings for the expression's byte order mark and encoding
> have been eliminated. Instead, we assume that the implementation
> translates to the character domain before evaluating the XPath
> expression, which is in accordance with the XPath Recommendation.
>
> 5) The serialize() function has been retained, in part to simplify the
> specification and in part because it needs access to the internal
> representation of a node-set. However, note that it is automatically
> applied to the XPath transform result, so a) it will almost never need
> to be called explicitly, and b) XPath transform expressions need not
> start with a function call, which seemed to be the source of some
> concern.
>
> 6) The output encoding has been standardized to UTF-8. There does not
> appear to be a better option.
>
> 7) Someone mentioned a problem with namespace nodes. There was something
> that we were not addressing, but I did not understand the comment. If
> you are reading, and it was your comment, could you please reiterate and
> elaborate.
>
> ================================================
>
> <h4>6.6.3 <a name="sec-XPath">XPath</a> Filtering </h4>
>
> <dl>
> <dt>Identifier: </dt>
> <dd>http://www.w3.org/TR/1999/REC-xpath-19991116 </dd>
> </dl>
>
> <p>The <a href="#ref-XPath">XPath</a> transform output is the result of
> applying an XPath expression to an input string.
> The XPath expression appears in a parameter element named
> <code>XPath</code>. The input string is equivalent to the result of
> dereferencing the URI attribute of the <code>Reference</code> element
> containing the XPath transform, then, in sequence, applying all
> transforms that appear before the XPath transform in the
> <code>Reference</code> element's <code>Transforms</code>.</p>
>
> <p>The primary purpose of this transform is to ensure that only
> specifically defined changes to the input XML document are permitted
> after the signature is affixed. The XPath expression can be created such
> that it includes all elements except those meeting specific criteria. It
> is the responsibility of the XPath expression author to ensure that all
> necessary information has been included in the output such that
> modification of the excluded information does not affect the
> interpretation of the output in the application context. One simple
> example of this is the omission of an enveloped signature's
> <code>SignatureValue</code> element.</p>
>
> <h4>6.6.3.1 Evaluation Context Initialization</h4>
>
> <p>The XPath transform establishes the following evaluation context for
> the XPath expression given in the <code>XPath</code> parameter
> element:</p>
>
> <ul>
> <li>A <b>context node</b>, initialized to the input XML document's root
> node.</LI>
> <li>A <b>context position</b>, initialized to 1.</LI>
> <li>A <b>context size</b>, initialized to 1.</LI>
> <li>A <b>library of functions</b> equal to the function set defined in
> <a href="#ref-XPath">XPath</a> plus the function
> <a href="#function-serialize">serialize()</a>.</li>
> <li>A set of variable bindings. No means for initializing these is
> defined.
> Thus, the set of variable bindings used when evaluating the XPath
> expression is empty, and use of a variable reference in the XPath
> expression results in an error.</li>
> <li>The set of namespace declarations in scope for the XPath
> expression.</li>
> </ul>
>
> <h4>6.6.3.2 Parsing Requirements for XPath Evaluation</h4>
>
> <p>An XML processor is used to read the input XML document and produce a
> parse tree whose root node can be used as the initial context node for
> the XPath evaluation, as described in the previous section. If the input
> is not a well-formed XML document, then the XPath transform must throw an
> exception.</p>
>
> <p>Validating and non-validating XML processors behave in the same way
> (e.g. with respect to attribute value normalization and entity reference
> definition) only until an external reference is encountered. If the
> XPath transform implementation uses a non-validating processor and it
> encounters an external reference in the input document, then an
> exception must be thrown to indicate that the necessary algorithm is
> unavailable (the XPath transform cannot simply generate incorrect output,
> since many applications distinguish an unverifiable signature from an
> invalid signature).</p>
>
> <p>As a result of reading the input with an XML processor, line endings
> are normalized, attribute values are normalized, CDATA sections are
> replaced by their content, and entity references are recursively replaced
> by their replacement text.
> In addition, consecutive characters are grouped into a single text
> node.</p>
>
> <p>The XPath implementation is expected to convert the information in the
> input XML document and the XPath expression string to the character
> domain prior to making any comparisons, such that the result of
> evaluating the expression is the same regardless of the initial encoding
> of the input XML document and XPath expression.</p>
>
> <p>Based on the namespace processing rules of XPath, the namespace prefix
> of namespace-qualified nodes must be available in the parse tree.</p>
>
> <p>Based on the expression evaluation requirements of the XPath function
> library, the <b>document order</b> position of each node must be
> available in the parse tree, except for nodes on the attribute and
> namespace axes. The XPath transform imposes no order on attribute and
> namespace nodes during XPath expression evaluation, and expressions based
> on attribute or namespace node position are not interoperable. The XPath
> transform does define an order for namespace and attribute nodes during
> <a href="#function-serialize">serialization</a>.</p>
>
> <h4>6.6.3.3 XPath Transform Functions</h4>
>
> <p>The function library of the XPath transform includes all functions
> defined by the XPath specification plus the serialize() function defined
> below. For most XPath transforms, serialize() need not be called
> explicitly, since it is called automatically if the expression result is
> a node-set. However, serialization must be represented as an XPath
> function since it requires access to the internal representation of a
> node-set (see parsing requirements).</p>
>
> <p>
> <a name="function-serialize"><b>Function: </b><i>string</i>
> <b>serialize</b>(<i>node-set</i>)</a>
> </p>
>
> <p>This function converts a node-set into a string by generating the
> representative text for each node in the node-set.
> The nodes of a node-set are processed in ascending order of the nodes'
> <b>document order</b> positions, except for attribute and namespace
> nodes, which do not have document order positions.</p>
>
> <p>The nodes in the attribute and namespace axes will each be processed
> in lexicographic order, with the namespace axis preceding the attribute
> axis. Lexicographic comparison is performed using the namespace URI as
> the primary key and the local name as the secondary key (nodes with no
> namespace qualification have an empty namespace URI, which is defined to
> be lexicographically least). Lexicographic comparison is based on the UCS
> codepoint values, which is equivalent to lexical ordering based on
> UTF-8.</p>
>
> <p>The method of text generation is dependent on the node type and given
> in the following list:</p>
>
> <ul>
> <li><b>Root Node-</b> Nothing (no byte order mark, no XML declaration, no
> document type declaration).</li>
>
> <li><b>Element Nodes-</b> An open angle bracket (<), the element QName,
> the nodes of the namespace axis, the nodes of the attribute axis, a close
> angle bracket (>), the descendant nodes of the element that are in the
> node-set (in document order), an open angle bracket, a forward slash (/),
> the element QName, and a close angle bracket. The element
> <a href="http://www.w3.org/TR/REC-xml-names/#NT-QName">QName</a> is the
> local name if the namespace prefix string is empty; otherwise, it is the
> namespace prefix, a colon, and the local name of the element.</li>
>
> <li><b>Namespace and Attribute Nodes-</b> a space, the node's QName, an
> equals sign, an open double quote, the modified string value, and a close
> double quote. The string value of the node is modified by replacing all
> ampersands (&) with <code>&amp;</code>, all double quote characters with
> <code>&quot;</code>, and all illegal characters for UTF-8 encoding with
> hexadecimal character references (e.g.
> <code>&#x0D;</code>).</li>
>
> <li><b>Text Nodes-</b> the string value, except that all ampersands are
> replaced by <code>&amp;</code>, all open angle brackets (<) are replaced
> by <code>&lt;</code>, and all illegal characters for UTF-8 encoding are
> replaced with hexadecimal character references (e.g.
> <code>&#x0D;</code>).</li>
>
> <li><b>Processing Instruction Nodes-</b> an open angle bracket, a
> question mark, the PI target name of the node, a space, the string value,
> a question mark, and a close angle bracket.</li>
>
> <li><b>Comment Nodes-</b> the open comment sequence (<!--), the string
> value of the node, and the close comment sequence (-->).</li>
> </ul>
>
> <h4 name="sec-XPathTransformOutput">6.6.3.4 XPath Transform Output</h4>
>
> <p>The result of the XPath expression is a string, boolean, number, or
> node-set. If the result of the XPath expression is a string, then the
> string converted to UTF-8 is the output of the XPath transform. If the
> result is a boolean or number, then the XPath transform output is
> computed by calling the XPath string() function on the boolean or number
> and then converting the result to UTF-8. If the result of the XPath
> expression is a node-set, then the XPath transform result is computed by
> applying the serialize() function to the node-set, then converting the
> resulting string to UTF-8.</p>
>
> <p>For example, consider creating an enveloped signature S1 (a
> <code>Signature</code> element with an <code>id</code> attribute equal
> to "S1"). The signature S1 is enveloped because its
> <code>Reference</code> URI indicates some ancestor element of S1. Since
> the <code>DigestValue</code> in the <code>Reference</code> is calculated
> before S1's <code>SignatureValue</code>, the <code>SignatureValue</code>
> must be omitted from the <code>DigestValue</code> calculation.
> This can be done with an XPath transform containing the following XPath
> expression in its <code>XPath</code> parameter element:</p>
>
> <p> <code>
> /descendant-or-self::node()[<br/>
> not(self::SignatureValue and parent::Signature[@id="S1"]) and<br/>
> not(self::KeyInfo and parent::Signature[@id="S1"]) and<br/>
> not(self::DigestValue and ancestor::*[position()=3 and @id="S1"])]
> </code> </p>
>
> <p>The '/descendant-or-self::node()' means that all nodes in the entire
> parse tree, starting at the root node, are candidates for the result
> node-set. Each candidate node is included in the resultant node-set if
> and only if the predicate (the boolean expression in the square brackets)
> evaluates to "true" for that node. The predicate returns true for all
> nodes except the <code>SignatureValue</code> and <code>KeyInfo</code>
> child elements and the <code>DigestValue</code> descendants of
> <code>Signature</code> S1. Thus, serialize() returns a string containing
> the entire input except for the parts of S1 that must change during core
> processing of S1, so these changes will not invalidate a
> <code>DigestValue</code> computed over the serialize() result.</p>
>
> <p>Note that this expression works even if the XPath transform is
> implemented with a non-validating processor, because S1 is identified by
> comparison to the value of an attribute named 'id' rather than by using
> the XPath id() function. Although the id() function is useful when the
> ID attribute is not named 'id', the XPath expression author will know the
> ID attribute's name when writing the expression.</p>
>
> <p>It is RECOMMENDED that the XPath expression be constructed such that
> the result of this operation is a well-formed XML document. This should
> be the case if the root element of the input resource is included by the
> XPath expression (even if a number of its descendant nodes are omitted by
> the XPath expression).
> It is also RECOMMENDED that nodes not be omitted from the input if they
> affect the interpretation of the output nodes in the application
> context. The XPath expression author is responsible for this, since the
> XPath expression author knows the application context.</p>
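The ordering rule for namespace and attribute nodes in section 6.6.3.3 of the quoted draft can be sketched in Python as follows. The `(namespace_uri, local_name, value)` tuple representation and the function names are hypothetical, chosen only for illustration; the sketch shows the sort key (namespace URI first, then local name, with the empty URI sorting least) and checks the draft's claim that code point order matches UTF-8 byte order.

```python
from typing import List, Tuple

# Hypothetical node representation: (namespace_uri, local_name, value).
# An empty namespace_uri means "no namespace qualification".
AxisNode = Tuple[str, str, str]


def lex_key(node: AxisNode) -> Tuple[str, str]:
    namespace_uri, local_name, _value = node
    # Primary key: namespace URI; secondary key: local name.
    # Python compares str values by code point, and "" is less than any
    # non-empty string, so unqualified nodes sort first.
    return (namespace_uri, local_name)


def order_axis(nodes: List[AxisNode]) -> List[AxisNode]:
    return sorted(nodes, key=lex_key)


def utf8_order_agrees(a: str, b: str) -> bool:
    # True when comparing the UTF-8 encoded bytes gives the same answer
    # as comparing the strings by code point.
    return (a < b) == (a.encode("utf-8") < b.encode("utf-8"))


attrs = [
    ("http://b.example/", "a", "1"),
    ("", "z", "2"),
    ("http://a.example/", "b", "3"),
    ("", "id", "S1"),
]
ordered = order_axis(attrs)
# keys in order: ("", "id"), ("", "z"), ("http://a.example/", "b"), ("http://b.example/", "a")
```

Because Python strings compare by code point, a plain tuple key implements the primary/secondary comparison directly. An implementation built on UTF-16 strings would need more care, since UTF-16 code unit order differs from code point order for characters above U+FFFF.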
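The per-node-type text generation rules in the same section can be sketched similarly. The draft does not define which characters count as "illegal characters for UTF-8 encoding", so this sketch assumes a caller-supplied set that defaults to carriage return only (matching the draft's `&#x0D;` example). All function names are hypothetical.

```python
from typing import FrozenSet

# Assumption: the draft leaves "illegal characters for UTF-8 encoding"
# undefined; by default only CR is written as a character reference here.
DEFAULT_CHAR_REFS: FrozenSet[str] = frozenset({"\r"})


def hex_char_ref(ch: str) -> str:
    # e.g. "\r" -> "&#x0D;"
    return "&#x%02X;" % ord(ch)


def escape_attribute_value(value: str, char_refs=DEFAULT_CHAR_REFS) -> str:
    """Namespace and attribute nodes: escape '&', '"' and the char_refs set."""
    out = []
    for ch in value:
        if ch == "&":
            out.append("&amp;")
        elif ch == '"':
            out.append("&quot;")
        elif ch in char_refs:
            out.append(hex_char_ref(ch))
        else:
            out.append(ch)
    return "".join(out)


def escape_text(value: str, char_refs=DEFAULT_CHAR_REFS) -> str:
    """Text nodes: escape '&', '<' and the char_refs set ('>' is left as-is)."""
    out = []
    for ch in value:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch in char_refs:
            out.append(hex_char_ref(ch))
        else:
            out.append(ch)
    return "".join(out)


def qname(prefix: str, local_name: str) -> str:
    # Local name alone when the prefix is empty, else "prefix:local".
    return local_name if not prefix else prefix + ":" + local_name


def attribute_text(name: str, value: str) -> str:
    # A space, the QName, '=', '"', the modified string value, '"'.
    return ' %s="%s"' % (name, escape_attribute_value(value))


def pi_text(target: str, value: str) -> str:
    # '<', '?', the PI target, a space, the string value, '?', '>'.
    return "<?%s %s?>" % (target, value)


def comment_text(value: str) -> str:
    # '<!--', the string value, '-->'.
    return "<!--%s-->" % value
```

Replacing `&` in the same pass as the other characters, rather than with chained `str.replace` calls, avoids double-escaping the ampersands introduced by the other substitutions.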
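Section 6.6.3.4's conversion of a string, boolean or number result to the transform output can be sketched as below. The number formatting approximates the XPath 1.0 string() rules (NaN, Infinity, integers without a decimal point, no exponent notation). Using `repr` to obtain the shortest round-trip digits is this sketch's own choice, not something the draft prescribes, and node-set results are left to serialize().

```python
import math
from decimal import Decimal
from typing import Union


def xpath_number_to_string(x: float) -> str:
    """Approximate the XPath 1.0 string() conversion for numbers."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    if x == 0:
        return "0"  # both positive and negative zero
    if x.is_integer():
        return str(int(x))  # no decimal point
    # repr() yields the shortest digits that round-trip; formatting the
    # Decimal with "f" removes exponent notation, which string() never uses.
    return format(Decimal(repr(x)), "f")


def transform_output(result: Union[str, bool, float, int]) -> bytes:
    """Transform output for string, boolean and number results."""
    if isinstance(result, bool):  # test bool first: bool is a subclass of int
        text = "true" if result else "false"
    elif isinstance(result, (int, float)):
        text = xpath_number_to_string(float(result))
    elif isinstance(result, str):
        text = result
    else:
        raise TypeError("node-set results are serialized with serialize()")
    return text.encode("utf-8")
```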
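The enveloped-signature expression from the draft can also be mimicked with `xml.dom.minidom` to see which nodes its predicate removes. Python's standard library has no full XPath 1.0 engine, so each clause is hand-translated. The sample document is hypothetical and namespace-free, because the draft's expression uses unprefixed names. For the `DigestValue` clause the sketch follows the prose (any `DigestValue` inside S1). In this sample, S1 is also the `DigestValue`'s third ancestor, so a positional reading of that clause selects the same element.

```python
from xml.dom import Node, minidom

# Hypothetical, namespace-free sample input.
DOC = (
    '<Doc><Data>hello</Data>'
    '<Signature id="S1"><SignedInfo><Reference URI="">'
    '<DigestValue>abc</DigestValue></Reference></SignedInfo>'
    '<SignatureValue>xyz</SignatureValue><KeyInfo>k</KeyInfo>'
    '</Signature></Doc>'
)


def descendant_or_self(node):
    """Yield node and its descendants in document order (no attributes)."""
    yield node
    for child in node.childNodes:
        yield from descendant_or_self(child)


def is_element(node, name=None):
    return (node is not None and node.nodeType == Node.ELEMENT_NODE
            and (name is None or node.tagName == name))


def parent_is_s1(node):
    # parent::Signature[@id="S1"]
    p = node.parentNode
    return is_element(p, "Signature") and p.getAttribute("id") == "S1"


def has_s1_ancestor(node):
    # ancestor::*[@id="S1"] (prose reading of the third clause)
    p = node.parentNode
    while p is not None:
        if is_element(p) and p.getAttribute("id") == "S1":
            return True
        p = p.parentNode
    return False


def predicate(node):
    if is_element(node, "SignatureValue") and parent_is_s1(node):
        return False
    if is_element(node, "KeyInfo") and parent_is_s1(node):
        return False
    if is_element(node, "DigestValue") and has_s1_ancestor(node):
        return False
    return True


doc = minidom.parseString(DOC)
# /descendant-or-self::node()[predicate], evaluated node by node
selected = [n for n in descendant_or_self(doc) if predicate(n)]
excluded = [n.tagName for n in descendant_or_self(doc)
            if is_element(n) and not predicate(n)]
```

Because the predicate is evaluated for each node separately, only the three element nodes fail it. Their text children, such as the text of `SignatureValue`, still satisfy it and remain in `selected`. Attribute nodes are not on the descendant-or-self axis at all, so they never appear in `selected`.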
Received on Monday, 27 March 2000 13:28:03 UTC