RE: Enveloped signatures and XPath from Jonathan Marsh on 2000-03-27 (w3c-ietf-xmldsig@w3.org from January to March 2000)

From: Jonathan Marsh <jmarsh@microsoft.com>
Date: Mon, 27 Mar 2000 10:27:08 -0800
To: "'John Boyer'" <jboyer@PureEdge.com>, "IETF/W3C XML-DSig WG (E-mail)" <w3c-ietf-xmldsig@w3.org>
Cc: "Martin J. Duerst" <duerst@w3.org>, James Clark <jjc@jclark.com>, Joseph Reagle <reagle@w3.org>, Eastlake Donald-LDE008 <Donald.Eastlake@motorola.com>, TAMURA Kent <kent@trl.ibm.co.jp>, "Christopher R. Maden" <crism@exemplary.net>, Ed Simon <ed.simon@entrust.com>
Message-ID: <5F68209F7E4BD111A5F500805FFE35B91D3FDED2@RED-MSG-54>
"However, serialization must be represented as an XPath function since it
requires access to the internal representation of a node-set (see parsing
requirements)."

What does this mean?

> -----Original Message-----
> From: John Boyer [mailto:jboyer@PureEdge.com]
> Sent: Thursday, March 23, 2000 4:15 PM
> To: IETF/W3C XML-DSig WG (E-mail)
> Cc: Martin J. Duerst; James Clark; Joseph Reagle; Eastlake
> Donald-LDE008; TAMURA Kent; Christopher R. Maden; Jonathan Marsh; Ed
> Simon
> Subject: RE: Enveloped signatures and XPath
> 
> 
> Attached and Pasted below is the HTML for a new version of the XPath
> transform for your consideration.  If you are on the cc line, 
> it is because
> you expressed a special interest and/or have provided 
> constructive and very
> helpful feedback on the XPath transform in the recent past.
> 
> Although I'm sure it's not the final draft, I am excited by 
> the possibility
> that we as a group may be close to a sufficient and easy to 
> understand and
> implement version of the XPath transform, so I am asking you 
> to please take
> some time to review the new specification as it is a very 
> important part of
> meeting our partial XML document signing requirement.
> 
> Thanks,
> John Boyer
> Software Development Manager
> PureEdge Solutions, Inc. (formerly UWI.Com)
> jboyer@PureEdge.com
> 
> 
> Executive overview
> ==================
> 
> In accordance with group feedback, the following issues have 
> been addressed
> 
> 1) The parse() function and $input variable binding have been 
> eliminated.
> Instead the root of the input XML document is provided as the 
> context node
> of the initial evaluation context.  Certain assumptions about what
> information the parser must retain have been expressed, but 
> all of these
> assumptions seem to be necessary to support other 
> functionality of XPath.
> Specifically, I assume that the QName of an element, 
> attribute or namespace
> node can be created using the information available in the 
> parse tree of any
> processor that is bundled with an XPath engine.
> 
> 2) Exact order is eliminated; lex order on input is 
> eliminated; lex order of
> attribute and namespace nodes in the output has been 
> specified in accordance
> with group feedback.
> 
> 3) The namespace declarations are initialized to those 
> available to the
> XPath element containing the XPath expression, as is done in XPointer.
> 
> 4) Variable bindings for expression byte order mark and 
> encoding have been
> eliminated.  Instead we have the assumption that the implementation
> translates to the character domain before evaluating the 
> XPath expression,
> which is in accordance with the XPath recommendation.
> 
> 5) The serialize() function has been retained in part to simplify
> specification and in part because it needs access to the internal
> representation of a node-set.  However, note that it is automatically
> applied to the XPath transform result, so a) it will almost 
> never need to be
> called explicitly, and b) XPath transform expressions need 
> not start with a
> function call, which seemed to be the source of some concern.
> 
> 6) The output encoding has been standardized to UTF-8.  There does not
> appear to be a better option.
> 
> 7) Someone mentioned a problem with namespace nodes.  There 
> was something
> that we were not addressing, but I did not understand the 
> comment.  If you
> are reading, and it was your comment, could you please reiterate and
> elaborate.
> 
> ================================================
> 
> <h4>6.6.3 <a name="sec-XPath">XPath</a> Filtering </h4>
> 
> <dl>
>   <dt>Identifier: </dt>
>   <dd>http://www.w3.org/TR/1999/REC-xpath-19991116 </dd>
> </dl>
> 
> <p>The <a href="#ref-XPath">XPath</a> transform output is the 
> result of
> applying an
> XPath expression to an input string. The XPath expression appears in a
> parameter
> element named <code>XPath</code>. The input string is 
> equivalent to the
> result
> of dereferencing the URI attribute of the 
> <code>Reference</code> element
> containing the
> XPath transform, then, in sequence, applying all transforms 
> that appear
> before the XPath
> transform in the <code>Reference</code> element's
> <code>Transforms</code>.</p>
> 
> <p>The primary purpose of this transform is to ensure that 
> only specifically
> defined
> changes to the input XML document are permitted after the signature is
> affixed.
> The XPath expression can be created such that it includes all 
> elements except
> those
> meeting specific criteria.  It is the responsibility of the 
> XPath expression
> author
> to ensure that all necessary information has been included in 
> the output
> such that
> modification of the excluded information does not affect the 
> interpretation
> of the
> output in the application context.  One simple example of this is the
> omission of an
> enveloped signature's <code>SignatureValue</code> element.</p>
> 
> <h4>6.6.3.1 Evaluation Context Initialization</h4>
> 
> <p>The XPath transform establishes the following evaluation 
> context for the
> XPath expression given in the <code>XPath</code> parameter 
> element:</p>
> 
> <ul>
> <li>A <b>context node</b>, initialized to the input XML 
> document's root
> node.</li>
> <li>A <b>context position</b>, initialized to 1.</li>
> <li>A <b>context size</b>, initialized to 1.</li>
> <li>A <b>library of functions</b> equal to the function set 
> defined in <a
> href="#ref-XPath">XPath</a>
> plus the function <a href="#function-serialize">serialize()</a>.</li>
> <li>A set of variable bindings. No means for initializing 
> these is defined.
> Thus, the set of
> variable bindings used when evaluating the XPath expression 
> is empty, and
> use of a variable
> reference in the XPath expression results in an error.</li>
> <li>The set of namespace declarations in scope for the XPath
> expression.</li>
> </ul>
> 
> <h4>6.6.3.2 Parsing Requirements for XPath Evaluation</h4>
> 
> <p>An XML processor is used to read the input XML document 
> and produce a
> parse
> tree capable of being used as the initial context node for the XPath
> evaluation, as described in the previous section.  If the 
> input is not a
> well-formed XML document, then the XPath transform must throw an
> exception.</p>
> 
> <p>Validating and non-validating XML processors only behave 
> in the same way
> (e.g. with
> respect to attribute value normalization and entity reference 
> definition)
> until an external
> reference is encountered.  If the XPath transform 
> implementation uses a
> non-validating processor,
> and it encounters an external reference in the input document, then an
> exception must
> be thrown to indicate that the necessary algorithm is 
> unavailable (The XPath
> transform cannot
> simply generate incorrect output since many applications 
> distinguish an
> unverifiable
> signature from an invalid signature).</p>
> 
> <p>As a result of reading the input with an XML processor, 
> linefeeds are
> normalized,
> attribute values are normalized, CDATA sections are replaced by their
> content,
> and entity references are recursively replaced by 
> substitution text.  In
> addition,
> consecutive characters are grouped into a single text node.</p>
> 
> <p>The XPath implementation is expected to convert the 
> information in the
> input XML
> document and the XPath expression string to the character 
> domain prior to
> making any
> comparisons such that the result of evaluating the expression 
> is equivalent
> regardless
> of the initial encoding of the input XML document and XPath 
> expression.</p>
> 
> <p>Based on the namespace processing rules of XPath, the 
> namespace prefix of
> namespace-qualified nodes must be available in the parse tree.</p>
> 
> <p>Based on the expression evaluation requirements of the 
> XPath function
> library,
> the <b>document order</b> position of each node must be 
> available in the
> parse tree,
> except for the attribute and namespace axes.  The XPath 
> transform imposes no
> order
> on attribute and namespace nodes during XPath expression 
> evaluation, and
> expressions
> based on attribute or namespace node position are not 
> interoperable.  The
> XPath
> transform does define an order for namespace and attribute 
> nodes during
> <a href="#function-serialize">serialization</a>.</p>
> 
> <h4>6.6.3.3 XPath Transform Functions</h4>
> 
> <p>The function library of the XPath transform includes all functions
> defined
> by the XPath specification plus the serialize() function defined
> below.  For most XPath transforms, serialize() need not be 
> called explicitly
> since it
> is called automatically if the expression result is a 
> node-set.  However,
> serialization
> must be represented as an XPath function since it requires 
> access to the
> internal
> representation of a node-set (see parsing requirements).</p>
> 
> <p>
> <a name="function-serialize"><b>Function: </b><i>string</i>
> <b>serialize</b>(<i>node-set</i>)</a>
> </p>
> 
> <p>This function converts a node-set into a string by generating the
> representative text
> for each node in the node-set.  The nodes of a node-set are 
> processed in
> ascending order
> of the nodes' <b>document order</b> positions except for attribute and
> namespace nodes,
> which do not have document order positions.</p>
> 
> <p>The nodes in the attribute and namespace axes will each be 
> processed in
> lexicographic order,
> with the namespace axis preceding the attribute axis.  Lexicographic
> comparison is performed using
> namespace URI as the primary key and local name as secondary 
> key (nodes with
> no namespace
> qualification have an empty namespace URI, which is defined to be
> lexicographically least).
> Lexicographic comparison is based on the UCS codepoint 
> values, which is
> equivalent to lexical
> ordering based on UTF-8.</p>
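The ordering rule quoted above can be sketched in Python roughly as follows. This is only an illustration, not part of the draft: the representation of a node as a (namespace URI, local name) tuple and the function name `order_nodes` are assumptions made for the example. Python compares strings by code point, so a plain tuple sort gives the namespace-URI-then-local-name order, with the empty URI sorting first.

```python
def order_nodes(nodes):
    """Sort (namespace_uri, local_name) tuples as the draft describes.

    Primary key is the namespace URI, secondary key is the local name.
    Python compares str values code point by code point, and the empty
    string compares less than any non-empty string.
    """
    return sorted(nodes, key=lambda node: (node[0], node[1]))


def order_nodes_utf8(nodes):
    """Same sort, but comparing the UTF-8 encoded bytes of each key."""
    return sorted(
        nodes,
        key=lambda node: (node[0].encode("utf-8"), node[1].encode("utf-8")),
    )


# Hypothetical attribute nodes, including names outside ASCII and outside
# the Basic Multilingual Plane.
attrs = [
    ("http://b.example", "x"),
    ("", "id"),
    ("http://a.example", "z"),
    ("http://a.example", "\U00010000"),
    ("http://a.example", "\uffff"),
    ("http://a.example", "a"),
]

ordered = order_nodes(attrs)
```

For the sample data, `order_nodes` and `order_nodes_utf8` return the same list, which is consistent with the draft's remark that code point order and UTF-8 byte order agree.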
> 
> <p>The method of text generation is dependent on the node 
> type and given in
> the following list:</p>
> 
> <ul>
> <li><b>Root Node-</b> Nothing (no byte order mark, no XML 
> declaration, no
> document
> type declaration).</li>
> 
> <li><b>Element Nodes-</b> An open angle bracket (&lt;), the 
> element QName,
> the nodes of the
> namespace axis, the nodes of the attribute axis, a close 
> angle bracket (>),
> the descendant
> nodes of the element that are in the node-set (in document 
> order), an open
> angle bracket, a
> forward slash (/), the element QName, and a close angle bracket.
> The element <a 
> href="http://www.w3.org/TR/REC-xml-names/#NT-QName">QName</a>
> is either the
> local name if the namespace prefix string is empty or the 
> namespace prefix
> and a colon,
> then the local name of the element.</li>
> 
> <li><b>Namespace and Attribute Nodes-</b> a space, the node's 
> QName, an
> equals sign,
> an open double quote, the modified string value, and a close 
> double quote.
> The string value of the node is modified by replacing all 
> ampersands (&amp;)
> with <code>&amp;amp;</code>,
> and all double quote characters with <code>&amp;quot;</code>, and all
> illegal characters for UTF-8
> encoding with hexadecimal character references (e.g.
> <code>&amp;#x0D;</code>).</li>
> 
> <li><b>Text Nodes-</b> the string value, except all 
> ampersands are replaced
> by <code>&amp;amp;</code>,
> all open angle brackets (&lt;) are replaced by 
> <code>&amp;lt;</code>, and
> all illegal characters
> for UTF-8 encoding are replaced by hexadecimal character references (e.g.
> <code>&amp;#x0D;</code>).</li>
> 
> <li><b>Processing Instruction Nodes-</b> an open angle 
> bracket, a question
> mark, the PI target name
> of the node, a space, the string value, the question mark, 
> and a close angle
> bracket.</li>
> 
> <li><b>Comment Nodes-</b> the open comment sequence 
> (&lt;!--), the string
> value of the node, and the close
> comment sequence (-->).</li>
> </ul>
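A minimal Python sketch of some of the text generation rules listed above. The function names are made up for illustration, and the replacement of "illegal characters for UTF-8 encoding" by character references is deliberately left out, since the draft does not say exactly which characters are meant.

```python
def escape_attr(value: str) -> str:
    # Attribute and namespace node values: escape '&' first so that the
    # ampersand introduced by '&quot;' is not escaped a second time.
    return value.replace("&", "&amp;").replace('"', "&quot;")


def escape_text(value: str) -> str:
    # Text node values: escape '&' first, then '<'.
    return value.replace("&", "&amp;").replace("<", "&lt;")


def render_attr(qname: str, value: str) -> str:
    # A space, the QName, an equals sign, and the quoted, escaped value.
    return f' {qname}="{escape_attr(value)}"'


def render_pi(target: str, value: str) -> str:
    # '<?', the PI target, a space, the string value, '?>'.
    return f"<?{target} {value}?>"


def render_comment(value: str) -> str:
    # '<!--', the string value, '-->'.
    return f"<!--{value}-->"
```

The order of the two replacements in each escape function matters: escaping the ampersand last would turn `&quot;` into `&amp;quot;`.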
> 
> <h4 name="sec-XPathTransformOutput">6.6.3.4 XPath Transform 
> Output</h4>
> 
> <p>The result of the XPath expression is a string, boolean, number, or
> node-set.
> If the result of the XPath expression is a string, then the 
> string converted
> to
> UTF-8 is the output of the XPath transform. If the result is 
> a boolean or
> number,
> then the XPath transform output is computed by calling the 
> XPath string()
> function
> on the boolean or number then converting to UTF-8.
> If the result of the XPath expression is a node-set, then the XPath
> transform
> result is computed by applying the serialize() function to 
> the node-set,
> then
> converting the resulting string to UTF-8.</p>
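The dispatch on result type described above might look like this in Python. It is a sketch under assumptions: XPath strings, booleans and numbers are modelled as Python `str`, `bool` and `int`/`float`, anything else is treated as a node-set, and `serialize` is a caller-supplied stand-in for the draft's serialize() function. The number formatting covers only the simple cases of XPath string().

```python
import math


def xpath_string(value):
    """Approximate XPath 1.0 string() for booleans and numbers."""
    # bool must be tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    if float(value).is_integer():
        # Integers have no decimal point; both zeros become "0".
        return str(int(value))
    # Simplification: repr() can produce exponent notation for very large
    # or very small values, which XPath string() never does.
    return repr(float(value))


def transform_output(result, serialize):
    """Return the XPath transform output as UTF-8 bytes."""
    if isinstance(result, str):
        text = result
    elif isinstance(result, (bool, int, float)):
        text = xpath_string(result)
    else:
        # Anything else is treated as a node-set (an assumption of this sketch).
        text = serialize(result)
    return text.encode("utf-8")
```

For instance, `transform_output(3.0, None)` yields `b"3"`, and a node-set result is handed to whatever `serialize` callable the caller passes in.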
> 
> <p>For example, consider creating an enveloped signature S1 (a
> <code>Signature</code> element
> with an <code>id</code> attribute equal to "S1").  The signature S1 is
> enveloped because its
> <code>Reference</code> URI indicates some ancestor element of 
> S1. Since the
> <code>DigestValue</code>
> in the <code>Reference</code> is calculated before S1's
> <code>SignatureValue</code>, the
> <code>SignatureValue</code> must be omitted from the
> <code>DigestValue</code> calculation.
> This can be done with an XPath transform containing the 
> following XPath
> expression in its
> <code>XPath</code> parameter element:</p>
> 
> <p> <code>
> /descendant-or-self::node()[<br/>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;not(self::SignatureValue and
> parent::Signature[@id="S1"]) and<br/>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;not(self::KeyInfo and
> parent::Signature[@id="S1"]) and<br/>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;not(self::DigestValue and 
> ancestor::*[3 and
> @id="S1"])]
> </code> </p>
> 
> <p>The '/descendant-or-self::node()' means that all nodes in 
> the entire
> parse
> tree starting at the root node are candidates for the result 
> node-set.  For
> each node candidate,
> the node is included in the resultant node-set if and only if 
> the predicate
> (the boolean expression
> in the square brackets) evaluates to "true" for that node.  
> The predicate
> returns true for all
> nodes except the <code>SignatureValue</code> and 
> <code>KeyInfo</code> child
> elements and the
> <code>DigestValue</code> descendants of 
> <code>Signature</code> S1.
> Thus, serialize()
> returns a string containing the entire input except for 
> omitting the parts
> of S1 that must change
> during core processing of S1, so these changes will not invalidate a
> <code>DigestValue</code>
> computed over the serialize() result.</p>
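To make the effect of the example expression concrete, here is a rough Python approximation that walks a tree by hand instead of running an XPath engine. It is a sketch only: the sample document and the name `included_elements` are invented for illustration, and it looks at element nodes alone (no text, comment, processing instruction, attribute or namespace nodes). The third condition follows the literal XPath 1.0 reading of `ancestor::*[3 and @id="S1"]`, where the `3` inside an `and` is converted to boolean true rather than treated as a position, so any ancestor with the given id matches. That agrees with the prose above, which speaks of <code>DigestValue</code> descendants of S1.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document with an enveloped signature S1.
DOC = (
    '<Doc><Data>payload</Data>'
    '<Signature id="S1"><SignedInfo><Reference URI="">'
    '<DigestValue>abc</DigestValue></Reference></SignedInfo>'
    '<SignatureValue>xyz</SignatureValue><KeyInfo>key</KeyInfo>'
    '</Signature></Doc>'
)


def included_elements(root, sig_id):
    """Return tags of the elements the example predicate keeps, in document order."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    def ancestors(el):
        while el in parent:
            el = parent[el]
            yield el

    def is_signature(el):
        return el is not None and el.tag == "Signature" and el.get("id") == sig_id

    kept = []
    for el in root.iter():  # iter() visits elements in document order
        if el.tag in ("SignatureValue", "KeyInfo") and is_signature(parent.get(el)):
            continue
        if el.tag == "DigestValue" and any(a.get("id") == sig_id for a in ancestors(el)):
            continue
        kept.append(el.tag)
    return kept


kept_tags = included_elements(ET.fromstring(DOC), "S1")
```

With the sample document, the `SignatureValue`, `KeyInfo` and `DigestValue` elements of S1 are dropped and every other element is kept.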
> 
> <p>Note that this expression works even if the XPath transform is
> implemented with a non-validating
> processor because S1 is identified by comparison to the value of an
> attribute named 'id' rather
> than by using the XPath id() function.  Although the id() 
> function is useful
> when the 'id'
> attribute is not named 'id', the XPath expression author will 
> know the 'id'
> attribute's name when
> writing the expression.</p>
> 
> <p>It is RECOMMENDED that the XPath be constructed such that 
> the result of
> this operation
> is a well-formed XML document. This should be the case if 
> the root element of
> the input
> resource is included by the XPath (even if a number of its 
> descendant nodes
> are omitted by the XPath expression). It is also RECOMMENDED 
> that nodes
> should not be
> omitted from the input if they affect the interpretation of 
> the output nodes
> in the
> application context.  The XPath expression author is 
> responsible for this
> since the
> XPath expression author knows the application context.</p>
> 
>
Received on Monday, 27 March 2000 13:28:03 UTC