RE: Enveloped signatures and XPath from John Boyer on 2000-03-24 (w3c-ietf-xmldsig@w3.org from January to March 2000)

From: John Boyer <jboyer@PureEdge.com>
Date: Thu, 23 Mar 2000 16:15:29 -0800
To: "IETF/W3C XML-DSig WG \(E-mail\)" <w3c-ietf-xmldsig@w3.org>
Cc: "Martin J. Duerst" <duerst@w3.org>, "James Clark" <jjc@jclark.com>, "Joseph Reagle" <reagle@w3.org>, "Eastlake Donald-LDE008" <Donald.Eastlake@motorola.com>, "TAMURA Kent" <kent@trl.ibm.co.jp>, "Christopher R. Maden" <crism@exemplary.net>, "Jonathan Marsh" <jmarsh@microsoft.com>, "Ed Simon" <ed.simon@entrust.com>
Message-ID: <BFEDKCINEPLBDLODCODKAEBGCCAA.jboyer@PureEdge.com>
Attached and Pasted below is the HTML for a new version of the XPath
transform for your consideration.  If you are on the cc line, it is because
you expressed a special interest and/or have provided constructive and very
helpful feedback on the XPath transform in the recent past.

Although I'm sure it's not the final draft, I am excited by the possibility
that we as a group may be close to a sufficient and easy to understand and
implement version of the XPath transform, so I am asking you to please take
some time to review the new specification as it is a very important part of
meeting our partial XML document signing requirement.

Thanks,
John Boyer
Software Development Manager
PureEdge Solutions, Inc. (formerly UWI.Com)
jboyer@PureEdge.com


Executive overview
==================

In accordance with group feedback, the following issues have been addressed

1) The parse() function and $input variable binding has been eliminated.
Instead the root of the input XML document is provided as the context node
of the initial evaluation context.  Certain assumptions about what
information the parser must retain have been expressed, but all of these
assumptions seem to be necessary to support other functionality of XPath.
Specifically, I assume that the QName of an element, attribute or namespace
node can be created using the information available in the parse tree of any
processor that is bundled with an XPath engine.

2) Exact order is eliminated; lex order on input is eliminated; lex order of
attribute and namespace nodes in the output has been specified in accordance
with group feedback.

3) The namespace declarations are initialized to those available to the
XPath element containing the Xpath expression, as is done in XPointer.

4) Variable bindings for expression byte order mark and encoding have been
eliminated.  Instead we have the assumption that the implementation
translates to the character domain before evaluating the XPath expression,
which is in accordance with the XPath recommendation.

5) The serialize() function has been retained in part to simplify
specification and in part because it needs access to the internal
representation of a node-set.  However, note that it is automatically
applied to the XPath transform result, so a) it will almost never need to be
called explicitly, and b) XPath transform expressions need not start with a
function call, which seemed to be the source of some concern.

6) The output encoding has been standardized to UTF-8.  There does not
appear to be a better option.

7) Someone mentioned a problem with namespace nodes.  There was something
that we were not addressing, but I did not understand the comment.  If you
are reading, and it was your comment, could you please reiterate and
elaborate.

================================================

<h4>6.6.3 <a name="sec-XPath">XPath</a> Filtering </h4>

<dl>
  <dt>Identifier: </dt>
  <dd>http://www.w3.org/TR/1999/REC-xpath-19991116 </dd>
</dl>

<p>The <a href="#ref-XPath">XPath</a> transform output is the result of
applying an
XPath expression to an input string. The XPath expression appears in a
parameter
element named <code>XPath</code>. The input string is equivalent to the
result
of dereferencing the URI attribute of the <code>Reference</code> element
containing the
XPath transform, then, in sequence, applying all transforms that appear
before the XPath
transform in the <code>Reference</code> element's
<code>Transforms</code>.</p>

<p>The primary purpose of this transform is to ensure that only specifically
defined
changes to the input XML document are permitted after the signature is
affixed.
The XPath expression can created such that it includes all elements except
those
meeting specific criteria.  It is the responsibility of the XPath expression
author
to ensure that all necessary information has been included in the output
such that
modification of the excluded information does not affect the interpretation
of the
output in the application context.  One simple example of this is the
omission of an
enveloped signature's <code>SignatureValue</code> element.</p>

<h4>6.6.3.1 Evaluation Context Initialization</h4>

<p>The XPath transform establishes the following evaluation context for the
XPath expression given in the <code>XPath</code> parameter element:</p>

<ul>
<li>A <b>context node</b>, initialized to the input XML document's root
node.</LI>
<li>A <b>context position</b>, initialized to 1.</LI>
<li>A <b>context size</b>, initialized to 1.</LI>
<li>A <b>library of functions</b> equal to the function set defined in <a
href="#ref-XPath">XPath</a>
plus the function <a href="#function-serialize">serialize()</a>.</li>
<li>A set of variable bindings. No means for initializing these is defined.
Thus, the set of
variable bindings used when evaluating the XPath expression is empty, and
use of a variable
reference in the XPath expression results in an error.</li>
<li>The set of namespace declarations in scope for the XPath
expression.</li>
</ul>

<h4>6.6.3.2 Parsing Requirements for XPath Evaluation</h4>

<p>An XML processor is used to read the input XML document and produce a
parse
tree capable of being used as the initial context node for the XPath
evaluation, as described in the previous section.  If the input is not a
well-formed XML document, then the XPath transform must throw an
exception.</p>

<p>Validating and non-validating XML processors only behave in the same way
(e.g. with
respect to attribute value normalization and entity reference definition)
until an external
reference is encountered.  If the XPath transform implementation uses a
non-validating processor,
and it encounters an external reference in the input document, then an
exception must
be thrown to indicate that the necessary algorithm is unavailable (The XPath
transform cannot
simply generate incorrect output since many applications distinguish an
unverifiable
signature from an invalid signature).</p>

<p>As a result of reading the input with an XML processor, linefeeds are
normalized,
attribute values are normalized, CDATA sections are replaced by their
content,
and entity references are recursively replaced by substitution text.  In
addition,
consecutive characters are grouped into a single text node.</p>

<p>The XPath implementation is expected to convert the information in the
input XML
document and the XPath expression string to the character domain prior to
making any
comparisons such that the result of evaluating the expression is equivalent
regardless
of the initial encoding of the input XML document and XPath expression.</p>

<p>Based on the namespace processing rules of XPath, the namespace prefix of
namespace-qualified nodes must be available in the parse tree.</p>

<p>Based on the expression evaluation requirements of the XPath function
library,
the <b>document order</b> position of each node must be available in the
parse tree,
except for the attribute and namespace axes.  The XPath transform imposes no
order
on attribute and namespace nodes during XPath expression evaluation, and
expressions
based on attribute or namespace node position are not interoperable.  The
XPath
transform does define an order for namespace and attribute nodes during
<a href="#function-serialize">serialization</a>.</p>

<h4>6.6.3.3 XPath Transform Functions</h4>

<p>The function library of the XPath transform includes all functions
defined
by the XPath specification plus the serialize() function defined
below.  For most XPath transforms, serialize() need not be called explicitly
since it
is called automatically if the expression result is a node-set.  However,
serialization
must be represented as an XPath function since it requires access to the
internal
representation of a node-set (see parsing requirements).</p>

<p>
<a name="function-serialize"><b>Function: </b><i>string</i>
<b>serialize</b>(<i>node-set</i>)</a>
</p>

<p>This function converts a node-set into a string by generating the
representative text
for each node in the node-set.  The nodes of a node-set are processed in
ascending order
of the nodes' <b>document order</b> positions except for attribute and
namespace nodes,
which do not have document order positions.</p>

<p>The nodes in the attribute and namespace axes will each be processed in
lexicographic order,
with the namespace axis preceding the attribute axis.  Lexicographic
comparison is performed using
namespace URI as the primary key and local name as secondary key (nodes with
no namespace
qualification have an empty namespace URI, which is defined to be
lexicographically least).
Lexicographic comparison is based on the UCS codepoint values, which is
equivalent to lexical
ordering based on UTF-8.</p>

<p>The method of text generation is dependent on the node type and given in
the following list:</p>

<ul>
<li><b>Root Node-</b> Nothing (no byte order mark, no XML declaration, no
document
type declaration).</li>

<li><b>Element Nodes-</b> An open angle bracket (&lt;), the element QName,
the nodes of the
namespace axis, the nodes of the attribute axis, a close angle bracket (>),
the descendant
nodes of the element that are in the node-set (in document order), an open
angle bracket, a
forward slash (/), the element QName, and a close angle bracket.
The element <a href="http://www.w3.org/TR/REC-xml-names/#NT-QName">QName</a>
is either the
local name if the namespace prefix string is empty or the namespace prefix
and a colon,
then the local name of the element.</li>

<li><b>Namespace and Attribute Nodes-</b> a space, the node's QName, an
equals sign,
an open double quote, the modified string value, and a close double quote.
The string value of the node is modified by replacing all ampersands (&amp;)
with <code>&amp;amp;</code>,
and all double quote characters with <code>&amp;quot;</code>, and all
illegal characters for UTF-8
encoding with hexadecimal character references (e.g.
<code>&amp;#x0D;</code>).</li>

<li><b>Text Nodes-</b> the string value, except all ampersands are replaced
by <code>&amp;amp;</code>,
all open angle brackets (&lt;) are replaced by <code>&amp;lt;</code>, and
all illegal characters
for UTF-8 encoding with hexadecimal character references (e.g.
<code>&amp;#x0D;</code>).</li>

<li><b>Processing Instruction Nodes-</b> an open angle bracket, a question
mark, the PI target name
of the node, a space, the string value, the question mark, and a close angle
bracket.</li>

<li><b>Comment Nodes-</b> the open comment sequence (&lt;!--), the string
value of the node, and the close
comment sequence (-->).</li>
</ul>

<h4 name="sec-XPathTransformOutput">6.6.3.4 XPath Transform Output</h4>

<p>The result of the XPath expression is a string, boolean, number, or
node-set.
If the result of the XPath expression is a string, then the string converted
to
UTF-8 is the output of the XPath transform. If the result is a boolean or
number,
then the XPath transform output is computed by calling the XPath string()
function
on the boolean or number then converting to UTF-8.
If the result of the XPath expression is a node-set, then the XPath
transform
result is computed by applying the serialize() function to the node-set,
then
converting the resulting string to UTF-8.</p>

<p>For example, consider creating an enveloped signature S1 (a
<code>Signature</code> element
with an <code>id</code> attribute equal to "S1").  The signature S1 is
enveloped because its
<code>Reference</code> URI indicates some ancestor element of S1. Since the
<code>DigestValue</code>
in the <code>Reference</code> is calculated before S1's
<code>SignatureValue</code>, the
<code>SignatureValue</code> must be omitted from the
<code>DigestValue</code> calculation.
This can be done with an XPath transform containing the following XPath
expression in its
<code>XPath</code> parameter element:</p>

<p> <code>
/descendant-or-self::node()[<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;not(self::SignatureValue and
parent::Signature[@id="S1"]) and<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;not(self::KeyInfo and
parent::Signature[@id="S1"]) and<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;not(self::DigestValue and ancestor::*[3 and
@id="S1"])]
</code> </p>

<p>The '/descendant-or-self::node()' means that all nodes in the entire
parse
tree starting at the root node are candidates for the result node-set.  For
each node candidate,
the node is included in the resultant node-set if and only if the node test
(the boolean expression
in the square brackets) evaluates to "true" for that node.  The node test
returns true for all
nodes except the <code>SignatureValue</code> and <code>KeyInfo</code> child
elements and the
and the <code>DigestValue</code> descendants of <code>Signature</code> S1.
Thus, serialize()
returns a string containing the entire input except for omitting the parts
of S1 that must change
during core processing of S1, so these changes will not invalidate a
<code>DigestValue</code>
computed over the serialize() result.</p>

<p>Note that this expression works even if the XPath transform is
implemented with a non-validating
processor because S1 is identified by comparison to the value of an
attribute named 'id' rather
than by using the XPath id() function.  Although the id() function is useful
when the 'id'
attribute is not named 'id', the XPath expression author will know the 'id'
attribute's name when
writing the expression.</p>

<p>It is RECOMMENDED that the XPath be constructed such that the result of
this operation
is a well-formed XML document. This should be the case if root element of
the input
resource is included by the XPath (even if a number of its descendant nodes
are omitted by the XPath expression). It is also RECOMMENDED that nodes
should not be
omitted from the input if they affect the interpretation of the output nodes
in the
application context.  The XPath expression author is responsible for this
since the
XPath expression author knows the application context.</p>
Attachments

text/html attachment: transforms2.htm
Received on Thursday, 23 March 2000 19:13:50 UTC