Re: Decryption Transform processing question from merlin on 2002-05-01 (xml-encryption@w3.org from May 2002)

From: merlin <merlin@baltimore.ie>
Date: Wed, 01 May 2002 23:45:10 +0100
To: Ari Kermaier <arik@phaos.com>
Cc: reagle@w3.org, "Takeshi Imamura" <IMAMU@jp.ibm.com>, "Hiroshi Maruyama" <MARUYAMA@jp.ibm.com>, xml-encryption@w3.org
Message-Id: <20020501224510.2F8EE44E1E@yog-sothoth.ie.baltimore.com>

Apologies for that last failure to send.

One of the reasons for the strange description of the decrypt
transform is the assumption stated in 2.1.1:

  (In decryptXML(), all of the steps except the actual decryption are
  necessary because XPath does not permit one to remove and then replace a
  node. Consequently, we must serialize (1), wrap (2), reparse (4), and
  trim the node set (5).)

What is the origin of this statement? As far as I can see, XPath
says nothing of this nature. Where does it say that I cannot
replace some nodes in a node set with others, specifying where in
the document structure the new nodes are placed? In practice,
almost any given implementation will let me do exactly that. An
XPath node set is a set of nodes and an associated tree model; the
spec says nothing that forbids manipulating either.

Based on the assumption above, I believe the current decrypt
transform intends processing to work as follows. Take this input:

<Bar xmlns:baz="http://example.org/baz">
  <Foo Id="foo">
    <enc:EncryptedData ...>...</enc:EncryptedData>
  </Foo>
</Bar>

Consider a signature over "#foo" with the decrypt transform.
Under the current spec, I believe that I will end up serializing
and parsing the following to produce the output node set:

<?xml version="1.0"?>
<Foo Id="foo">
    <dummy xmlns:baz="http://example.org/baz"><plaintext /></dummy>
  </Foo>

My output node set is the result of parsing this and removing
the root node, the dummy element and its namespace attributes.
That means that in the output node set, plaintext's parent is
not in the node set, but its grandparent is.

This seems pretty funky to me. Also, the wart in canonical XML
that Gregor's recent examples showed brings up a problem:

<Bar xmlns:baz="http://example.org/baz">
  <Foo Id="foo" xml:something="other">
    <enc:EncryptedData ...>...</enc:EncryptedData>
  </Foo>
</Bar>

Run this through the decrypt transform...

<?xml version="1.0"?>
<Foo Id="foo" xml:something="other">
    <dummy xmlns:baz="http://example.org/baz"><plaintext /></dummy>
  </Foo>

Now if we remove the dummy element and canonicalize this, we
get the following:

<Foo Id="foo" xml:something="other">
    <plaintext xml:something="other"/>
  </Foo>

Canonical XML emits inherited xml:* attributes on any element whose
parent is not in the node set, which will include all dummy-wrapped
elements. So, if xml:* attributes are present in a document, the
canonical form after this transform will no longer match what was
signed, and the transform will break.
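
To make the inheritance rule concrete, here is a rough Python sketch.
It is not a canonicalizer; it only models the one Canonical XML rule
at issue (an element in the node set whose parent is omitted picks up
the nearest xml:* attributes of its ancestors). The function name
inherited_xml_attrs and the in_node_set predicate are my own
illustrative inventions, and results are keyed by tag name purely for
brevity.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def inherited_xml_attrs(root, in_node_set):
    """Return {tag: {xml:* attrs}} for elements that would gain inherited
    xml:* attributes: those in the node set whose parent is not."""
    gained = {}

    def walk(elem, parent_in_set, scope):
        # scope holds the nearest xml:* attributes seen on ancestors
        included = in_node_set(elem)
        own = {k: v for k, v in elem.attrib.items() if k.startswith(XML_NS)}
        if included and not parent_in_set:
            # inherit everything in scope that the element does not set itself
            extra = {k.replace(XML_NS, "xml:"): v
                     for k, v in scope.items() if k not in own}
            if extra:
                gained[elem.tag] = extra
        child_scope = dict(scope)
        child_scope.update(own)
        for child in elem:
            walk(child, included, child_scope)

    walk(root, False, {})
    return gained

# The reparsed document from the example above; the dummy wrapper is
# the only element trimmed from the node set.
doc = ET.fromstring(
    '<Foo Id="foo" xml:something="other">'
    '<dummy xmlns:baz="http://example.org/baz"><plaintext/></dummy>'
    '</Foo>')
gained = inherited_xml_attrs(doc, lambda el: el.tag != "dummy")
print(gained)
```

With the dummy element trimmed, plaintext is reported as gaining
xml:something; if the dummy stays in the node set, nothing is gained.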

Now, if that initial assumption is wrong, and we can manipulate
XPath node sets in both content and structure, then we can have
text along the following lines:

----

 Z = decryptIncludedNodes(X, R)

    where X is a node-set and R is a set of dcrpt:Except elements
specified as a parameter of the transform. Z is a node-set or octet
sequence obtained by the following steps:

       1. Within X, select e, an element node with the type
xenc:EncryptedData, such that it is not referenced by any dcrpt:Except
elements in R. If such e cannot be selected, the algorithm terminates
and Z, the result of the transformation, is X.
       2. If the value of the Type attribute of e is &xenc;Element or
&xenc;Content:
             1. Let Y be decryptXML(X, e). If this function succeeds,
replace X with Y. Otherwise, the implementation MAY signal a failure of
the transform. Alternatively, it MAY also continue processing without
changing X (although it should take an appropriate means to avoid an
infinite loop).
       3. If the Type attribute is absent or otherwise indicates octets:
             1. Let Y' be decryptOctets(X, e). If this function
succeeds, the algorithm terminates and Z, the result of the
transformation, is Y'. Otherwise, the implementation MAY signal a
failure of the transform. Alternatively, it MAY also continue processing
without changing X (although it should take an appropriate means to
avoid an infinite loop).
       4. Go to Step 1.

Y = decryptXML(X, e)
where X is a node-set, e is an element node with the type
xenc:EncryptedData in X. Y is a
node-set obtained by the following steps:

   1. Let C be the parsing context of e. This is the set of all
namespace definitions and general entities in scope for e, not
including those defined by e.
   2. Decrypt the element corresponding to e according to the XML
Encryption specification [XML-Encryption].
   3. Wrap the decrypted octet stream in the context C as specified
in Text Wrapping (Appendix A).
   4. Parse the wrapped octet stream as described in The Reference
Processing Model (section 4.3.3.2) of the XML Signature specification
[XML-Signature], resulting in a node-set.
   5. Let Z be the result of removing the root node, the wrapping
element node, and its associated set of attribute and namespace nodes
from the node-set obtained in Step 4.
   6. Return Y, the node-set obtained by removing the element e and all
its descendants from X, and inserting in their place Z, the decrypted
node set. This involves changes to both the content of the node-set and
the structure of the underlying data model: the subtree of X rooted
at e is replaced with the subtrees of Z rooted at the children of the
dummy element.

----
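
As a sanity check that the splice in step 6 is straightforward on a
real tree model, here is a minimal Python sketch using ElementTree.
Several things are assumptions of mine rather than part of the
proposed text: the decrypt callback stands in for actual XML
Encryption processing, the parsing context is passed in explicitly as
a prefix-to-URI dict (general entities are ignored), and e is assumed
not to be the document element.

```python
import xml.etree.ElementTree as ET

ENC_DATA = "{http://www.w3.org/2001/04/xmlenc#}EncryptedData"

def wrap(plaintext, context):
    """Wrap decrypted text in a dummy element declaring the namespaces
    in scope (context is a prefix -> URI dict)."""
    decls = "".join(' xmlns:%s="%s"' % (prefix, uri)
                    for prefix, uri in sorted(context.items()))
    return "<dummy%s>%s</dummy>" % (decls, plaintext)

def append_text(parent, index, text):
    """Attach character data just before position index in parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text

def decrypt_xml(root, e, context, decrypt):
    """Replace the subtree rooted at e with the children of the parsed
    dummy element. decrypt is a hypothetical callback returning the
    plaintext as a string."""
    parent = next(p for p in root.iter() if any(c is e for c in p))
    dummy = ET.fromstring(wrap(decrypt(e), context))
    index = list(parent).index(e)
    tail = e.tail
    parent.remove(e)
    # leading character data of the plaintext joins whatever preceded e
    append_text(parent, index, dummy.text)
    children = list(dummy)
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)
    # character data that followed e now follows the inserted nodes
    append_text(parent, index + len(children), tail)
    return root

doc = ET.fromstring(
    '<Bar xmlns:baz="http://example.org/baz">'
    '<Foo Id="foo" xmlns:oof="http://example.org/oof">'
    '<enc:EncryptedData xmlns:enc="http://www.w3.org/2001/04/xmlenc#"/>'
    '</Foo></Bar>')
foo = doc.find("Foo")
e = foo.find(ENC_DATA)
context = {"baz": "http://example.org/baz", "oof": "http://example.org/oof"}
decrypt_xml(doc, e, context, lambda node: "<plaintext/>")
print(ET.tostring(doc, encoding="unicode"))
```

The plaintext element ends up as a direct child of Foo, so its parent
is in the tree and there is no gap for canonicalization to trip over.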

Under this model, with this input:

<Bar xmlns:baz="http://example.org/baz">
  <Foo Id="foo" xmlns:oof="http://example.org/oof">
    <enc:EncryptedData ...>...</enc:EncryptedData>
  </Foo>
</Bar>

Considering a signature over "#foo" with the decrypt transform,
we will parse the following:

<?xml version="1.0"?>
<dummy xmlns:baz="http://example.org/baz"
  xmlns:oof="http://example.org/oof"><plaintext /></dummy>

We'll then take the plaintext subtree and stick it in the
input node set (or a copy thereof) replacing the enc:EncryptedData
subtree. There'll be no discontinuity, so no c14n problems.

Merlin


Received on Wednesday, 1 May 2002 18:45:28 UTC