- From: Pratik Datta <pratik.datta@oracle.com>
- Date: Mon, 16 Mar 2009 18:38:42 -0700
- To: XMLSec WG Public List <public-xmlsec@w3.org>
- Message-ID: <49BEFF22.5070305@oracle.com>
The streaming requirement is not captured very well in the transform
note document, so let me explain it here. This also answers some of
Chris Solc's comments from the last meeting.
The current canonicalization is defined in terms of a nodeset. In Java,
the function signature could look like this:
byte[] doCanonicalize(Set<Node>)
i.e. it takes an unordered set of nodes, canonicalizes them, and
produces an array of bytes. The nodes in the nodeset are not ordered in
any way, and the nodeset requires a backing DOM to know the parent/child
relationships between the nodes. This is what makes nodesets inherently
unstreamable.
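
To make the nodeset point concrete, here is a minimal sketch (the class and method names are hypothetical, not from any XML Signature library). It puts the nodes of a parsed document into an unordered Set<Node> and then recovers document order with the DOM Level 3 call compareDocumentPosition. That call only works because every node is still attached to a fully built DOM tree, which is exactly the dependency described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration: ordering an unordered nodeset needs the backing DOM.
class NodeSetOrderDemo {

    // Adds the node and all of its descendants (attributes excluded) to the set.
    static void collect(Node node, Set<Node> out) {
        out.add(node);
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    // Parses the whole document into memory, builds an unordered nodeset,
    // then sorts it into document order by asking the DOM about positions.
    static List<String> namesInDocumentOrder(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Set<Node> nodeSet = new HashSet<>();
            collect(doc.getDocumentElement(), nodeSet);

            List<Node> ordered = new ArrayList<>(nodeSet);
            ordered.sort((a, b) -> {
                if (a == b) {
                    return 0;
                }
                // FOLLOWING is set when b comes after a in document order.
                boolean bFollowsA =
                        (a.compareDocumentPosition(b) & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
                return bFollowsA ? -1 : 1;
            });

            List<String> names = new ArrayList<>();
            for (Node n : ordered) {
                names.add(n.getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling NodeSetOrderDemo.namesInDocumentOrder("<a><b>x</b><c/></a>") walks the tree only after the entire document has been parsed, so memory use grows with document size.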
With streaming canonicalization, the function signature could look like this:
InputStream setupCanonicalizer(XMLStreamReader)
The input is a StAX XMLStreamReader. StAX is a streaming XML parser; it
represents a document as a sequence of "events", e.g. startElement, text,
endElement. Attributes and namespaces are returned with the
startElement event. The StAX event stream is ordered and doesn't need a
backing DOM, which is why I want to use a similar mechanism to
represent the input to the canonicalizer.
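
As a rough illustration of what such an event stream looks like, the sketch below walks a small document with the standard javax.xml.stream API and records each event in order. The class name, method name and the text labels it produces are made up for this example; only the StAX calls themselves are real.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo: lists StAX events in the order the parser delivers them.
class StaxEventDemo {

    static List<String> listEvents(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent character data so each text run is a single event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        // Attributes arrive together with their owner element.
                        StringBuilder sb = new StringBuilder("startElement ");
                        sb.append(reader.getLocalName());
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            sb.append(' ')
                              .append(reader.getAttributeLocalName(i))
                              .append('=')
                              .append(reader.getAttributeValue(i));
                        }
                        events.add(sb.toString());
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                        events.add("text " + reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        events.add("endElement " + reader.getLocalName());
                        break;
                    default:
                        // Other event types (comments, end of document, ...) are skipped here.
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }
}
```

For the input <a x="1"><b>hi</b></a> this yields five entries, in document order, without any tree ever being built.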
The output is an InputStream, which is Java's way of representing a
stream of bytes.
Note that this function does not actually canonicalize; it just sets
things up. The actual canonicalization happens when somebody reads from
the returned InputStream: as bytes are read from the InputStream, the
canonicalizer reads from the XMLStreamReader. I.e. even if this
function is asked to canonicalize a 1MB document, it will not allocate a
1MB array in memory; it only requires a small fixed-size buffer
internally (assuming there is a cap on the size of a single element tag).
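
The lazy, pull-driven behaviour can be sketched as below. This is only an illustration of the setup-then-read pattern, not a real canonicalizer: the serialize step is a deliberate simplification that sorts attributes by local name and escapes only & and <, and it ignores namespaces, prefixes, comments and processing instructions. The class name and the demo helper are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: output bytes are produced only when the stream is read.
class LazyCanonicalizerSketch {

    // Returns immediately; no event is pulled from the reader until read() is called.
    static InputStream setupCanonicalizer(XMLStreamReader reader) {
        return new InputStream() {
            private byte[] chunk = new byte[0]; // bytes of the most recent event only
            private int pos = 0;

            @Override
            public int read() throws IOException {
                while (pos == chunk.length) {   // current chunk used up: pull one more event
                    try {
                        if (!reader.hasNext()) {
                            return -1;
                        }
                        chunk = serialize(reader, reader.next());
                        pos = 0;
                    } catch (XMLStreamException e) {
                        throw new IOException(e);
                    }
                }
                return chunk[pos++] & 0xFF;
            }
        };
    }

    // Simplified stand-in for canonical serialization of a single event.
    private static byte[] serialize(XMLStreamReader r, int event) {
        StringBuilder sb = new StringBuilder();
        switch (event) {
            case XMLStreamConstants.START_ELEMENT: {
                sb.append('<').append(r.getLocalName());
                Map<String, String> attrs = new TreeMap<>(); // sorted by local name
                for (int i = 0; i < r.getAttributeCount(); i++) {
                    attrs.put(r.getAttributeLocalName(i), r.getAttributeValue(i));
                }
                for (Map.Entry<String, String> a : attrs.entrySet()) {
                    sb.append(' ').append(a.getKey()).append("=\"")
                      .append(escape(a.getValue()).replace("\"", "&quot;")).append('"');
                }
                sb.append('>');
                break;
            }
            case XMLStreamConstants.CHARACTERS:
                sb.append(escape(r.getText()));
                break;
            case XMLStreamConstants.END_ELEMENT:
                sb.append("</").append(r.getLocalName()).append('>');
                break;
            default:
                break; // everything else is dropped in this sketch
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Usage example: drains the stream through a small fixed-size buffer.
    static String demo(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            InputStream in = setupCanonicalizer(reader);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[16];
            for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (XMLStreamException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

At any moment the stream holds the bytes of one event only, so its memory use is bounded by the largest single start tag or text run rather than by the document size. The demo helper collects the output into a string purely so the result can be inspected; a real consumer would feed the bytes straight into a digest.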
These Java functions were just to illustrate the streaming requirements,
which are:
* The input to the canonicalizer is something that can be represented
  as an XML event stream.
* The output of the canonicalizer is a byte stream.
* The canonicalizer should be able to do chunking; it should not be
  required to keep the entire input document in memory.
* The input to the canonicalizer should not contain data that cannot be
  represented by an XML stream, e.g. attributes without their owner
  elements cannot be represented.
Pratik
Received on Tuesday, 17 March 2009 01:39:22 UTC