- From: Pratik Datta <pratik.datta@oracle.com>
- Date: Mon, 16 Mar 2009 18:38:42 -0700
- To: XMLSec WG Public List <public-xmlsec@w3.org>
- Message-ID: <49BEFF22.5070305@oracle.com>
The streaming requirement is not captured very well in the transform note document, so let me explain it here. This also answers some of Chris Solc's comments from the last meeting.

The current canonicalization is defined in terms of a nodeset. In Java, the function signature could look like this:

    byte[] doCanonicalize(Set<Node>)

i.e. it takes an unordered set of nodes, canonicalizes them, and produces an array of bytes. The nodes in a nodeset are not ordered in any way, and the nodeset also requires a backing DOM to know the parent/child relationships between the nodes. This is what makes nodesets inherently unstreamable.

With streaming canonicalization, the function signature could look like this:

    InputStream setupCanonicalizer(XMLStreamReader)

The input is a StAX XMLStreamReader. StAX is a streaming XML parser API: it represents a document as a sequence of "events", e.g. startElement, text, endElement. Attributes and namespaces are returned with the startElement event. The StAX event stream is ordered and doesn't need a backing DOM. That is why I want to use a mechanism similar to this to represent the input to the canonicalizer.

The output is an InputStream, which is Java's way of representing a stream of bytes. Note that this function will not actually canonicalize; it will just set the canonicalizer up. The actual canonicalization happens when somebody reads from the returned InputStream: as and when somebody reads from the InputStream, the canonicalizer will read from the XMLStreamReader. So even if this function is asked to canonicalize a 1MB document, it will not allocate a 1MB array in memory; it will just require a small fixed-size buffer internally.
(This assumes there is a cap on the size of a single element tag.)

These Java functions were just to illustrate the streaming requirements, which are:

* The input to the canonicalizer is something that can be represented as an XML event stream.
* The output of the canonicalizer is a byte stream.
* The canonicalizer should be able to do chunking; it should not be required to keep the entire input document in memory.
* The input to the canonicalizer should not contain data that cannot be represented by an XML stream, e.g. attributes without their owner elements cannot be represented.

Pratik
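As a rough illustration of the setupCanonicalizer idea described in this message, here is a minimal Java sketch, assuming only the standard javax.xml.stream (StAX) API that ships with the JDK. The class name StreamingCanonicalizerSketch and its helpers are hypothetical, and the canonicalization itself is deliberately simplified (comments and processing instructions are dropped, redundant namespace declarations are not removed, document subsets are not supported), so it is not a conforming Canonical XML implementation. What it is meant to show is the pull-driven structure: setupCanonicalizer only wraps the reader, and read() pulls further StAX events only once the bytes of the current event are used up. Memory use is therefore bounded by the largest single event rather than by the document size, which is why the cap on tag size matters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of a lazy, pull-driven canonicalizer over StAX events. Simplified: comments/PIs
// are dropped, redundant xmlns declarations are kept, subsets are unsupported; NOT conforming C14N.
public class StreamingCanonicalizerSketch {
    /** Only sets up the canonicalizer; events are pulled and canonicalized as the caller reads. */
    public static InputStream setupCanonicalizer(XMLStreamReader reader) {
        return new LazyC14NStream(reader);
    }

    private static final class LazyC14NStream extends InputStream {
        private final XMLStreamReader reader;
        private byte[] chunk = new byte[0]; // canonical bytes of the most recently pulled event only
        private int pos = 0;
        private int depth = 0;              // element depth, used to drop text outside the root
        LazyC14NStream(XMLStreamReader reader) { this.reader = reader; }

        @Override
        public int read() throws IOException {
            while (pos >= chunk.length) {   // current chunk used up: pull another StAX event
                if (!pullNextEvent()) return -1; // end of document
            }
            return chunk[pos++] & 0xFF;
        }

        private boolean pullNextEvent() throws IOException {
            try {
                if (!reader.hasNext()) return false;
                String out = ""; // comments, PIs, document start/end emit nothing in this sketch
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        depth++;
                        out = startTag(reader);
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        depth--;
                        out = "</" + qName(reader.getPrefix(), reader.getLocalName()) + ">";
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        out = depth > 0 ? escapeText(reader.getText()) : "";
                        break;
                }
                chunk = out.getBytes(StandardCharsets.UTF_8);
                pos = 0;
                return true;
            } catch (XMLStreamException e) {
                throw new IOException(e);
            }
        }
    }

    // Namespace declarations and attributes both arrive with the START_ELEMENT event.
    private static String startTag(XMLStreamReader r) {
        StringBuilder sb = new StringBuilder("<").append(qName(r.getPrefix(), r.getLocalName()));
        Map<String, String> ns = new TreeMap<>(); // sorted by prefix; "" (default namespace) first
        for (int i = 0; i < r.getNamespaceCount(); i++) {
            String p = r.getNamespacePrefix(i), uri = r.getNamespaceURI(i);
            ns.put(p == null ? "" : p, uri == null ? "" : uri);
        }
        ns.forEach((p, uri) -> sb.append(p.isEmpty() ? " xmlns" : " xmlns:" + p)
                .append("=\"").append(escapeAttr(uri)).append('"'));
        List<String[]> attrs = new ArrayList<>(); // {namespace URI, local name, qualified name, value}
        for (int i = 0; i < r.getAttributeCount(); i++) {
            String uri = r.getAttributeNamespace(i), local = r.getAttributeLocalName(i);
            attrs.add(new String[] {uri == null ? "" : uri, local,
                    qName(r.getAttributePrefix(i), local), r.getAttributeValue(i)});
        }
        // Canonical XML orders attributes by namespace URI, then by local name.
        attrs.sort(Comparator.comparing((String[] a) -> a[0]).thenComparing(a -> a[1]));
        for (String[] a : attrs) {
            sb.append(' ').append(a[2]).append("=\"").append(escapeAttr(a[3])).append('"');
        }
        return sb.append('>').toString();
    }
    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }
    // Escaping in the style of Canonical XML 1.0; '&' must be replaced first.
    private static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\r", "&#xD;");
    }
    private static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;")
                .replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;");
    }
}
```

For example, wrapping XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)) and reading the returned stream to the end turns <doc b="2" a="1"/> into <doc a="1" b="2"></doc>: attributes are sorted and the empty element is expanded, yet no DOM is ever built and nothing is parsed until the first read().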
Received on Tuesday, 17 March 2009 01:39:22 UTC