Performance numbers for C14N

Here are some performance numbers that demonstrate

 a) how subtree based canonicalization costs almost same as XML 
serialization
 b) how nodeset based canonicalization is really bad for performance


Consider the following four algorithms

    * Algorithm A : Plain serialization
    * Algorithm B:  The very efficient subtree based C14N
    * Algorithm C: The moderately efficient nodeset based C14N, which
      does not expand out namespace nodes (the one mentioned in
      exclusive C14n spec, that Thomas pointed out)
    * Algorithm D: The extremely inefficient nodeset based C14N which
      expands all namespace nodes. (the one mentioned in inclusive C14N
      spec)


Algorithm A, B and C are available in JDK 1.6, and that is what I have 
used to demonstrate the performance (with permission from Sean Mullan)
The JDK 1.6 tries to use the subtree based code if possible. To make it 
use the subtree based algorithm, I just assign an ID to the subtree, and 
then create a Reference to this ID. But to make it use the nodeset based 
algorithm, I use the same reference to that ID, but then I add a Xpath 
Filter transform with an expression 1=1. This expression always 
evaluates to true, so this is exactly same as signing the subtree.


We will run these algorithms on these three xml files.

    * *5k_few_nodes.xml*:  This is a file with very few nodes. There is
      just one very large text node
    * *5k_many_nodes.xml*:  This is a file with many nodes, each node is
      very small
    * *5k_many_nodes_namespaces.xml*: This is a file with many nodes, it
      also has many namespace nodes


Here are the numbers on my machine


	Algorithm A
(serialize)
	Algorithm B
(subtree c14n)
	Algorithm C
(nodeset c14n)
	Algorithm D
(original)
5k_few_nodes.xml
	3.0ms
	4.0ms
	6.7ms
	
5k_many_nodes.xml
	4.1ms
	4.3ms
	21.5ms
	
5k_many_nodes_namespaces.xml
	5.0ms
	5.4ms
	164ms
	


I need to dig up an implementation for Algorithm D.
Also I will check in these tests into a location in the CVS

Pratik

Received on Monday, 4 May 2009 20:34:31 UTC