W3C home > Mailing lists > Public > www-rdf-interest@w3.org > May 2000

Security of Digest Algorithm

From: McBride, Brian <bwm@hplb.hpl.hp.com>
Date: Wed, 10 May 2000 15:32:53 +0100
Message-ID: <5E13A1874524D411A876006008CD059F239227@0-mail-1.hpl.hp.com>
To: "RDF Interest (E-mail)" <www-rdf-interest@w3.org>
I said recently I'd try to get some help on the security of the proposed
model digest algorithm.

I'm indebted to one of my colleagues in hut 5 who was the first, of several
that I asked,  to point out a solution to this question. 

The bottom line is it's not secure. For a digest of k bits, given any set S
of k+1 statements, it is simple to compute a non-empty subset of S whose 
digests will XOR to zero.  This would enable them to be added to a model 
without changing its digest.

My mathematical colleague says:

	Suppose we are given a set of n binary vectors of length k, where n
> k.
	Then there are least n-k non-empty subsets of the vectors whose
	XOR is 0, and they can be found efficiently using a modification of 
	Gaussian elimination as described on p.425 of [*]. In particular,
	any set of 161 binary vectors of length 160 this algorithm will find
	non-empty subset whose combined XOR is 0.

	[*] D.E. Knuth, Seminumerical Algorithms, The Art of Computer
	Vol 2, Addison Wesley, Reading, MA, 2nd ed. 1981.

Here is an informal explanation of why.  We'll do this for 160 bit digests,
since SHA-1 generates digests of that size.  If you have 161 statements, you
can think of this as a set of 160 linear equations in 161 variables.
Because there are more variables than equations, there has to be at least
one non-trivial solution.  That solution can be found by solving the set 
of the equations by a variation on the technique we all learned in high 
school.  Here's how to construct the equations:

Pick a random set of 161 statements S = {s0, s1, ... s160} and compute the
set of their digests D = {d0, d1, ... d160}.  We want to find a subset of S
whose digests XOR to 0 i.e. we want to find bits x0, x1, ... x160 such that:

d0&x0 ^ d1&x1 ^ d2&x2 ... ^ d160&x160 = 0

i.e. xi = 1 if s1 is in the subset and 0 otherwise.  Thinking just about the
bit 0 column of each of the di, then each bit, anded with the appropriate
xi, must XOR to zero.  More formally, with d0:0 is the 0 bit of d0 etc,

d0:0&x0 ^ d1:0&x1 ^ d2:0&x2 ... ^ d160:0&x160 = 0

The same is true of the bit 1 column, the bit 2 column etc up to the last
column, so we have a system of simultanious equations in the xi:

d0:0&x0 ^ d1:0&x1 ^ d2:0&x2 ... ^ d160:0&x160 = 0
d0:1&x0 ^ d1:1&x1 ^ d2:1&x2 ... ^ d160:1&x160 = 0
d0:159&x0 ^ d1:159&x1 ... ^ d160:159&x160 = 0


Brian McBride (with a lot of help)
Received on Wednesday, 10 May 2000 10:33:03 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:07:29 UTC