Re: Cryptographic digests of RDF models from Sergey Melnik on 2000-11-06 (www-rdf-interest@w3.org from November 2000)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Mon, 06 Nov 2000 15:52:07 -0800
To: Steve Dunham <dunham@cse.msu.edu>
CC: www-rdf-interest@w3.org
Message-ID: <3A074427.C1515931@db.stanford.edu>

Steve,

thanks for your suggestion. I'll most probably include it in the next
distribution of the API. One of the desirable properties that I wanted a
model digest algorithm to have is easy recomputation of the digest
whenever the content of the model changes. Your suggestion does not have
this property, since adding or removing a statement means re-hashing the
whole sorted list of statement digests, but that is not absolutely
essential.
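
To illustrate the recomputation trade-off, here is a minimal Python sketch
comparing the XOR-based model digest with the sorted-concatenation one. The
helper names and the sample statement digests are hypothetical; statement
digests are simply treated as opaque 20-byte values.

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length digests.
    return bytes(x ^ y for x, y in zip(a, b))

def xor_model_digest(statement_digests):
    # Order-independent, but not secure.
    acc = bytes(20)
    for d in statement_digests:
        acc = xor_bytes(acc, d)
    return acc

def sorted_model_digest(statement_digests):
    # SHA1( concat( sort( statement_hashes )))
    return sha1(b"".join(sorted(statement_digests)))

# Hypothetical statement digests, for illustration only.
digests = [sha1(s) for s in (b"stmt-1", b"stmt-2", b"stmt-3")]
new = sha1(b"stmt-4")

# XOR: adding (or removing) a statement is a single XOR.
xor_before = xor_model_digest(digests)
xor_after = xor_bytes(xor_before, new)

# Sorted concatenation: the whole list has to be hashed again.
sorted_after = sorted_model_digest(digests + [new])
```

With XOR, applying the same statement digest a second time removes it again,
which is what makes updates cheap and also part of why it is not secure.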

I'm wondering whether the digest algorithm can be made more efficient.
Currently, statement digests are computed as:

  d1 = SHA1(s)
  d2 = SHA1(p)
  d3 = SHA1(o)

  if(o instanceof Literal)
    rotate left d3 by 8 bits

  statement_digest = SHA1( concat(d1, d2, d3) )

That is, in the worst case, computing a model digest involves 4
applications of SHA1 per statement (2-3 on average), which is
expensive. Maybe one SHA1 call per model is sufficient. One could
concatenate resource URIs/literals in some robust way... Any thoughts on
that?
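
For concreteness, the statement digest described above could be sketched in
Python roughly as follows. This is only an illustration, not the API's actual
code: it assumes that s, p and o are hashed as UTF-8 strings and that "rotate
left by 8 bits" treats the 20-byte digest as one 160-bit big-endian value, so
the first byte moves to the end. The function names are hypothetical.

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def rotl8(d: bytes) -> bytes:
    # Rotate the byte string left by 8 bits (assumption: big-endian value).
    return d[1:] + d[:1]

def statement_digest(s: str, p: str, o: str, o_is_literal: bool) -> bytes:
    d1 = sha1(s.encode("utf-8"))
    d2 = sha1(p.encode("utf-8"))
    d3 = sha1(o.encode("utf-8"))
    if o_is_literal:
        # Distinguishes a literal from a resource with the same string.
        d3 = rotl8(d3)
    # Fourth SHA1 call: hash the 60-byte concatenation.
    return sha1(d1 + d2 + d3)
```

The rotation only serves to give a literal object a different digest than a
resource whose URI happens to be the same string.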

Sergey

Steve Dunham wrote:
> 
> I was reading your page on rdf digests[1], which says that you're
> using an XOR of statement digests as a model digest, and that it isn't
> secure. (For fairly obvious reasons.)  And it says the digest is still
> under construction.
> 
> For what it's worth, one way to do a reasonably secure hash is to take
> a SHA1 of the concatenation of a sorted list of the statement hashes.
> 
>    That's:   SHA1( concat( sort( statement_hashes )))
> 
> It's the first thing that comes to mind.  I'm sure there are other
> solutions.  (I'm assuming that the only constraint is that the hash
> is independent of statement order.)
> 
> Steve
> dunham@cse.msu.edu
> (CC responses to me)
> 
> [1] http://www-db.stanford.edu/~melnik/rdf/api.html

Received on Monday, 6 November 2000 18:34:44 UTC