Re: Model Digest Algorithm

"McBride, Brian" wrote:
> 
> The idea of a signature for a model is an excellent one, but I wonder if the
> algorithm needs more work.
>
> [...]
>
> The two Rx(subject_digest) cancel out and the resulting signature is
> independent of the
> subject in the model.
> 
> Brian McBride
> HPLabs

You are absolutely right. Sometimes coding happens faster than thinking
;) The algorithm was merely indented to illustrate the approach. If
digests are to be used seriously, a standard is needed. For now, to get
rid of this ugly bug I updated the algorithm for computing statement
digests as follows:

For subject s, predicate p, object o and digest function digest()
(currently SHA-1):

  d1 = digest(s)
  d2 = digest(p)
  d3 = digest(o)

  if(o instanceof Literal)
    circular rotate left d3 by 8 bits

  statement_digest = digest( concat(d1, d2, d3) )

In the final step, the digest algorithm is applied to the sequence of
bytes obtained by concatenation of d1, d2 and d3. This is slower, but
fairly waterproof.

As to efficiency: given a very large persistently stored repository of
RDF models, the digest computation can be implemented as a background
process. After that, for any given set of statements, the model digest
can be computed very fast (especially if you have a stored procedure to
do this). 

Currently it is a XOR over all statement digests having advantage that
model digest can be easily recomputed dynamically when statements are
added/deleted. Are you aware of other symmetrical algorithms that are
more secure and have the same nice property? If security of the current
approach is insufficient, this property can be dropped.

I'm putting together another bunch of updates and will release another
version of the API soon.

Sergey

Received on Saturday, 29 April 2000 02:04:34 UTC