Re: RDF API 1.0 Draft / algorithm for anonymous URIs from Sergey Melnik on 1999-12-08 (www-rdf-interest@w3.org from December 1999)

From: Sergey Melnik <melnik@DB.Stanford.EDU>
Date: Wed, 08 Dec 1999 15:24:31 -0800
To: Gabe Beged-Dov <begeddov@jfinity.com>
CC: RDF Interest Group <www-rdf-interest@w3.org>
Message-ID: <384EE8AF.6B653204@db.stanford.edu>

Gabe Beged-Dov wrote:

> I'm thinking that there may be many benefits to making the state of the model
> (open/closed) an explicit part of the processing model. When a model is open,
> you can add triples to it using the low level add/remove calls.  Once it is
> closed you would not be able to modify the model.  Until the model is closed,
> it doesn't have a signature. The act of closing a model is a significant
> event in the model's lifecycle that involves generating the signature for the
> model and potentially for all the noname resources in the model.

First, to avoid cyclic dependency the names of the anonymous resources
should not depend on the content of the model and thus can be set before
the resources get into the model.

Although I intuitively tend to agree with you that there might be
benefits for distinguishing open vs. closed models, I think the
computing the model digest is not such a big problem: the first time the
digest is asked for, you walk through all triples and do the
computation. After that, you only need to maintain the correct digest.
The current algorithm allows it to do it in a very inexpensive way: if a
triple is added or deleted, you just XOR its digest with the digest of
the model. That works because

(A xor B) xor B = A       and
A xor B = B xor A

Furthermore, why is it undesirable to query "open" models? Cannot we
read from files open for writing?

> I assume (with hand-waving) that we can find some graph algorithm for

One important requirement on the algorithm is that it should be suitable
for streaming, i.e. the storage/computation time should be bound by the
depth of the XML tree rather than depend on the number of nodes. With
other words, even if RDF dumps of the Open Directory used anonymous
resources throughout, we must be able to create a triple stream
independent of the size of the dump.

So I've come up with the following algorithm. Let me explain it using an
example. Let X,Y,Z be anonymous resources, A, B, C be explicitly named;
a, b, c, d, e are property names. Assume we have a following
hierarchical structure derived from the corresponding XML subtree:

X --a--> Y
          --b--> Z
                  --c--> A
                  --d--> B
          --e--> C

Then, the digests (and the URI based upon them) for X, Y and Z are
determined as follows (Rn is the digest of resource R rotated by n
bytes, ^=xor):

Z = a0^b1^c2^A3^d2^B3
Y = a0^b1^Z2^e1^C2
X = a0^Y

If say Y becomes an explicit name later on, the URI for Z remains
intact, but X changes. If X gets a name, both Y and Z remain intact.
Furthermore, since XOR is commutative, the nodes can be moved withing
their parent elements (in the XML serialization). Note that rotating the
digest prevents compensating: thus in Y =
a0^b1^(a0^b1^c2^A3^d2^B3)2^e1^C2, a0 and b1 are not eliminated.

Comments?

Sergey

Received on Wednesday, 8 December 1999 18:19:03 UTC