- From: Sergey Melnik <melnik@DB.Stanford.EDU>
- Date: Wed, 08 Dec 1999 15:24:31 -0800
- To: Gabe Beged-Dov <begeddov@jfinity.com>
- CC: RDF Interest Group <www-rdf-interest@w3.org>
Gabe Beged-Dov wrote: > I'm thinking that there may be many benefits to making the state of the model > (open/closed) an explicit part of the processing model. When a model is open, > you can add triples to it using the low level add/remove calls. Once it is > closed you would not be able to modify the model. Until the model is closed, > it doesn't have a signature. The act of closing a model is a significant > event in the model's lifecycle that involves generating the signature for the > model and potentially for all the noname resources in the model. First, to avoid cyclic dependency the names of the anonymous resources should not depend on the content of the model and thus can be set before the resources get into the model. Although I intuitively tend to agree with you that there might be benefits for distinguishing open vs. closed models, I think the computing the model digest is not such a big problem: the first time the digest is asked for, you walk through all triples and do the computation. After that, you only need to maintain the correct digest. The current algorithm allows it to do it in a very inexpensive way: if a triple is added or deleted, you just XOR its digest with the digest of the model. That works because (A xor B) xor B = A and A xor B = B xor A Furthermore, why is it undesirable to query "open" models? Cannot we read from files open for writing? > I assume (with hand-waving) that we can find some graph algorithm for One important requirement on the algorithm is that it should be suitable for streaming, i.e. the storage/computation time should be bound by the depth of the XML tree rather than depend on the number of nodes. With other words, even if RDF dumps of the Open Directory used anonymous resources throughout, we must be able to create a triple stream independent of the size of the dump. So I've come up with the following algorithm. Let me explain it using an example. Let X,Y,Z be anonymous resources, A, B, C be explicitly named; a, b, c, d, e are property names. Assume we have a following hierarchical structure derived from the corresponding XML subtree: X --a--> Y --b--> Z --c--> A --d--> B --e--> C Then, the digests (and the URI based upon them) for X, Y and Z are determined as follows (Rn is the digest of resource R rotated by n bytes, ^=xor): Z = a0^b1^c2^A3^d2^B3 Y = a0^b1^Z2^e1^C2 X = a0^Y If say Y becomes an explicit name later on, the URI for Z remains intact, but X changes. If X gets a name, both Y and Z remain intact. Furthermore, since XOR is commutative, the nodes can be moved withing their parent elements (in the XML serialization). Note that rotating the digest prevents compensating: thus in Y = a0^b1^(a0^b1^c2^A3^d2^B3)2^e1^C2, a0 and b1 are not eliminated. Comments? Sergey
Received on Wednesday, 8 December 1999 18:19:03 UTC