Re: RDF API 1.0 Draft / signing RDF content from Sergey Melnik on 1999-12-06 (www-rdf-interest@w3.org from December 1999)

From: Sergey Melnik <melnik@DB.Stanford.EDU>
Date: Mon, 06 Dec 1999 14:31:39 -0800
To: Gabe Beged-Dov <begeddov@jfinity.com>
CC: RDF Interest Group <www-rdf-interest@w3.org>
Message-ID: <384C394B.4D69BBC6@db.stanford.edu>

Gabe Beged-Dov wrote:
> 
> In general, it looks really great. The use of a signature as the URI is
> very powerful :-! I have several short questions/comments about this.

In fact, the original motivation behind signatures was the need for a
way to generate interoperable hashCode() values that are independent of
the implementation. Content-based URIs for (reified) triples have already
been discussed on the list. I refined the algorithm so that

subj --pred--> obj
subj --pred--> "obj"

and all possible permutations like pred --subj--> obj etc. yield
different digests, i.e. different URIs.
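
To illustrate the property described above, here is a minimal sketch in
Python. It is not the algorithm used in the draft API; the function
names, the field tags S/P/R/L, the use of SHA-1 and the "urn:x-digest:"
prefix are all assumptions made for the example. Each field is tagged by
its role and length-prefixed, so a resource object and a literal object
with the same text, or the same strings in swapped positions, feed
different bytes into the hash.

```python
import hashlib


def _field(tag: str, value: str) -> bytes:
    # Role tag plus length prefix keeps field boundaries unambiguous.
    data = value.encode("utf-8")
    return tag.encode("ascii") + str(len(data)).encode("ascii") + b":" + data


def triple_digest(subj: str, pred: str, obj: str, literal: bool = False) -> str:
    """Hex digest of one triple; 'literal' marks the object as a string value."""
    h = hashlib.sha1()
    h.update(_field("S", subj))
    h.update(_field("P", pred))
    # Hypothetical tags: "L" for a literal object, "R" for a resource object.
    h.update(_field("L" if literal else "R", obj))
    return h.hexdigest()


def triple_uri(subj: str, pred: str, obj: str, literal: bool = False) -> str:
    # "urn:x-digest:" is a made-up prefix used only in this sketch.
    return "urn:x-digest:" + triple_digest(subj, pred, obj, literal)
```

With this sketch, triple_uri("subj", "pred", "obj") and
triple_uri("subj", "pred", "obj", literal=True) differ, as do
triple_uri("subj", "pred", "obj") and triple_uri("pred", "subj", "obj").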

> The first question is related to the interaction between the model being
> "closed" and the generation of the signature. Should the concept of the
> model being open vs. closed be part of the API?

Could you reformulate this question? If this is what you are asking
about: the model URI is recomputed whenever triples are added to or
removed from the model. ...Oops, just found a bug: this is currently not
being done in the code...
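
The intended behaviour (recompute the model URI on every change) could
look roughly like the Python sketch below. It is not the draft API's
code: the Model class, its method names, the "urn:x-model:" prefix and
the choice to hash the sorted per-triple digests are assumptions made
for illustration.

```python
import hashlib


def _triple_digest(triple) -> bytes:
    subj, pred, obj, literal = triple
    h = hashlib.sha1()
    # Length-prefixed fields; "L"/"R" marks literal vs. resource objects.
    for part in (subj, pred, "L" if literal else "R", obj):
        data = part.encode("utf-8")
        h.update(str(len(data)).encode("ascii") + b":" + data)
    return h.digest()


class Model:
    """Hypothetical model whose URI always reflects its current triples."""

    def __init__(self):
        self._triples = set()
        self.uri = self._compute_uri()

    def add(self, subj, pred, obj, literal=False):
        self._triples.add((subj, pred, obj, literal))
        self.uri = self._compute_uri()  # recompute on every change

    def remove(self, subj, pred, obj, literal=False):
        self._triples.discard((subj, pred, obj, literal))
        self.uri = self._compute_uri()  # recompute on every change

    def _compute_uri(self) -> str:
        # Sorting makes the result independent of insertion order.
        digests = sorted(_triple_digest(t) for t in self._triples)
        return "urn:x-model:" + hashlib.sha1(b"".join(digests)).hexdigest()
```

Recomputing from scratch on each change is the simplest choice for a
sketch; an implementation concerned with large models might prefer to
compute the URI lazily when it is first requested after a change.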

> The second comment is related to noname resources. The sample
> implementation uses the incrementing genid which is dependent on the XML
> serialization. I was wondering if there couldn't be a step at model
> signature generation time that also generated the signatures for the
> nonames. The idea would be that once the model is stable you can
> generate a signature for the noname using something like the set of
> triples for which the noname is the subject using the same algorithm as
> for models (kind of like mini-models [forgive the Austin Powers
> reference, I saw a scary mini-me doll shopping last night and it stuck
> in my mind]:)

That's another tough issue, you are absolutely right. First, on the
model level there are no "proper" noname resources, since every resource
must have a URI.
org.w3c.rdf.util.RDFUtil has a static method noname() that generates a
cryptographically strong unique identifier for a noname resource.
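
For readers who want a feel for what such a generator does, here is a
minimal Python sketch. It is not the RDFUtil implementation; the
"urn:x-noname:" prefix and the 128-bit length are assumptions chosen for
the example.

```python
import secrets


def noname(prefix: str = "urn:x-noname:") -> str:
    """Return a fresh identifier built from 128 bits of OS-provided randomness."""
    # secrets draws from the operating system's cryptographic random source.
    return prefix + secrets.token_hex(16)
```

An identifier produced this way is unique in practice, but two parsers
reading the same document will get different values.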
On the syntax level, the problem becomes how to make sure that *every*
compliant RDF parser generates the same URI for a given noname resource.
Your idea to somehow bind noname URI generation to the content is very
tempting, great idea! The same algorithm as for models would not work,
though, since the noname URI would be recursively dependent on the
"mini-model" URI.
In the current RDF syntax, a noname resource can be the object of at
most one statement and can have a bunch of properties. This
information fully determines the "context" of a noname resource. Thus,
the URI for

<rdf:Description>
   <fn>John</fn>
   <ln>Smith</ln>
</rdf:Description>

could be computed using the data

--fn--> "John"
--ln--> "Smith"
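
As a rough illustration of that idea, the Python sketch below derives a
URI from the property/value pairs alone. The function name, the
"urn:x-noname:" prefix, the use of SHA-1 and the restriction to literal
values are assumptions made for the example, not a proposed algorithm.

```python
import hashlib


def noname_uri(properties) -> str:
    """properties: iterable of (predicate, literal_value) string pairs."""
    h = hashlib.sha1()
    # Sorting makes the result independent of the order in the document.
    for pred, value in sorted(properties):
        for part in (pred, value):
            data = part.encode("utf-8")
            # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
            h.update(str(len(data)).encode("ascii") + b":" + data)
    return "urn:x-noname:" + h.hexdigest()


john = noname_uri([("fn", "John"), ("ln", "Smith")])
```

Any parser that applies the same rule to the description above would
arrive at the same URI, regardless of the order in which it encounters
the two properties.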

There are at least three further issues to consider:

(1) duplicates: if I repeat the same RDF/XML content like the
description above, I'd prefer to fuse both of them. Can that be a
problem from a semantic point of view?

(2) recursion: in case of nested RDF descriptions, we have to postpone
triple generation until the descendant nodes are fully processed.
Furthermore, if we use the fact that the noname resource is an object of
someone else's statement, mutual dependency becomes ugly again.

(3) changes: the generated URIs will only make sense if the "context"
of the URI remains intact. As soon as I add another property to the
resource, or modify a property's value, the URI breaks.
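
Issues (1)-(3) can be made concrete with a small Python sketch. It is
only an illustration under simplifying assumptions: a description is
modelled as a dict with one value per predicate, a nested dict stands
for a nested noname description, and the function name, the "L"/"R"
tags and the "urn:x-noname:" prefix are invented for the example. The
URI uses only a node's own properties, not the statement pointing at it,
which sidesteps the mutual dependency mentioned in (2).

```python
import hashlib


def node_uri(node: dict) -> str:
    """URI of a noname description given as {predicate: literal or nested dict}."""
    parts = []
    for pred, value in node.items():
        if isinstance(value, dict):
            # (2) the child's URI must be known before the parent's is computed.
            parts.append((pred, "R", node_uri(value)))
        else:
            parts.append((pred, "L", value))
    h = hashlib.sha1()
    for part in sorted(parts):  # order in the document does not matter
        for field in part:
            data = field.encode("utf-8")
            h.update(str(len(data)).encode("ascii") + b":" + data)
    return "urn:x-noname:" + h.hexdigest()


john = {"fn": "John", "ln": "Smith"}
doc = {"author": john}

# (1) a repeated, identical description gets the same URI and is fused.
same = node_uri(john) == node_uri({"ln": "Smith", "fn": "John"})
# (3) adding a property changes the URI, and the change propagates upward.
changed = node_uri({**john, "age": "42"}) != node_uri(john)
parent_changed = node_uri({"author": {**john, "age": "42"}}) != node_uri(doc)
```

In this sketch same, changed and parent_changed are all True: duplicates
collapse, and any edit to a nested description invalidates the URIs of
every enclosing description as well.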

On the other hand, we can move the descriptions around in the document.
I think this content-based approach works better than an XPointer-based
one.

Noname URI generation is a syntax-related issue. However, it will arise
no matter what kind of XML-based syntax we take. So what do you think
about (1)-(3)?

Sergey
Received on Monday, 6 December 1999 17:26:20 UTC