Re: CC/PP, RDF and trust issues from Sergey Melnik on 2000-04-30 (www-rdf-interest@w3.org from April 2000)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Sun, 30 Apr 2000 12:37:08 -0700
To: Graham Klyne <GK@Dial.pipex.com>
CC: www-rdf-interest@w3.org, CC/PP WG list <w3c-ccpp-wg@w3.org>
Message-ID: <390C8B64.D8B0518E@db.stanford.edu>

Graham,

> I took a look at the first reference you supplied
> (http://nestroy.wi-inf.uni-essen.de/rdf/sum_rdf_api/), and did not see
> anything there about signing RDF graphs.  The nearest I could find was the
> discussion in section 3.1 about using digests for resource ID
> generation.  Could you direct me more closely?

Please see examples below.

> Generally speaking, I see we concur about the futility of carrying
> signatures of RDF serialization syntax as a way of representing assurances
> in an RDF model.  But we also seem to differ in the need to _sign_ an RDF
> model (as distinct from representing the assurance a signature may convey).
>
> OF SIGNATURES AND ASSURANCES
> 
> The view I had taken was that an RDF model, or graph, does not have any
> specific representation.  Thus, it is not necessary to define a form of
> unforgeable _signature_ over an RDF graph, but rather to have a way of
> representing within the graph the type and level of assurance that is
> associated with various sub-graphs.  Mechanisms for resisting forgery
> belong in the serialization layer, not in (what I might call) the semantic
> layer.

I agree that signing RDF statements/models is a promising way of
building the Web of Trust. However, I'm skeptical of using serialization
syntaxes for that. As I pointed out earlier, I believe we have to sign
*content* rather than one of its syntactic representations.
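To illustrate the difference between signing syntax and signing content, here is a minimal Python sketch. The hashing scheme is an assumption made for illustration, not a standard: each statement is hashed with SHA-1 over a JSON encoding of its three terms, and the model hash is SHA-1 over the sorted statement hashes, so it ignores statement order and whitespace. The toy line-based syntax and the `ex:` names are hypothetical, and blank nodes are not handled.

```python
import hashlib
import json


def statement_digest(s, p, o):
    # Hypothetical statement hash: SHA-1 over an unambiguous (JSON) encoding.
    return hashlib.sha1(json.dumps([s, p, o]).encode("utf-8")).digest()


def model_digest(statements):
    # Hypothetical model hash: order-independent, because the per-statement
    # digests are sorted before being hashed together.
    digests = sorted(statement_digest(*st) for st in set(statements))
    return hashlib.sha1(b"".join(digests)).hexdigest()


def parse(text):
    # Toy syntax standing in for any RDF serialization: "s p o" per line.
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


# Two serializations of the same two statements (different order/whitespace).
doc_a = "ex:Bob ex:supplied ex:ServiceX\nex:Bob ex:name Bob\n"
doc_b = "ex:Bob   ex:name  Bob\n\nex:Bob ex:supplied ex:ServiceX\n"

syntax_a = hashlib.sha1(doc_a.encode("utf-8")).hexdigest()
syntax_b = hashlib.sha1(doc_b.encode("utf-8")).hexdigest()

print(syntax_a == syntax_b)                                      # False
print(model_digest(parse(doc_a)) == model_digest(parse(doc_b)))  # True
```

A signature over `syntax_a` would break as soon as the same content is re-serialized; a signature over the model hash survives it.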

> For example:  suppose I receive a paper signed by "Alice" stating that "Bob
> has supplied service X".  Alice's signature on the paper allows me to
> assert, as a fact:  "Alice states that Bob has supplied service X".  This
> assertion is a meaning, without physical representation, that I draw from
> the paper supplied and as such is not of itself subject to confirmation or
> protection by Alice's signature.  (If I lose the original paper, I may
> still believe the conclusion drawn from it.)
> 
> In our discussions about CC/PP, one scenario we have considered is receipt
> of an RDF expression (in any serialization syntax) on a channel secured
> using SSL.   By virtue of using SSL (into an appropriately secure
> environment), the recipient can have confidence about the authenticity and
> integrity of the received RDF graph, but there is no signature that can be
> used for post-hoc verification of the information.  I believe this
> underlines the idea that an RDF graph (received in any form) should be able
> to convey information about assurances provided by signatures, without
> involvement in the signature mechanisms used for transferring such assurances.

Let me sketch my understanding of trust in RDF using the following
simplified example based on the scenario you described above.

Imagine you got the following four statements (over an insecure link):

T --rdf:type-->  SignedStatements
T --principal--> Alice
T --algorithm--> RSA
T --content-->   BLOB

The recipient fetches Alice's public RSA key and decrypts the content of
the (small) BLOB:

B --rdf:type-->      Model
B --location-->      URL
B --secureHash-->    09efe...a6e43
B --hashAlgorithm--> SHA-1

Then, the recipient downloads RDF content (over an insecure link, in
some serialization syntax) from the given URL, parses it, and computes
the SHA-1 hash of the model (rather than of its syntactic representation).
If the hashes match, the recipient can assume that the statements
contained within the model were made by Alice. One of these statements
(among hundreds of others) is "Bob has supplied service X".
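The recipient's side of this first example might look roughly like the Python sketch below. Several things are assumptions for illustration only: the model-hashing scheme (SHA-1 over sorted per-statement hashes), the toy "s p o" syntax, the URL, and the in-memory `WEB` dictionary standing in for the insecure network. The RSA step is not shown; `B` is taken to be the BLOB contents already recovered with Alice's public key.

```python
import hashlib
import json


def statement_digest(s, p, o):
    # Hypothetical per-statement hash: SHA-1 over a JSON encoding of the triple.
    return hashlib.sha1(json.dumps([s, p, o]).encode("utf-8")).digest()


def model_digest(statements):
    # Hypothetical model hash: SHA-1 over the sorted statement hashes.
    digests = sorted(statement_digest(*st) for st in set(statements))
    return hashlib.sha1(b"".join(digests)).hexdigest()


def parse(text):
    # Toy syntax standing in for any RDF serialization: "s p o" per line.
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


url = "http://example.org/alice/statements"  # hypothetical location
WEB = {url: "ex:Bob ex:supplied ex:ServiceX\nex:Bob ex:memberOf ex:Acme\n"}


def verify_signed_model(b, fetch):
    """Return Alice's statements if the fetched model matches B, else None.

    `b` is assumed to be the already-decrypted BLOB (the statements about B).
    """
    if b["hashAlgorithm"] != "SHA-1":
        raise ValueError("unsupported hash algorithm: " + b["hashAlgorithm"])
    statements = parse(fetch(b["location"]))  # hash the model, not the bytes
    if model_digest(statements) != b["secureHash"]:
        return None
    return statements


# What Alice would have put into the signed BLOB (computed here for the demo).
B = {
    "rdf:type": "Model",
    "location": url,
    "secureHash": model_digest(parse(WEB[url])),
    "hashAlgorithm": "SHA-1",
}

accepted = verify_signed_model(B, WEB.__getitem__)

# A copy altered on the insecure link no longer matches Alice's hash.
TAMPERED = {url: "ex:Bob ex:supplied ex:ServiceY\nex:Bob ex:memberOf ex:Acme\n"}
rejected = verify_signed_model(B, TAMPERED.__getitem__)
```

Because the hash is taken over the parsed model, the same statements served in a different order or syntax would still verify.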

Many variations of the above scheme are conceivable. Given some
communication context, the "location" of Alice's statements may be the
next message in sequence rather than a static URL, the participants may
want to negotiate the public-key and hashing algorithms, etc.

> USING REIFICATION
> 
> The RDF model provides the basis for a mechanism as described above through
> reification.  But, returning to a theme of my original message, reification
> is like an "assembly language" concept:  it can do the job but can be
> difficult and clumsy to use.  I was seeking a "high level language" form
> for expressing such ideas in RDF that can (in principle) be translated into
> reification, but which can also be handled by "direct interpretation"
> within an application.

Using cryptographic hashes based on a standard algorithm simplifies
handling of reification a bit. Consider another variation of the above
example:

The recipient gets:

T --rdf:type-->  SignedStatements
T --principal--> Alice
T --algorithm--> RSA
T --statement--> <hash1>
...
T --statement--> <hashN>
<statement 1>
...
<statement N>

Given that there exists an algorithm for computing hashes of statements,
the recipient can iterate through the statements in the message and
match their hashes against those explicitly listed. In the same loop,
the hash of the model is computed.

Then, the recipient fetches Alice's public RSA key and decrypts the
content of the (small) BLOB signed by Alice (attached to T via
T --content--> BLOB, as in the first example):

B --rdf:type-->      Model
B --secureHash-->    09efe...a6e43
B --hashAlgorithm--> SHA-1

The remaining step is to compare the model hash with the one computed in
the loop.
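The single pass described in this second variation can be sketched in Python as follows. The statement-hash algorithm (SHA-1 over a JSON encoding of the triple) and the model hash (SHA-1 over the sorted statement hashes) are hypothetical stand-ins for the "standard algorithm"; the `ex:` names are invented, and the RSA recovery of B's `secureHash` is not shown.

```python
import hashlib
import json


def statement_digest(s, p, o):
    # Hypothetical standard statement hash: SHA-1 over a JSON encoding.
    return hashlib.sha1(json.dumps([s, p, o]).encode("utf-8")).digest()


def check_signed_statements(listed_hashes, message, expected_model_hash):
    """Return the statements covered by T if all checks pass, else None.

    listed_hashes: hex hashes from the T --statement--> <hash_i> triples.
    message: every triple received, including T's own and any extras.
    expected_model_hash: secureHash from B, assumed already recovered
    from Alice's signed BLOB.
    """
    listed = set(listed_hashes)
    signed, digests = [], []
    for st in set(message):  # one loop: match statements, collect hashes
        d = statement_digest(*st)
        if d.hex() in listed:
            signed.append(st)
            digests.append(d)
    if {d.hex() for d in digests} != listed:
        return None  # some listed statement is missing from the message
    # Model hash from the same per-statement hashes (order-independent).
    model_hash = hashlib.sha1(b"".join(sorted(digests))).hexdigest()
    return signed if model_hash == expected_model_hash else None


alice = [("ex:Bob", "ex:supplied", "ex:ServiceX"),
         ("ex:Bob", "ex:memberOf", "ex:Acme")]
hashes = [statement_digest(*st).hex() for st in alice]
# The value Alice would have placed in B (computed here for the demo).
b_secure_hash = hashlib.sha1(
    b"".join(sorted(statement_digest(*st) for st in alice))).hexdigest()

message = [("T", "rdf:type", "SignedStatements"),
           ("T", "principal", "Alice"),
           ("T", "algorithm", "RSA")]
message += [("T", "statement", h) for h in hashes]
message += alice
message.append(("ex:Mallory", "ex:supplied", "ex:ServiceZ"))  # not signed

result = check_signed_statements(hashes, message, b_secure_hash)
```

Statements that are not listed in T (such as T's own triples, or the injected `ex:Mallory` one) are simply not attributed to Alice.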

> USING DIGESTS
> 
> I think you are suggesting that a digest of an RDF subgraph can be used to
> construct an identifier that stands for the reification of that
> subgraph.  Then, using that identifier, RDF statements can be made about
> the reified subgraph.
> 
> For example, consider the CC/PP profile of some client C:  let the subgraph
> be represented by:
> 
>    ([C]...)
> 
> Let some serialized form of the digest of this subgraph be represented by:
> 
>    Digest([C]....)
> 
> Then the identifier:
> 
>    reify:Digest([C]....)
> 
> may be defined to stand for the RDF subgraph containing reifications of all
> of the RDF statements in ([C]...), and we can make statements like:
> 
>    reify:Digest([C]....)--assuredBy-->"Alice"
> 
> Is this roughly what you are suggesting?  If so, I have one concern, which
> I think is probably quite solvable.  Given an RDF graph containing:
> 
>    ([C]...)
>    ([D]...)
>    ([E]...)
>    reify:Digest([C]....)--assuredBy-->"Alice"
> 
> How is a program to determine exactly which RDF statements are assured by
> Alice?  (Short of trying all possible combinations to find one that yields
> the correct digest?)
> 
> I did think of some ideas involving additional tagging in the RDF graph,
> but they all seem to have problems when additional statements about the
> same subjects get added to a graph.  Any thoughts?
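The construction in the quoted question can be sketched in Python as follows. The digest scheme (SHA-1 over sorted per-statement hashes), the `reify:` prefix and the `ex:` property names are all hypothetical. The `find_assured` function makes the concern concrete: given only the digest, a program has to search subsets of the graph, and the number of candidates doubles with every added statement.

```python
import hashlib
import itertools
import json


def statement_digest(s, p, o):
    # Hypothetical per-statement hash: SHA-1 over a JSON encoding of the triple.
    return hashlib.sha1(json.dumps([s, p, o]).encode("utf-8")).digest()


def subgraph_id(statements):
    # Hypothetical "reify:" identifier built from an order-independent digest.
    digests = sorted(statement_digest(*st) for st in set(statements))
    return "reify:" + hashlib.sha1(b"".join(digests)).hexdigest()


C = [("ex:C", "ex:screenWidth", "640"), ("ex:C", "ex:colorCapable", "yes")]
D = [("ex:D", "ex:screenWidth", "320")]
E = [("ex:E", "ex:screenWidth", "1024")]
assurance = (subgraph_id(C), "assuredBy", "Alice")
graph = C + D + E + [assurance]


def find_assured(graph, target_id):
    # Brute force: try every non-empty subset of the other statements.
    others = [st for st in graph if st[0] != target_id]
    for k in range(1, len(others) + 1):
        for combo in itertools.combinations(others, k):
            if subgraph_id(combo) == target_id:
                return list(combo)
    return None


found = find_assured(graph, assurance[0])
```

The identifier is stable under reordering of ([C]...), but it changes as soon as one more statement about C is included, which is exactly the difficulty raised for graphs that grow over time.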

Above I sketched two possible ways of handling this difficulty. The
general problem here is to embed one RDF model into another. While
reification makes this possible, the current way of doing that yields
500% overhead (each statement needs 4 reification statements plus a link
from the model to the reified statement). In the last example, embedding
requires only one
additional statement (still 100% overhead). Furthermore, I don't exclude
the possibility that there will be a syntactic shortcut (another syntax
for RDF) that provides a compact way of specifying embedded models
(similar to the hierarchical approach of XML).
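The overhead figures can be checked with a small Python count. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` vocabulary is the standard RDF reification vocabulary; the `ex:contains` link from the model to each reified statement, the statement identifiers and the sample statements are hypothetical.

```python
def reify(model_id, stmt_id, s, p, o):
    # Standard RDF reification: four statements describing the original one ...
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
        # ... plus one (hypothetical) link from the embedding model to it.
        (model_id, "ex:contains", stmt_id),
    ]


def overhead_percent(extra, original):
    # Extra statements as a percentage of the embedded statements.
    return 100 * extra // original


model = [("ex:Bob", "ex:supplied", "ex:ServiceX"),
         ("ex:Bob", "ex:memberOf", "ex:Acme"),
         ("ex:Acme", "ex:sells", "ex:Widgets")]

reified = []
for i, (s, p, o) in enumerate(model):
    reified += reify("ex:M", "ex:stmt%d" % i, s, p, o)

# Hash-listing variant: one "T --statement--> <hash>" per embedded statement.
hash_links = [("T", "statement", "<hash%d>" % i) for i in range(len(model))]

print(overhead_percent(len(reified), len(model)))     # 500
print(overhead_percent(len(hash_links), len(model)))  # 100
```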

Sergey
Received on Sunday, 30 April 2000 15:26:25 UTC