Re: CC/PP, RDF and trust issues from Graham Klyne on 2000-05-02 (www-rdf-interest@w3.org from May 2000)

From: Graham Klyne <GK@Dial.pipex.com>
Date: Tue, 02 May 2000 10:57:12 +0100
To: Sergey Melnik <melnik@db.stanford.edu>
Cc: www-rdf-interest@w3.org, CC/PP WG list <w3c-ccpp-wg@w3.org>
Message-Id: <4.3.1.2.20000502094937.00c2f5d0@pop.dial.pipex.com>

Sergey,

Thank you again for your comments.  I think I now understand what you are 
suggesting.  At this stage, I think the discussion has raised two distinct 
issues, which I'd like to take separately:

(1) what is actually protected (model or syntax)?

(2) how are assurances to be represented?



(1) what is actually protected (model or syntax)?
-------------------------------------------------

At 12:37 PM 4/30/00 -0700, Sergey Melnik wrote:
>I agree that signing RDF statements/models is a promising way of
>building the Web of Trust. However, I'm skeptical of using serialization
>syntaxes for that. As I pointed out earlier, I believe we have to sign
>*content* rather than one of its syntactic representations.

Although we agree on some points, I think we have very different views of 
this issue.  (It probably comes down to what we mean by "content".)

In my view, a signature can be applied _only_ to some syntactic 
representation of an RDF statement.  A signature is, by its very nature, 
calculated over some specific sequence of bits.  These bits _represent_ the 
content, but are not the content itself.  The content is an abstraction 
that is independent of any particular representation or sequence of bits.
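
To make the point concrete, here is a minimal sketch in Python.  The two 
byte strings are hypothetical serializations of one and the same statement 
(the URIs and the choice of SHA-256 are my assumptions for illustration, 
not anything proposed in this thread).  The digest, and hence any signature 
built on it, follows the bits rather than the content:

```python
import hashlib

# Two hypothetical serializations of the same single RDF statement
# ("the document at example.org has author Alice"). Illustrative only.
xml_form = (
    b'<rdf:Description about="http://example.org/doc">'
    b"<ex:author>Alice</ex:author></rdf:Description>"
)
triple_form = b'<http://example.org/doc> <http://example.org/terms#author> "Alice" .'


def digest(data: bytes) -> str:
    """Digest over a specific sequence of bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


digest_xml = digest(xml_form)
digest_triple = digest(triple_form)

# Same abstract content, different bits, therefore different digests.
same_content_different_bits = digest_xml != digest_triple
print(same_content_different_bits)
```

A signature verified against one of these forms says nothing directly 
about the other, even though both represent the same statement.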

(An English banknote carries the statement "I promise to pay the bearer on 
demand...", with a rendering of the Bank of England's chief cashier's 
signature:  the paper is not the money, merely a promise to pay the 
money.  It is not the money or the promise itself that is signed, but a 
written representation of the promise.  In practice, people may treat these 
scraps of paper as if they were actual money (and a similar evolution could 
happen with RDF), but I think it's important to maintain at some level the 
distinction.  For example, the Bank of England's promise remains good long 
after the form of note itself has gone out of circulation and is hence no 
longer regarded as "money".)

[[If the promise were in a different language, and similarly signed by 
BofE's cashier, it would still be a promise to pay the same money.  I'm 
reminded of TimBL's comments about "interpretation properties".]]

I see the process you describe as effectively defining a "canonical" 
representation of an RDF model that is used as a basis for calculating 
digests.  As such, the process is fine, but I'm not yet convinced that it 
is really needed.  I think that the assurance (like the BofE's cashier's 
promise) is something that can exist separately from any 
representation.  The representation is merely a way to prove to someone 
else that the assurance was indeed given.
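
As I read it, such a "canonical" representation could be sketched roughly 
as follows.  This is my own illustration, not your algorithm:  the line 
format, the sort order, the use of SHA-256 and the function names are all 
assumptions, and it ignores anonymous nodes, which a real scheme would have 
to deal with.

```python
import hashlib
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all as strings


def canonical_bytes(triples: Iterable[Triple]) -> bytes:
    """Serialize a set of triples in one fixed, order-independent way.

    Hypothetical canonical form: one "s p o ." line per distinct triple,
    lines sorted lexicographically, joined by newlines, encoded as UTF-8.
    """
    lines = sorted(f"{s} {p} {o} ." for s, p, o in set(triples))
    return "\n".join(lines).encode("utf-8")


def model_digest(triples: Iterable[Triple]) -> str:
    """Digest of the canonical form; still a digest over bits."""
    return hashlib.sha256(canonical_bytes(triples)).hexdigest()


model_a = [
    ("ex:doc", "ex:author", "Alice"),
    ("ex:doc", "ex:title", "Trust"),
]
# The same statements, received in a different order and with a duplicate.
model_b = [
    ("ex:doc", "ex:title", "Trust"),
    ("ex:doc", "ex:author", "Alice"),
    ("ex:doc", "ex:title", "Trust"),
]

print(model_digest(model_a) == model_digest(model_b))
```

Note that the digest is still calculated over a particular sequence of 
bits;  the canonical form is simply one agreed serialization among many.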

So, when you say:
>Imagine you got the following three statements (over an insecure link):
>[...]

You talk about receiving an RDF model as if it can be transferred 
independently of any serialization syntax.  But I say that in order to 
communicate the model it must be serialized.  And it is the serialized form 
(rather than the abstract model) that is subject to tampering and other 
corruptions that a signature protects against.  Once we are dealing with an 
abstract model, an assurance either exists or does not exist.



(2) how are assurances to be represented?
-----------------------------------------

At first pass, I rather like your suggestion:

>The recipient gets:
>
>T --rdf:type-->  SignedStatements
>T --principal--> Alice
>T --algorithm--> RSA
>T --statement--> <hash1>
>...
>T --statement--> <hashN>
><statement 1>
>...
><statement N>

In which I take the hash to be a "stand in" for the reified 
statement.  This seems to have a strong link to the underlying theory of 
reification, while also maintaining a reasonably intuitive representation 
for the assurance.
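
A rough sketch of that structure, as I understand it, in Python.  The 
property names are taken from your example;  the way a statement is turned 
into a hash (SHA-256 over a simple string form) and the helper names are my 
assumptions, and no actual RSA signature is calculated here.

```python
import hashlib


def statement_hash(s, p, o):
    """Hash standing in for the reified statement (string form is assumed)."""
    return hashlib.sha256(f"{s} {p} {o} .".encode("utf-8")).hexdigest()


def build_assurance(node, principal, algorithm, statements):
    """Arcs run from the assurance node T to each covered statement."""
    triples = [
        (node, "rdf:type", "SignedStatements"),
        (node, "principal", principal),
        (node, "algorithm", algorithm),
    ]
    triples += [(node, "statement", statement_hash(*st)) for st in statements]
    return triples


def covered(assurance, statement):
    """True if the statement's hash appears on a 'statement' arc."""
    h = statement_hash(*statement)
    return any(p == "statement" and o == h for _, p, o in assurance)


statements = [
    ("ex:doc", "ex:author", "Alice"),
    ("ex:doc", "ex:title", "Trust"),
]
assurance = build_assurance("T", "Alice", "RSA", statements)

print(covered(assurance, ("ex:doc", "ex:author", "Alice")))    # covered
print(covered(assurance, ("ex:doc", "ex:author", "Mallory")))  # altered
```

The grouping falls out naturally:  everything reachable from T by a 
"statement" arc is what Alice has assured.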

I note that the fundamental difference from the RDF M&S approach to such 
statements is that it adopts an "active voice" approach of the form "Alice 
assures <statements>", rather than a "passive voice" approach of the form 
"<statements> are assured by Alice".  That is, the RDF graph arcs run from 
assurance to statement, rather than from statement to assurance, which 
provides a useful grouping of the statements covered by the assurance.

This in turn requires that the reification of the statements be 
identifiable, hence your introduction of the hashes.  It occurs to me that 
if properties in an RDF graph could be tagged with unique identifiers then 
the uncertainties of hashing and birthday paradox effects could be 
avoided.  (I can conceive the possibility of a semantic web containing 
sufficiently large numbers of RDF statements that the hash may not offer 
sufficient assurance of uniqueness.  Appendix A of 
<draft-ietf-conneg-feature-hash-04.txt> summarizes the results of an 
analysis of this issue for 128-bit MD5 hashes.)
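
For a feel of the numbers, the usual birthday approximation can be 
evaluated directly.  This is the standard textbook estimate, assuming 
hash values are uniformly distributed;  it is not a reproduction of the 
analysis in the draft, and the function name is mine.

```python
import math


def collision_probability(n: int, bits: int = 128) -> float:
    """Approximate chance that at least two of n items share a hash value.

    Uses the standard birthday approximation
        p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits))
    and assumes hash values are uniformly distributed.
    """
    exponent = n * (n - 1) / (2 * 2 ** bits)
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-exponent)


for n in (10 ** 6, 10 ** 12, 2 ** 64):
    print(n, collision_probability(n))
```

The probability stays tiny until the number of statements approaches 
2^64, the square root of the number of possible 128-bit values.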

While I like this approach, I think the implications of its differences 
from RDF M&S should be carefully considered.

#g
--


------------
Graham Klyne
(GK@ACM.ORG)
Received on Tuesday, 2 May 2000 07:38:49 UTC