Representing trust (and other context) in RDF

A STRAWMAN PROPOSAL


1. The problem
--------------

It is easy to attach trust and other context to a document containing 
serialized RDF:  simply sign the document, or attach context information to 
the file or resource.

But when the RDF model data is abstracted from its serialization, all we 
have is a set of statements (and sets of the component parts of statements) 
[RDFM&S, section 5].  Any information about which serialized document 
contained a statement is lost, and there is no defined means to identify 
any given statement (other than by its entire value).

Manipulation and analysis of RDF model data are, I assert, most usefully 
performed on the abstract graph.  But for trust and other contextual 
information to take part in such manipulations, it too must be represented 
in the graph.  The currently defined mechanism for this is reification -- 
representation of each statement (property arc) in a graph by a new 
resource that itself has four new property arcs -- so that a simple-minded 
implementation stores five arcs where there was one (a 400% overhead).  A 
second problem with reification is that it is not an obvious technique for 
implementers, which is likely to lead to incorrect implementations, or even 
to non-adoption of the framework.
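
To make the cost concrete, here is a minimal sketch of reifying a single 
statement.  The tuple layout (subject, predicate, object), the example 
resource names and the node ID "_:s1" are my own assumptions for 
illustration; only the four rdf: property names come from RDFM&S.

```python
# Statements are modelled as (subject, predicate, object) tuples.
# This layout and all example identifiers are illustrative assumptions.

def reify(statement, node_id):
    """Return the four arcs that describe `statement` as a resource."""
    subject, predicate, obj = statement
    return [
        (node_id, "rdf:type", "rdf:Statement"),
        (node_id, "rdf:subject", subject),
        (node_id, "rdf:predicate", predicate),
        (node_id, "rdf:object", obj),
    ]

statement = ("doc:report", "dc:creator", "person:alice")

# Keep the original arc and add its reification alongside it.
graph = [statement] + reify(statement, "_:s1")

print(len(graph))  # 5
```

Any context information would then be attached to "_:s1", which is the 
only handle the graph offers for the statement.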

For many applications, including CC/PP, capturing contextual information 
does seem to be very important.

One way to address this problem might be to extend the RDF model; e.g. to 
define a new subset of resources called "Contexts", and to define each RDF 
statement to be a quadruple consisting of:

     {Predicate, Subject, Object, Context}

I understand that some implementations of RDF do this.  But, for the time 
being, the official RDF model consists of triples.
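
As a rough sketch of what such an extended model might look like in an 
implementation (the class name, field names and context identifiers are 
hypothetical, not taken from any particular RDF toolkit):

```python
from typing import NamedTuple

class Quad(NamedTuple):
    """One statement in the extended model, in the field order given above."""
    predicate: str
    subject: str
    object: str
    context: str

# Hypothetical store: two statements from a signed document, one from elsewhere.
store = [
    Quad("dc:creator", "doc:report", "person:alice", "ctx:signed-by-alice"),
    Quad("dc:title", "doc:report", "Q2 results", "ctx:signed-by-alice"),
    Quad("dc:title", "doc:other", "Draft", "ctx:unsigned"),
]

def in_context(quads, context):
    """Return the statements that carry the given context."""
    return [q for q in quads if q.context == context]

signed = in_context(store, "ctx:signed-by-alice")
print(len(signed))  # 2
```

Here the context travels with every statement, so no separate reified 
resource is needed to talk about it.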


2. An approach to representing context in the RDF model
-------------------------------------------------------

Sergey Melnik has suggested 
[http://lists.w3.org/Archives/Public/www-rdf-interest/2000Apr/0113.html] an 
approach of constructing a digest of each statement or subgraph, and having 
a context resource refer to these digest values in lieu of the reified 
statement(s):

>The recipient gets:
>
>T --rdf:type-->  SignedStatements
>T --principal--> Alice
>T --algorithm--> RSA
>T --statement--> <hash1>
>...
>T --statement--> <hashN>
><statement 1>
>...
><statement N>
>
>Given that there exist an algorithm for computing hashes of statements,
>the recipient can iterate through the statements in the message and
>match their hashes against those explicitly listed. In the same loop,
>the hash of the model is computed.

One problem with this is that neither finding the statement(s) 
corresponding to a context nor finding the context associated with a 
statement is obviously easy (each involving a search and value computation 
over the statement database), and scaling may prove problematic.
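
The lookups in question can be sketched as follows.  No digest algorithm 
for statements has been defined, so the one used here (SHA-256 over a 
compact JSON form of the triple) is purely an assumption, as are the 
function names and the example data.

```python
import hashlib
import json

def statement_digest(statement):
    """Hex digest of a statement.  The canonical form is an assumption."""
    canonical = json.dumps(list(statement), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def statements_for_context(graph, context):
    """Context -> statements: hashes every statement in the graph."""
    listed = {o for (s, p, o) in graph if s == context and p == "statement"}
    return [t for t in graph if statement_digest(t) in listed]

def contexts_for_statement(graph, statement):
    """Statement -> contexts: one hash, then a scan of all 'statement' arcs."""
    digest = statement_digest(statement)
    return [s for (s, p, o) in graph if p == "statement" and o == digest]

payload = [
    ("doc:report", "dc:creator", "person:alice"),
    ("doc:report", "dc:title", "Q2 results"),
]
unsigned = ("doc:other", "dc:title", "Draft")

context_arcs = [
    ("T", "rdf:type", "SignedStatements"),
    ("T", "principal", "Alice"),
    ("T", "algorithm", "RSA"),
] + [("T", "statement", statement_digest(t)) for t in payload]

graph = context_arcs + payload + [unsigned]

covered = statements_for_context(graph, "T")
owners = contexts_for_statement(graph, payload[0])
```

Both functions walk the whole graph, which is the scaling concern: 
without an auxiliary index, the cost of either lookup grows with the 
size of the statement database.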

A related approach is one in which the digest values of individual 
statements are used to construct resource identifiers that can stand for 
the reified statement resources, and which have property arcs indicating 
the applicable context(s) for the corresponding statement.  The above 
example then becomes:

>T --rdf:type-->  SignedDocument
>T --principal--> Alice
>T --algorithm--> RSA
>digest:<hash1> --context--> T
>...
>digest:<hashN> --context--> T
><statement 1>
>...
><statement N>

This approach seems to have the following advantages over the previous 
suggestion:
- given a statement, it is easy to find any applicable contexts (assuming 
access by RDF resource ID is a primitive function of the RDF handling 
engine used);
- this model very naturally extends to full RDF reification -- simply add 
the appropriate rdf:predicate, rdf:subject, rdf:object and rdf:type 
properties to the digest:<hashX> resource;
- this approach more closely matches the context extension to the RDF model 
suggested above.
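
A minimal sketch of the first two points, assuming a "digest:" identifier 
built from a SHA-256 digest of a JSON form of the triple (neither the URI 
syntax nor the digest algorithm is defined anywhere yet), and using a 
subject index to stand in for "access by RDF resource ID":

```python
import hashlib
import json
from collections import defaultdict

def statement_id(statement):
    """Resource ID standing for the reified statement (assumed syntax)."""
    canonical = json.dumps(list(statement), separators=(",", ":"))
    return "digest:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def index_by_subject(graph):
    """Map each subject resource ID to its (predicate, object) arcs."""
    index = defaultdict(list)
    for subject, predicate, obj in graph:
        index[subject].append((predicate, obj))
    return index

def contexts_of(index, statement):
    """Statement -> contexts: one digest computation and one ID lookup."""
    arcs = index.get(statement_id(statement), [])
    return [obj for (predicate, obj) in arcs if predicate == "context"]

def reification_arcs(statement):
    """Optional extension to full reification on the same digest resource."""
    sid = statement_id(statement)
    subject, predicate, obj = statement
    return [
        (sid, "rdf:type", "rdf:Statement"),
        (sid, "rdf:subject", subject),
        (sid, "rdf:predicate", predicate),
        (sid, "rdf:object", obj),
    ]

signed = ("doc:report", "dc:creator", "person:alice")
unsigned = ("doc:other", "dc:title", "Draft")

graph = [
    ("T", "rdf:type", "SignedDocument"),
    ("T", "principal", "Alice"),
    ("T", "algorithm", "RSA"),
    (statement_id(signed), "context", "T"),
    signed,
    unsigned,
]

index = index_by_subject(graph)
signed_contexts = contexts_of(index, signed)
unsigned_contexts = contexts_of(index, unsigned)
```

An application that needs full reification can append 
reification_arcs(signed) to the graph; one that does not pays only for 
the single context arc.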

There is still the problem of finding all statements to which a given 
context applies, but, if necessary, this might be overcome by having 
back-links from the context:

>T --rdf:type-->  SignedDocument
>T --principal--> Alice
>T --algorithm--> RSA
>T --statement--> digest:<hash1>
>...
>T --statement--> digest:<hashN>
>digest:<hash1> --context--> T
>...
>digest:<hashN> --context--> T
><statement 1>
>...
><statement N>

which starts to look a bit like Sergey's scheme again.
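
A sketch of maintaining both arcs together, under the same assumptions as 
before (the "digest:" syntax, the SHA-256 digest over a JSON form of the 
triple, and all function names are mine):

```python
import hashlib
import json

def statement_id(statement):
    """Resource ID standing for the reified statement (assumed syntax)."""
    canonical = json.dumps(list(statement), separators=(",", ":"))
    return "digest:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def link(context, statement):
    """Return the forward link and the back-link for one statement."""
    sid = statement_id(statement)
    return [(context, "statement", sid), (sid, "context", context)]

def objects(graph, subject, predicate):
    """Values of `predicate` on `subject`; a real store would index this."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]

payload = [
    ("doc:report", "dc:creator", "person:alice"),
    ("doc:report", "dc:title", "Q2 results"),
]

graph = [
    ("T", "rdf:type", "SignedDocument"),
    ("T", "principal", "Alice"),
    ("T", "algorithm", "RSA"),
]
for statement in payload:
    graph.extend(link("T", statement))
graph.extend(payload)

# Context -> statement IDs, and statement -> contexts, by resource ID alone.
ids_for_t = objects(graph, "T", "statement")
contexts_for_first = objects(graph, statement_id(payload[0]), "context")

# Turning an ID back into the statement needs a digest-to-statement map
# (or the reification properties on the digest resource).
by_id = {statement_id(t): t for t in payload}
statements_for_t = [by_id[i] for i in ids_for_t]
```

Note that the back-link yields digest identifiers rather than the 
statements themselves, so an implementation would still want some way to 
resolve an identifier to its statement.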


3. Putting it into practice
---------------------------

If this is deemed to be a reasonable way of handling contextual information 
in RDF, then for interoperability between applications processing this 
information (and possibly exporting/importing it in an RDF graph 
serialization) I think we would need to specify:

(a) a digest algorithm for digesting RDF statements
(b) a URI scheme for embedding digest values in a resource ID
(c) an RDF class (xxx:Context) to be a superclass of any context class
(d) an RDF property corresponding to --context-->
     (domain: rdf:Statement, range: xxx:Context)
(e) an RDF property corresponding to --statement-->
     (domain: xxx:Context, range: rdf:Statement)

Applications would choose, according to their needs, whether or not to 
include additional information ("partial reification"?) in the graph that 
they construct, so simply defining a standard form for these mechanisms 
would not impose any overhead on applications that do not need to process 
contextual information.
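
As a sketch of items (c) to (e), the schema arcs might look something like 
the following.  I am reading (c) as a common base class for all context 
classes; the "xxx:" and "app:" names are placeholders, and the helper 
function is hypothetical.  The rdfs: properties are those of RDF Schema.

```python
# Schema arcs as (subject, predicate, object) tuples; names are placeholders.
SCHEMA = [
    ("xxx:Context", "rdf:type", "rdfs:Class"),
    ("xxx:context", "rdf:type", "rdf:Property"),
    ("xxx:context", "rdfs:domain", "rdf:Statement"),
    ("xxx:context", "rdfs:range", "xxx:Context"),
    ("xxx:statement", "rdf:type", "rdf:Property"),
    ("xxx:statement", "rdfs:domain", "xxx:Context"),
    ("xxx:statement", "rdfs:range", "rdf:Statement"),
    # An application-specific context class, as in the signing example.
    ("app:SignedDocument", "rdfs:subClassOf", "xxx:Context"),
]

def is_context_class(schema, cls):
    """True if `cls` is xxx:Context or reaches it via rdfs:subClassOf."""
    seen = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        if current == "xxx:Context":
            return True
        if current in seen:
            continue
        seen.add(current)
        pending.extend(
            o for (s, p, o) in schema
            if s == current and p == "rdfs:subClassOf"
        )
    return False

signed_is_context = is_context_class(SCHEMA, "app:SignedDocument")
statement_is_context = is_context_class(SCHEMA, "rdf:Statement")
```

An application that ignores contexts never needs to load these arcs, 
which is the sense in which the standard form imposes no overhead.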

--

So, the question:

Is this worth pursuing?

#g

------------
Graham Klyne
(GK@ACM.ORG)

Received on Wednesday, 24 May 2000 11:41:31 UTC