ANNOUNCEMENT: RDF Context Tools from Giovanni Tummarello on 2005-05-30 (semantic-web@w3.org from May 2005)

From: Giovanni Tummarello <giovanni@wup.it>
Date: Mon, 30 May 2005 22:21:27 +0200
To: semantic-web@w3.org, semanticweb@yahoogroups.com
Message-ID: <429B75C7.4000003@wup.it>
********************
RDFContext Tools 0.1
http://www.dbin.org/RDFContextTools.php

27/5/2005
By Giovanni Tummarello, Christian Morbidoni
http://semedia.deit.univpm.it

Part of the DBin project
http://www.dbin.org

INTRODUCTION
********************

This API provides a way to attach "context" information to pieces of an RDF 
model by adding triples to the model itself. This is similar to 
reification, but works at a different, coarser level.

These tools use the concept of MSG (Minimal Self-contained Graph) [1].
Given a triple, the MSG that contains it is composed of that triple 
plus, recursively, all the triples connected to each blank node 
involved. An MSG therefore has a boundary consisting entirely of 
URIs or literals. An MSG is also the minimal "piece" of an RDF graph 
that can be transferred to another peer while still allowing the 
original graph to be incrementally reconstructed.
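To make the definition concrete, here is a minimal Java sketch that collects the MSG containing a given triple and, by repeatedly extracting the MSG of a triple not yet assigned, splits a whole graph into MSGs. It is not the RDFContext Tools API: the `Triple` record, the `MsgSketch` class and the convention that blank nodes are strings starting with `_:` are assumptions made for illustration.

```java
import java.util.*;

// Hypothetical term model: URIs/literals are plain strings, blank nodes start with "_:".
record Triple(String s, String p, String o) {}

class MsgSketch {
    static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    /** The MSG containing start: the triple itself plus, recursively, every
     *  triple that mentions a blank node already reached. */
    static Set<Triple> msgOf(Triple start, Collection<Triple> graph) {
        Set<Triple> msg = new LinkedHashSet<>();
        Set<String> reached = new HashSet<>();   // blank nodes queued or expanded
        Deque<String> todo = new ArrayDeque<>();
        addTriple(start, msg, reached, todo);
        while (!todo.isEmpty()) {
            String blank = todo.pop();
            for (Triple t : graph) {
                // In RDF a blank node can only be a subject or an object.
                if (t.s().equals(blank) || t.o().equals(blank)) {
                    addTriple(t, msg, reached, todo);
                }
            }
        }
        return msg;
    }

    private static void addTriple(Triple t, Set<Triple> msg, Set<String> reached, Deque<String> todo) {
        if (!msg.add(t)) return;   // already collected, its blank nodes are already queued
        for (String term : List.of(t.s(), t.o())) {
            if (isBlank(term) && reached.add(term)) todo.push(term);
        }
    }

    /** Splits a graph into MSGs by extracting the MSG of each triple not yet assigned. */
    static Set<Set<Triple>> partition(List<Triple> graph) {
        Set<Set<Triple>> msgs = new HashSet<>();
        Set<Triple> assigned = new HashSet<>();
        for (Triple t : graph) {
            if (assigned.contains(t)) continue;
            Set<Triple> msg = msgOf(t, graph);
            assigned.addAll(msg);
            msgs.add(msg);
        }
        return msgs;
    }
}
```

For example, starting from `_:b foaf:name "Christian"` in a graph that also holds `ex:doc dc:creator _:a` and `_:a foaf:knows _:b`, the sketch returns those triples plus every other triple about `_:a`, while a ground triple such as `ex:doc dc:title "DBin"` is an MSG on its own. Feeding the same triples to `partition` in a different order yields the same set of MSGs.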

One of the nice properties of MSGs is that they partition an RDF graph 
uniquely: the result is the same regardless of which triple the 
decomposition starts from. This (actually very simple) theory enables a 
number of interesting "operations", most of which are directly 
supported by this API:

* Digital signatures on pieces of RDF Graphs (MSGs) stored in the graph 
itself

This is useful in scenarios where "bits" of information are requested 
remotely and it is desirable to merge them with existing RDF while 
retaining knowledge about who said what. This is even more useful in 
scenarios where information can be served by peers other than the 
original author, possibly "information collectors" or aggregators: 
signatures on an MSG will be verifiable independently of the 
pre-existing content of the graph the MSG is merged into.
* Addressing groups of statements without quoting them explicitly

Given a hash function, MSGs can be deterministically hashed once 
properly canonicalized (this library includes an implementation of 
Carroll's canonicalization procedure). This is useful for several 
purposes, among which revocation without quotation, RDF-based 
challenge-response operations (e.g. "will communicate only if you 
already know ...") and, possibly, efficient rsync-like synchronization 
of RDF.
* Attaching generic "context" to pieces of the graph

This is a generalization of the use of digital signatures on MSGs. The 
same node to which signatures are attached can in general be used to 
attach other contextual information such as authorship, date, color, 
temperature, etc.
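Hashing and signing both need a canonical serialization of the MSG. The Java sketch below is deliberately simplified and is neither Carroll's full procedure nor the library's implementation: it hides blank-node labels behind a common marker, sorts the triples, renames blank nodes `_:c0`, `_:c1`, ... by first appearance, and hashes the result with SHA-256 via `java.security.MessageDigest`. It refuses MSGs in which two triples become indistinguishable once blank nodes are hidden, which is precisely the case the full procedure handles. `Triple`, `MsgHash` and the `_:` blank-node convention are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

// Hypothetical term model: blank nodes are strings starting with "_:".
record Triple(String s, String p, String o) {}

class MsgHash {
    static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    // Triple text with every blank node replaced by the same marker "~".
    static String hiddenKey(Triple t) {
        return (isBlank(t.s()) ? "~" : t.s()) + " " + t.p() + " " + (isBlank(t.o()) ? "~" : t.o());
    }

    /** Simplified canonical form: sort by hidden key, then name blank nodes
     *  _:c0, _:c1, ... in order of first appearance. */
    static String canonicalize(Collection<Triple> msg) {
        List<Triple> sorted = new ArrayList<>(msg);
        sorted.sort(Comparator.comparing(MsgHash::hiddenKey));
        for (int i = 1; i < sorted.size(); i++) {
            if (hiddenKey(sorted.get(i)).equals(hiddenKey(sorted.get(i - 1)))) {
                // The full canonicalization procedure disambiguates this case; this sketch does not.
                throw new IllegalArgumentException("ambiguous after hiding blank nodes");
            }
        }
        Map<String, String> labels = new HashMap<>();
        StringBuilder out = new StringBuilder();
        for (Triple t : sorted) {
            out.append(label(t.s(), labels)).append(' ').append(t.p()).append(' ')
               .append(label(t.o(), labels)).append(" .\n");
        }
        return out.toString();
    }

    private static String label(String term, Map<String, String> labels) {
        if (!isBlank(term)) return term;
        String name = labels.get(term);
        if (name == null) {
            name = "_:c" + labels.size();
            labels.put(term, name);
        }
        return name;
    }

    /** Hex-encoded SHA-256 digest of the canonical text. */
    static String sha256Hex(String canonical) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // SHA-256 is always available in the JDK
        }
    }
}
```

Two MSGs that differ only in blank-node labels or triple order produce the same canonical text and therefore the same hash, which is what allows an MSG to be addressed by its hash instead of being quoted.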

Issues:

1) Since this methodology uses reification to attach the signature to 
the MSGs, it is subject to the issues typical of this standard RDF 
construct. In particular, care should be taken when using this method 
with OWL Full reasoners, as the owl:sameAs property might cause 
substitutions inside MSGs. RDFS inference presents similar problems: 
the RDFS triple store may automatically add new triples, resulting from 
schema entailments, that involve blank nodes (thus usually invalidating 
the signatures on the MSG). Since RDFS reasoning is usually needed, the 
DBin platform distinguishes between the repository where "raw" data is 
exchanged and those where reasoning happens. At the P2P level a "raw" 
repository is used, where MSGs are stored and served unchanged when 
requested. Based on the MSG signatures, contexts and specific local 
rules, the application then decides whether to also merge the raw MSGs 
into the higher-level repositories (e.g. those used for RDFS, OWL and/or 
rule-based reasoning).
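The following hypothetical Java sketch illustrates why RDFS inference breaks MSG signatures and why DBin keeps a separate raw repository. A toy version of a single RDFS rule (`rdfs:domain`, applied once; not a complete RDFS reasoner) adds an `rdf:type` triple about a blank node, so the MSG recomputed in the reasoning copy no longer matches the signed one, while the raw graph is left untouched. `RawVsReasoning`, `Triple` and the `_:` blank-node convention are assumptions.

```java
import java.util.*;

// Hypothetical term model: blank nodes are strings starting with "_:".
record Triple(String s, String p, String o) {}

class RawVsReasoning {
    static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    // True if b mentions a blank node that a mentions.
    static boolean sharesBlank(Triple a, Triple b) {
        for (String n : List.of(a.s(), a.o())) {
            if (isBlank(n) && (b.s().equals(n) || b.o().equals(n))) return true;
        }
        return false;
    }

    /** MSG containing start: all triples reachable through shared blank nodes. */
    static Set<Triple> msgOf(Triple start, Collection<Triple> graph) {
        Set<Triple> msg = new HashSet<>(Set.of(start));
        Deque<Triple> todo = new ArrayDeque<>(msg);
        while (!todo.isEmpty()) {
            Triple cur = todo.pop();
            for (Triple t : graph) {
                if (sharesBlank(cur, t) && msg.add(t)) todo.push(t);
            }
        }
        return msg;
    }

    /** Toy RDFS domain rule: (p rdfs:domain C) and (x p y) entail (x rdf:type C).
     *  Returns a new graph for the reasoning repository; the raw input is not modified. */
    static List<Triple> withDomainInferences(List<Triple> raw) {
        List<Triple> out = new ArrayList<>(raw);
        for (Triple schema : raw) {
            if (!schema.p().equals("rdfs:domain")) continue;
            for (Triple t : raw) {
                Triple inferred = new Triple(t.s(), "rdf:type", schema.o());
                if (t.p().equals(schema.s()) && !out.contains(inferred)) out.add(inferred);
            }
        }
        return out;
    }
}
```

With `ex:doc dc:creator _:a`, `_:a foaf:name "Giovanni"` and `foaf:name rdfs:domain foaf:Person`, the signed MSG has two triples; in the reasoning copy the same MSG grows to three, because the inferred `_:a rdf:type foaf:Person` is attached to the blank node.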

2) By the MSG definition and RDF Semantics, the structure of existing 
MSGs is not affected by the insertion of new ones. While this property 
enables our RDF digital signature scheme, care must be taken when 
inserting the same MSG twice. Although RDF Semantics states that parts 
of the graph which have identical interpretations should not be 
duplicated (thanks to J. Carroll for pointing this out!), existing 
toolkits will usually duplicate the MSG when it is inserted twice. The 
MSG class implements a hasSemanticsOf(MSG) method which is useful to 
avoid this. A simple (but not the most efficient) way to prevent 
duplicate MSGs from being inserted into the triple store:

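|
		// Runs once per incoming MSG, i.e. inside a loop over incomingMSGs[i]
		Graph ourGraph = db.getGraph();
		// All local MSGs around a URI that also appears in the incoming MSG;
		// uriInsideIncomingMSG is a placeholder for that URI
		RDFN ourRDFN = new RDFN(ourGraph, uriInsideIncomingMSG);
		MSG[] ourMSGs = ourRDFN.getComposingMSGs();

		boolean merge = true;
		for (int j = 0; j < ourMSGs.length; j++) {
			// An equivalent MSG is already stored: do not insert a duplicate
			if (incomingMSGs[i].hasSemanticsOf(ourMSGs[j])) {
				merge = false;
				break;
			}
		}
		if (merge) {
			db.addXMLRDF(incomingMSGs[i].getRDFXML());
		}
|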


Note that this example uses the RDFN concept (all the MSGs surrounding a 
given URI).

3) Triple overhead.

Worst case: for ground statements, the MSG "context node" is 
actually the reification node. This means that for each signed triple 
at least 6 triples are added: 4 for the reification (although 3 would 
suffice) plus 2 for the certificate and the hash.

Typical case: in DBin, a typical MSG is composed of more than 20 
triples, which leads to an overhead of 25% to 30%.
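As a rough check of these figures, assume that every signed MSG of n triples receives the same 6 extra triples as in the worst case (4 for one reification plus 2 for certificate and hash; the text does not state this explicitly for the typical case). The relative overhead is then:

```latex
\mathrm{overhead}(n) = \frac{4 + 2}{n}
\qquad
\mathrm{overhead}(1) = 600\%,\quad
\mathrm{overhead}(20) = 30\%,\quad
\mathrm{overhead}(24) = 25\%
```

Under this assumption, the 25% to 30% figure corresponds to MSGs of roughly 20 to 24 triples.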

For a better explanation of the definitions, properties and issues, 
please see:

[1] G. Tummarello, C. Morbidoni, P. Puliti, F. Piazza, "RDF signing 
supporting resource centric requests" Proceedings of the Poster track, 
ESWC 2005.
http://semedia.deit.univpm.it/submissions/ESWC2005_Poster/ESWC2005_signignRDF.pdf

RELEASE NOTES
********************

This is release 0.1; the code should be considered beta and/or 
experimental. It is used inside DBin where, so far, it appears to be 
doing a good job.

RDFContext Tools requires the following libraries in the classpath:

Jena Framework and related:
commons-logging.jar
icu4j.jar
jakarta-oro-2.0.5.jar
jena.jar
xercesImpl.jar

Log4j:
log4j-1.2.8.jar

BouncyCastle APIs:
bcpg-jdk14-125.jar
bcprov-jdk14-125.jar

All of these are included in the release file (RDFTrusttoolkit/lib), 
except Jena and its related libraries, which are available as a single 
file (jenaLibs.zip) at dbin.org -> RDFContextTools.
Place these jars in the RDFTrusttoolkit/lib directory to be able to run 
the examples.

Running the sample code:
---------------------------
In the "samplecode" folder there are two basic examples illustrating the 
API. You can run them using the .bat files or the equivalent command lines.

SigningMSG
---------------
Shows how to create an MSG object from a graph where a digital signature 
already exists. Once the MSG is created, the signature is checked and 
another one is attached to it.
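A minimal sketch of the check-then-countersign flow, using the JDK's `java.security.Signature` with SHA256withRSA instead of the BouncyCastle OpenPGP APIs the library depends on. `SignedMsg`, `AttachedSignature` and keeping signatures in a Java list (rather than as triples on the MSG's context node) are assumptions made for illustration; the canonical MSG text is assumed to have been computed beforehand.

```java
import java.nio.charset.StandardCharsets;
import java.security.*;
import java.util.*;

// Hypothetical: one signature attached to an MSG (signer's public key + signature bytes).
record AttachedSignature(PublicKey signer, byte[] value) {}

class SignedMsg {
    final String canonicalMsg;   // canonical MSG text, assumed computed beforehand
    final List<AttachedSignature> signatures = new ArrayList<>();

    SignedMsg(String canonicalMsg) {
        this.canonicalMsg = canonicalMsg;
    }

    static KeyPair newKeys() {
        try {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(2048);
            return g.generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Signs the canonical text and attaches the signature to this MSG. */
    void sign(KeyPair keys) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(keys.getPrivate());
            s.update(canonicalMsg.getBytes(StandardCharsets.UTF_8));
            signatures.add(new AttachedSignature(keys.getPublic(), s.sign()));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if at least one signature is attached and every one verifies. */
    boolean verifyAll() {
        if (signatures.isEmpty()) return false;
        try {
            for (AttachedSignature a : signatures) {
                Signature s = Signature.getInstance("SHA256withRSA");
                s.initVerify(a.signer());
                s.update(canonicalMsg.getBytes(StandardCharsets.UTF_8));
                if (!s.verify(a.value())) return false;
            }
            return true;
        } catch (GeneralSecurityException e) {
            return false;   // malformed key or signature: treat as not verified
        }
    }
}
```

Usage: sign with one key pair, call `verifyAll()`, then sign again with a second key pair. Any change to the canonical text after signing makes `verifyAll()` return false, which is the behaviour a signature check on an MSG needs.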

RevokingMSG
----------------
By means of the signature process it is possible to remotely "revoke" an 
MSG that, for example, was previously issued to another peer. This 
example uses the signature hash value of the MSG as an inverse functional 
property to find the MSG to be revoked. The revocation itself is a 
digitally signed MSG containing the hash value of the MSG to be revoked. 
The revocation policy implemented in this example is: "remove the MSG if 
the same public key is used to sign both the revocation and the MSG 
itself" (only the author can revoke his/her annotations). More 
sophisticated policies can easily be implemented (e.g. revocations 
coming from "group moderators").
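The "only the author can revoke" policy can be sketched as follows. This is a hypothetical Java illustration, not the sample's code: the local store is a map keyed by the SHA-256 hash of the MSG's canonical text (playing the role of the inverse functional property), the revocation is reduced to a signed string naming that hash, and signatures use the JDK's SHA256withRSA rather than OpenPGP. `RevocationSketch`, `StoredMsg` and `payload` are names introduced here.

```java
import java.nio.charset.StandardCharsets;
import java.security.*;
import java.util.*;

class RevocationSketch {
    // Hypothetical store entry: canonical MSG text plus the key that signed it.
    record StoredMsg(String canonical, PublicKey signer) {}

    // Local store indexed by MSG hash (used like an inverse functional property).
    final Map<String, StoredMsg> store = new HashMap<>();

    static String sha256Hex(String text) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical revocation payload: a statement naming the target hash.
    static String payload(String targetHash) {
        return "revokes " + targetHash;
    }

    static KeyPair newKeys() {
        try {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(2048);
            return g.generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] sign(PrivateKey key, String data) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(data.getBytes(StandardCharsets.UTF_8));
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(PublicKey key, String data, byte[] sig) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(data.getBytes(StandardCharsets.UTF_8));
            return s.verify(sig);
        } catch (GeneralSecurityException e) {
            return false;   // malformed key or signature: not verified
        }
    }

    /** Policy: remove the MSG named by the revocation only if the revocation
     *  verifies and was signed with the same key as the MSG itself. */
    boolean applyRevocation(String targetHash, PublicKey revoker, byte[] revocationSig) {
        StoredMsg target = store.get(targetHash);
        if (target == null || !verify(revoker, payload(targetHash), revocationSig)) {
            return false;
        }
        if (!Arrays.equals(revoker.getEncoded(), target.signer().getEncoded())) {
            return false;   // signed by someone other than the author
        }
        store.remove(targetHash);
        return true;
    }
}
```

Replacing the key comparison with, for example, a check against a set of moderator keys would give one of the more sophisticated policies mentioned above.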

LICENCE
----------------
This library is distributed under the terms of the LGPL licence 
<http://www.opensource.org/licenses/lgpl-license.php>.

Acknowledgments
----------------
Gratitude goes to Fabio Panaioli for part of the implementation and to 
J. Carroll for the suggestions.
Received on Monday, 30 May 2005 20:22:38 UTC