rdf-utils to diff and leanify rdf graphs

rdf-utils to diff and leanify rdf graphs [uri-1]
Reto Bachmann-Gmür [uri-2] 2005-12-22 21:41
I've just uploaded rdf-utils to sourceforge (here [uri-3]). It is a utility tool for dealing with rdf data and currently has two features:

- Leanify: Remove redundant statements (and anonymous nodes) from rdf-graphs
- Diff: Show the difference between two rdf-graphs

The need for such a tool arose when developing on KnoBot and wondering why the model was getting bigger and bigger. This is kind of a follow-up to the thread "RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm" [uri-4], in which Joshua Tauberer and Yuzhong Qu in particular helped me understand "leanification".


The tool is to be used on the command line (or by api-call, look at the source).

The leanify option is used like this:

java -jar rdf-utils-compact.jar leanify -M test.rdf

This outputs a leanified version of test.rdf. Of interest may be the optional parameter -O, which allows specifying an ontology used to find (inverse) functional properties; by default some foaf and skos properties are assumed to be fp/ifp (option -D to disable). Another parameter is -P or --pedantic: this disables the rdf-molecules based approach and should produce completely lean graphs, but it may take years to complete for a medium-size graph. The full list of options is available with

java -jar rdf-utils-compact.jar leanify -H
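
To illustrate what "redundant" means here, the following is a minimal Python sketch of one simple special case of leanification. It is not how rdf-utils is implemented (the tool is written in Java and uses the molecule-based approach described further down); the function name, the tuple representation of statements and the "_:" prefix for anonymous nodes are assumptions made only for this example. An anonymous node that occurs in exactly one statement adds no information if another node already makes the same statement, because the anonymous node can be mapped onto that other node.

```python
from collections import Counter


def is_blank(term):
    """Anonymous nodes are written with a leading '_:' in this sketch."""
    return term.startswith("_:")


def drop_redundant_leaf_bnodes(triples):
    """Remove statements whose anonymous subject occurs nowhere else in the
    graph and that are entailed by another statement with the same predicate
    and object. This covers one easy case only, not full leanification."""
    triples = set(triples)
    # Count how often each term occurs in subject or object position.
    occurrences = Counter()
    for s, p, o in triples:
        occurrences[s] += 1
        occurrences[o] += 1
    lean = set(triples)
    for s, p, o in sorted(triples):
        if not is_blank(s) or occurrences[s] != 1:
            continue
        # A remaining statement with another subject but the same predicate
        # and object makes this one redundant.
        if any(s2 != s and p2 == p and o2 == o for s2, p2, o2 in lean):
            lean.discard((s, p, o))
    return lean


# Hypothetical example data, using prefixed names for brevity.
graph = {
    ("ex:reto", "foaf:name", '"Reto"'),
    ("_:b1", "foaf:name", '"Reto"'),   # redundant: "something is named Reto"
    ("_:b2", "foaf:name", '"Jo"'),     # kept: nothing else is named Jo
}
lean = drop_redundant_leaf_bnodes(graph)
for triple in sorted(lean):
    print(*triple)
```

Anonymous nodes that take part in several statements need the more expensive comparison of whole subgraphs, which is where the molecule-based approach and the --pedantic option come in.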

The diff option is used like this:

java -jar rdf-utils-compact.jar diff -M1 test1.rdf -M2 test2.rdf

This outputs the differences between the two models in a human-readable form (the next release should come with a human-friendly output as well as a computer-friendly output to allow a 'patch' command). The first part of the output is about "functionally grounded nodes": these are anonymous nodes with an identity defined by (inverse) functional properties. The fg-nodes present in only one of the graphs are shown with their respective (inverse) functional properties; if the same resource has partially different (inverse) functional properties, a "CrossGraphFgNode" is described. In the second part, molecules available in only one of the models are shown (except the molecules which are 'part' of the fg-nodes and thus have already been shown).

An example looks like:

Cross-Graph FG-Nodes: 1
 - CrossGraphFgNode, that will be referenced as _:cgn-onnkgtfo
 Versions in 1: 2
 -{x <http://xmlns.com/foaf/0.1/homepage> <http://gmuer.ch/>
 x <http://xmlns.com/foaf/0.1/mbox> <mailto:yahoo@gmuer.ch>}
 -{x <http://xmlns.com/foaf/0.1/mbox_sha1sum> "63267630b67d56a6fca96d01bfc324d7e0a31df1"
 <http://localhost:8585/me> <http://xmlns.com/foaf/0.1/primaryTopic> x
 x <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://localhost:8585/me>
 x <http://xmlns.com/foaf/0.1/mbox> <mailto:reto@gmuer.ch>}

 Versions in 2: 1
 -{x <http://xmlns.com/foaf/0.1/mbox> <mailto:yahoo@gmuer.ch>
 <http://localhost:8585/me> <http://xmlns.com/foaf/0.1/primaryTopic> x
 x <http://xmlns.com/foaf/0.1/mbox> <mailto:reto@gmuer.ch>}


 Functionally grounded nodes only in 1: 0

 Functionally grounded nodes only in 2: 1
 -{x <http://xmlns.com/foaf/0.1/mbox> <mailto:jo@example.org>}
 Molecules only in 1: 12
 -[_:cgn-onnkgtfo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/virtuser#TemporarySubject>.]
 -[_:cgn-onnkgtfo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Agent>.]
 -[-31893d66:108545c99a1:-7ffe <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <http://localhost:8585/>., -31893d66:108545c99a1:-7ffe <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq>., _:cgn-onnkgtfo <http://wymiwyg.org/ontologies/knobot/personal-history#personalHistory> -31893d66:108545c99a1:-7ffe.]
 -[_:cgn-onnkgtfo <http://xmlns.com/foaf/0.1/givenname> "Reto".]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#shortName> "reto".]
 -[_:cgn-onnkgtfo <http://xmlns.com/foaf/0.1/name> "Reto Bachmann-Gmuer".]
 -[_:cgn-onnkgtfo <http://xmlns.com/foaf/0.1/family_name> "Bachmann-Gmuer".]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#pass_sha1sum> "6fd0b9ba50273caac39d1335073f1046d7382647".]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#permission> <http://wymiwyg.org/ontologies/authorization#mark>.]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#permission> <http://wymiwyg.org/ontologies/authorization#admin>.]
 -[_:cgn-onnkgtfo <http://purl.org/dc/elements/1.1/date> "2005-12-20T21:58+0100".]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#permission> <http://wymiwyg.org/ontologies/authorization#edit>.]
 Molecules only in 2: 12
 -[<http://localhost:8585/2005/12/21/an-article> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/rwcf#AuthoritativelyServedResource>.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/dc/elements/1.1/creator> "@en".]
 -[<http://localhost:8585/2005/12/21/an-article> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/knobot#Commentable>.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/rss/1.0/item>.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://xmlns.com/foaf/0.1/maker> _:cgn-onnkgtfo.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/rss/1.0/modules/content/encoded> "Shoud write something here too...<br/>@en".]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/rss/1.0/title> "An article@en".]
 -[{x <http://xmlns.com/foaf/0.1/mbox> <mailto:jo@example.org>} <http://xmlns.com/foaf/0.1/name> "Jo Example".]
 -[-31893d66:108545c99a1:-7fe9 <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <http://localhost:8585/>., -31893d66:108545c99a1:-7fe9 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq>.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/dc/elements/1.1/language> "en".]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/dc/elements/1.1/date> "2005-12-21T20:25+01:00".]
 -[<http://localhost:8585/> <http://wymiwyg.org/ontologies/knobot#firstRelation> -31893d66:108545c99a1:-7ff4., -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#source> <http://localhost:8585/>., -31893d66:108545c99a1:-7ff4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/knobot#InlineRelation>., -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#strength> "1.0"., -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#target> <http://localhost:8585/2005/12/21/an-article>., -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#effectiveDate> "1135193154145"., -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#strengthReduction> "0.01"., -31893d66:108545c99a1:-7ff4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/knobot#Relation>.]

Note that a CrossGraphFgNode is given an ID (_:cgn-onnkgtfo) that is used in the other molecules shown, while other fg-nodes are shown in the form "{x <http://xmlns.com/foaf/0.1/mbox> <mailto:jo@example.org>}" in the molecule. The information shown in the diff should be enough to create a graph equivalent to one of the compared graphs, given the other.
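
The grouping of fg-nodes in the first part of the output can be sketched roughly as follows. This is a simplified Python illustration, not the rdf-utils code: the function name and data layout are assumptions, each fg-node is reduced to the set of its grounding (property, value) pairs, and two fg-nodes are taken to denote the same resource as soon as they share one such pair. The sample data mirrors a subset of the mbox, homepage and mbox_sha1sum groundings from the example above.

```python
def classify_fg_nodes(nodes1, nodes2):
    """Group fg-nodes of two graphs by shared grounding pairs.

    Each argument is a list of frozensets of (property, value) pairs.
    Returns (cross_graph, only_in_1, only_in_2), where cross_graph holds
    (versions_in_1, versions_in_2) for resources grounded differently
    in the two graphs."""
    tagged = [(1, n) for n in nodes1] + [(2, n) for n in nodes2]
    groups = []  # each group lists the (graph, grounding) versions of one resource
    for graph, grounding in tagged:
        touching, rest = [], []
        for group in groups:
            shares_pair = any(grounding & other for _, other in group)
            (touching if shares_pair else rest).append(group)
        # Merge the new node with every group it shares a grounding pair with.
        merged = [(graph, grounding)] + [m for group in touching for m in group]
        groups = rest + [merged]

    cross, only1, only2 = [], [], []
    for group in groups:
        v1 = {n for g, n in group if g == 1}
        v2 = {n for g, n in group if g == 2}
        if v1 and v2:
            if v1 != v2:  # identical groundings in both graphs are no difference
                cross.append((v1, v2))
        elif v1:
            only1.extend(v1)
        else:
            only2.extend(v2)
    return cross, only1, only2


MBOX = "http://xmlns.com/foaf/0.1/mbox"
HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
SHA1 = "http://xmlns.com/foaf/0.1/mbox_sha1sum"

nodes_in_1 = [
    frozenset({(HOMEPAGE, "http://gmuer.ch/"), (MBOX, "mailto:yahoo@gmuer.ch")}),
    frozenset({(SHA1, "63267630b67d56a6fca96d01bfc324d7e0a31df1"),
               (MBOX, "mailto:reto@gmuer.ch")}),
]
nodes_in_2 = [
    frozenset({(MBOX, "mailto:yahoo@gmuer.ch"), (MBOX, "mailto:reto@gmuer.ch")}),
    frozenset({(MBOX, "mailto:jo@example.org")}),
]
cross, only1, only2 = classify_fg_nodes(nodes_in_1, nodes_in_2)
print(len(cross), len(only1), len(only2))
```

With this data the two nodes of graph 1 end up as two versions of the same resource, because the single node in graph 2 shares an mbox with each of them, while the jo@example.org node has no counterpart in graph 1.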

The approach is based on the concept of RDF Molecules [uri-5], slightly modified so that functionally grounded nodes reference all their grounding nt-molecules and so that these references are contained in the terminal and in the maximum contextual molecules (rather than the statements of one of the grounding nt-molecules).






Links:
 [uri-1] http://wymiwyg.org/2005/12/22/announicing-rdf-utils
 [uri-2] http://gmuer.ch/me
 [uri-3] http://sourceforge.net/project/showfiles.php?group_id=83223&package_id=173731&release_id=380220
 [uri-4] http://lists.w3.org/Archives/Public/semantic-web/2005Nov/0086.html
 [uri-5] http://www.ksl.stanford.edu/people/pp/papers/Ding_ISWC_2005.pdf

Received on Thursday, 22 December 2005 22:19:10 UTC