- From: Reto Bachmann-Gmür <reto@gmuer.ch>
- Date: Thu, 22 Dec 2005 23:00:15 +0100 (CET)
- To: semantic-web@w3.org
- Message-ID: <26966228.1135288815562.JavaMail.knobot@gonza>
rdf-utils to diff and leanify rdf graphs [uri-1]
Reto Bachmann-Gmür [uri-2] 2005-12-22 21:41

I've just uploaded rdf-utils to sourceforge (here [uri-3]). This is a utility tool for dealing with RDF data; it currently has two features:

- Leanify: Remove redundant statements (and anonymous nodes) from RDF graphs
- Diff: Show the difference between two RDF graphs

The need for such a tool arose when developing on KnoBot and wondering why the model was getting bigger and bigger. This is kind of a follow-up to the thread "RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm" [uri-4], in which Joshua Tauberer and Yuzhong Qu in particular helped me understand "leanification".

The tool is to be used on the command line (or by API call, look at the source). The leanify option is used like this:

    java -jar rdf-utils-compact.jar leanify -M test.rdf

This outputs a leanified version of test.rdf. Of interest may be the optional parameter -O, which allows specifying an ontology used to find (inverse) functional properties; by default some foaf and skos properties are assumed to be fp/ifp (option -D to disable). Another parameter is -P or --pedantic; this disables the rdf-molecules based approach and should produce completely lean graphs, but it may take years to complete for a medium-size graph. The full list of options is available with:

    java -jar rdf-utils-compact.jar leanify -H

The diff option is used like this:

    java -jar rdf-utils-compact.jar diff -M1 test1.rdf -M2 test2.rdf

This outputs the differences between the two models in a human-readable form (the next release should come with a human-friendly output as well as a computer-friendly output to allow a 'patch' command).
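
To make the idea of leanification concrete, here is a minimal sketch in Python. It is not the rdf-utils code or its API, and it does not use the molecule-based approach: it assumes triples are plain tuples of strings, that blank nodes are written with a "_:" prefix, and that prefixed names such as foaf:mbox stand in for full URIs. The function names are made up for illustration. It removes a blank node only when every statement mentioning it is already present with that node replaced by one other node, so it can miss redundancies that require remapping several blank nodes at once.

```python
def is_blank(term):
    """Assumption of this sketch: blank nodes are strings starting with '_:'."""
    return term.startswith("_:")


def substitute(triple, old, new):
    """Return the triple with every occurrence of `old` replaced by `new`."""
    return tuple(new if term == old else term for term in triple)


def leanify_once(graph):
    """Drop one redundant blank node; return the smaller graph, or None."""
    nodes = {term for s, _, o in graph for term in (s, o)}
    for blank in sorted(n for n in nodes if is_blank(n)):
        mentioning = {t for t in graph if blank in (t[0], t[2])}
        rest = graph - mentioning
        for target in sorted(nodes - {blank}):
            # The blank node is redundant if mapping it to `target` turns
            # each of its statements into one the graph already contains.
            if all(substitute(t, blank, target) in rest for t in mentioning):
                return rest
    return None


def leanify(graph):
    """Repeat single-node removal until nothing more can be dropped."""
    graph = set(graph)
    while True:
        smaller = leanify_once(graph)
        if smaller is None:
            return graph
        graph = smaller


example = {
    ("_:a", "foaf:mbox", "<mailto:reto@gmuer.ch>"),
    ("_:a", "foaf:name", '"Reto"'),
    ("_:b", "foaf:mbox", "<mailto:reto@gmuer.ch>"),  # says nothing new
}
lean = leanify(example)
for triple in sorted(lean):
    print(triple)
```

In the example, the statement about _:b is implied by the statements about _:a, so only the two _:a statements remain. A check that considers mappings of several blank nodes together is much more expensive, which is in line with the warning about --pedantic above.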

The first part of the output is about "functionally grounded nodes": these are anonymous nodes with an identity defined by (inverse) functional properties. The fg-nodes present in only one of the graphs are shown with their respective (inverse) functional properties; if the same resource has partially different (inverse) functional properties in the two graphs, a "CrossGraphFgNode" is described. In the second part, molecules available in only one of the models are shown (except the molecules which are 'part' of the fg-nodes and thus have already been shown). An example looks like:

Cross-Graph FG-Nodes: 1
 - CrossGraphFgNode, that will be referenced as _:cgn-onnkgtfo
Versions in 1: 2
 -{x <http://xmlns.com/foaf/0.1/homepage> <http://gmuer.ch/>
   x <http://xmlns.com/foaf/0.1/mbox> <mailto:yahoo@gmuer.ch>}
 -{x <http://xmlns.com/foaf/0.1/mbox_sha1sum> "63267630b67d56a6fca96d01bfc324d7e0a31df1"
   <http://localhost:8585/me> <http://xmlns.com/foaf/0.1/primaryTopic> x
   x <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://localhost:8585/me>
   x <http://xmlns.com/foaf/0.1/mbox> <mailto:reto@gmuer.ch>}
Versions in 2: 1
 -{x <http://xmlns.com/foaf/0.1/mbox> <mailto:yahoo@gmuer.ch>
   <http://localhost:8585/me> <http://xmlns.com/foaf/0.1/primaryTopic> x
   x <http://xmlns.com/foaf/0.1/mbox> <mailto:reto@gmuer.ch>}
Functionally grounded nodes only in 1: 0
Functionally grounded nodes only in 2: 1
 -{x <http://xmlns.com/foaf/0.1/mbox> <mailto:jo@example.org>}
Molecules only in 1: 12
 -[_:cgn-onnkgtfo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/virtuser#TemporarySubject>.]
 -[_:cgn-onnkgtfo <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Agent>.]
 -[-31893d66:108545c99a1:-7ffe <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <http://localhost:8585/>.,
   -31893d66:108545c99a1:-7ffe <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq>.,
   _:cgn-onnkgtfo <http://wymiwyg.org/ontologies/knobot/personal-history#personalHistory> -31893d66:108545c99a1:-7ffe.]
 -[_:cgn-onnkgtfo <http://xmlns.com/foaf/0.1/givenname> "Reto".]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#shortName> "reto".]
 -[_:cgn-onnkgtfo <http://xmlns.com/foaf/0.1/name> "Reto Bachmann-Gmuer".]
 -[_:cgn-onnkgtfo <http://xmlns.com/foaf/0.1/family_name> "Bachmann-Gmuer".]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#pass_sha1sum> "6fd0b9ba50273caac39d1335073f1046d7382647".]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#permission> <http://wymiwyg.org/ontologies/authorization#mark>.]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#permission> <http://wymiwyg.org/ontologies/authorization#admin>.]
 -[_:cgn-onnkgtfo <http://purl.org/dc/elements/1.1/date> "2005-12-20T21:58+0100".]
 -[_:cgn-onnkgtfo <http://wymiwyg.org/ontologies/authorization#permission> <http://wymiwyg.org/ontologies/authorization#edit>.]
Molecules only in 2: 12
 -[<http://localhost:8585/2005/12/21/an-article> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/rwcf#AuthoritativelyServedResource>.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/dc/elements/1.1/creator> "@en".]
 -[<http://localhost:8585/2005/12/21/an-article> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/knobot#Commentable>.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/rss/1.0/item>.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://xmlns.com/foaf/0.1/maker> _:cgn-onnkgtfo.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/rss/1.0/modules/content/encoded> "Shoud write something here too...<br/>@en".]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/rss/1.0/title> "An article@en".]
 -[{x <http://xmlns.com/foaf/0.1/mbox> <mailto:jo@example.org>} <http://xmlns.com/foaf/0.1/name> "Jo Example".]
 -[-31893d66:108545c99a1:-7fe9 <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <http://localhost:8585/>.,
   -31893d66:108545c99a1:-7fe9 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq>.]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/dc/elements/1.1/language> "en".]
 -[<http://localhost:8585/2005/12/21/an-article> <http://purl.org/dc/elements/1.1/date> "2005-12-21T20:25+01:00".]
 -[<http://localhost:8585/> <http://wymiwyg.org/ontologies/knobot#firstRelation> -31893d66:108545c99a1:-7ff4.,
   -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#source> <http://localhost:8585/>.,
   -31893d66:108545c99a1:-7ff4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/knobot#InlineRelation>.,
   -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#strength> "1.0".,
   -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#target> <http://localhost:8585/2005/12/21/an-article>.,
   -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#effectiveDate> "1135193154145".,
   -31893d66:108545c99a1:-7ff4 <http://wymiwyg.org/ontologies/knobot#strengthReduction> "0.01".,
   -31893d66:108545c99a1:-7ff4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wymiwyg.org/ontologies/knobot#Relation>.]

Note that a CrossGraphFgNode is given an ID (_:cgn-onnkgtfo) used in other shown molecules, while other fg-nodes are shown in the form "{x <http://xmlns.com/foaf/0.1/mbox> <mailto:jo@example.org>}" in the molecule.
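
The grouping of functionally grounded nodes across two graphs can be sketched roughly as follows. This is only an illustration of the idea, not the rdf-utils implementation: it assumes triples are tuples of strings, handles inverse functional properties only (functional properties such as foaf:primaryTopic are left out), and hard-codes a small IFP set instead of reading an ontology. All names in it are hypothetical.

```python
# Assumed set of inverse functional properties (prefixed names for brevity).
IFPS = {"foaf:mbox", "foaf:homepage", "foaf:mbox_sha1sum"}


def groundings(graph):
    """Map each blank node to the set of its (IFP, value) pairs."""
    result = {}
    for s, p, o in graph:
        if s.startswith("_:") and p in IFPS:
            result.setdefault(s, set()).add((p, o))
    return result


def compare_fg_nodes(graph1, graph2):
    """Group nodes sharing an IFP value and classify the groups."""
    entries = [(1, g) for g in groundings(graph1).values()]
    entries += [(2, g) for g in groundings(graph2).values()]
    groups = []  # groups always have pairwise disjoint "pairs"
    for origin, pairs in entries:
        merged = {"pairs": set(pairs), "versions": {1: [], 2: []}}
        merged["versions"][origin].append(pairs)
        keep = []
        for group in groups:
            if group["pairs"] & merged["pairs"]:
                # A shared IFP value means both describe the same resource.
                merged["pairs"] |= group["pairs"]
                for k in (1, 2):
                    merged["versions"][k] += group["versions"][k]
            else:
                keep.append(group)
        groups = keep + [merged]
    report = {"cross": [], "only1": [], "only2": []}
    for group in groups:
        v1, v2 = group["versions"][1], group["versions"][2]
        if v1 and v2:
            if not (len(v1) == 1 and len(v2) == 1 and v1[0] == v2[0]):
                report["cross"].append(group)  # groundings differ
        elif v1:
            report["only1"].append(group["pairs"])
        else:
            report["only2"].append(group["pairs"])
    return report


graph1 = [
    ("_:a", "foaf:homepage", "<http://gmuer.ch/>"),
    ("_:a", "foaf:mbox", "<mailto:yahoo@gmuer.ch>"),
    ("_:b", "foaf:mbox", "<mailto:reto@gmuer.ch>"),
]
graph2 = [
    ("_:c", "foaf:mbox", "<mailto:yahoo@gmuer.ch>"),
    ("_:c", "foaf:mbox", "<mailto:reto@gmuer.ch>"),
    ("_:d", "foaf:mbox", "<mailto:jo@example.org>"),
]
report = compare_fg_nodes(graph1, graph2)
print(len(report["cross"]), len(report["only1"]), len(report["only2"]))
```

With these two small graphs, modelled loosely on the example output above, the sketch finds one cross-graph node with two versions in graph 1 and one version in graph 2, no fg-node only in graph 1, and one fg-node (the jo@example.org mailbox) only in graph 2.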

The information shown in the diff should be enough to create a graph equivalent to one of the compared graphs, given the other. The approach is based on the concept of RDF Molecules [uri-5], slightly modified so that functionally grounded nodes reference all their grounding nt-molecules and that these references are contained in the terminal and in the maximum contextual molecules (rather than the statements of one of the grounding nt-molecules).

Links:
[uri-1] http://wymiwyg.org/2005/12/22/announicing-rdf-utils
[uri-2] http://gmuer.ch/me
[uri-3] http://sourceforge.net/project/showfiles.php?group_id=83223&package_id=173731&release_id=380220
[uri-4] http://lists.w3.org/Archives/Public/semantic-web/2005Nov/0086.html
[uri-5] http://www.ksl.stanford.edu/people/pp/papers/Ding_ISWC_2005.pdf
Received on Thursday, 22 December 2005 22:19:10 UTC