- From: Reto Bachmann-Gmür <reto@gmuer.ch>
- Date: Tue, 15 Mar 2011 18:49:44 +0100
- To: Ivan Shmakov <oneingray@gmail.com>
- Cc: Ivan Shmakov <ivan@main.uusia.org>, semantic-web@w3.org
- Message-ID: <AANLkTimwz7BF4nVYEG5v_5z3iOqyMZPhHnMRV+4HjT9P@mail.gmail.com>
On Tue, Mar 15, 2011 at 5:30 PM, Ivan Shmakov <ivan@main.uusia.org> wrote: > > Treating this particular issue doesn't seem overly difficult. > First, we may split the source graph into subgraphs so that each > of the subgraphs will contain only those blank nodes that are > linked to each other only via arcs and other blank nodes, and > only those non-blank nodes that are exactly one arc away from > any of the blank nodes contained. (A procedure similar to the > one described in CBD.) > > Then, serializing the resulting subgraphs using a kind of > “canonical” representation (which, in particular, imposes an > ordered and stable blank nodes naming), and hashing them, the > duplicate statements and whole subgraphs could be detected and > removed from the store. > This procedure doesn't guarantee that the result graph is lean. As an MSG might be a subgraph of another. E.g.: given the graph: G1: :joe :has [ a :dog ]. and G2: :joe :has [a :dog; :name "Silvio"]. Just by not duplicating MSGs this would result in :joe :has [a :dog; :name "Silvio"]; :has [ a :dog ]. which doesn't expresses more than :joe :has [a :dog; :name "Silvio"]. A couple of years ago I implemented an open source Graph Versioning System which is based on graph decomposition and thus very similar to what you describe above, it is still available at http://sourceforge.net/projects/jena/files/Graph%20Versioning%20System%20GVS/, the sources are here: http://jena.svn.sourceforge.net/viewvc/jena/gvs/ Cheers, Reto
Received on Tuesday, 15 March 2011 17:50:16 UTC