Re: blank nodes (once again) from Reto Bachmann-Gmür on 2011-03-15 (semantic-web@w3.org from March 2011)

From: Reto Bachmann-Gmür <reto@gmuer.ch>
Date: Tue, 15 Mar 2011 18:49:44 +0100
To: Ivan Shmakov <oneingray@gmail.com>
Cc: Ivan Shmakov <ivan@main.uusia.org>, semantic-web@w3.org
Message-ID: <AANLkTimwz7BF4nVYEG5v_5z3iOqyMZPhHnMRV+4HjT9P@mail.gmail.com>

On Tue, Mar 15, 2011 at 5:30 PM, Ivan Shmakov <ivan@main.uusia.org> wrote:

>
>        Treating this particular issue doesn't seem overly difficult.
>        First, we may split the source graph into subgraphs so that each
>        of the subgraphs will contain only those blank nodes that are
>        linked to each other only via arcs and other blank nodes, and
>        only those non-blank nodes that are exactly one arc away from
>        any of the blank nodes contained.  (A procedure similar to the
>        one described in CBD.)
>
>        Then, serializing the resulting subgraphs using a kind of
>        “canonical” representation (which, in particular, imposes an
>        ordered and stable blank nodes naming), and hashing them, the
>        duplicate statements and whole subgraphs could be detected and
>        removed from the store.
>
This procedure doesn't guarantee that the result graph is lean. As an MSG
might be a subgraph of another.

E.g.:

given the graph:

G1: :joe :has [ a :dog ].

and
G2: :joe :has [a :dog; :name "Silvio"].

Just by not duplicating MSGs this would result in

:joe :has [a :dog; :name "Silvio"]; :has [ a :dog ].

which doesn't expresses more than

:joe :has [a :dog; :name "Silvio"].

A couple of years ago I implemented an open source Graph Versioning System
which is based on graph decomposition and thus very similar to what you
describe above, it is still available at
http://sourceforge.net/projects/jena/files/Graph%20Versioning%20System%20GVS/,
the sources are here: http://jena.svn.sourceforge.net/viewvc/jena/gvs/

Cheers,
Reto

Received on Tuesday, 15 March 2011 17:50:16 UTC