Re: Support for Multiple Graphs and Graph Stores

On 2/3/2012 12:55 PM, David Robillard wrote:
> On Fri, 2012-02-03 at 12:25 -0800, michael wrote:
> [...]
>> Some revision control terms have already been mentioned by the WG: branches[6][7], trees[7], patches[8][9] and assertions[4][10]. Here is a comparison
>> of Sandro's g-* terminology[11] with a popular DVCS, Git and some terms I suggest for RDF graph management(RDFGM):
>>
>> g-*     Git         RDFGM          Description
>> ----------------------------------------------------------------------------
>> g-text  patch       graph literal  serialized set of RDF statements, triples
>> g-snap  blob        Graph          set of RDF statements
>>           tree        Dataset        description of one or more graphs/datasets
>>           commit      Assertion      provenance for a dataset
>> g-box   branch      Branch         dataset of assertions / label for assertions
>>           repository  Repository     set of graphs and their metadata
>>           git         Store          an engine that provides access to repositories
>>
>> A g-text is the serialized content of a RDF graph[11], aka triples. This is similar to a patch in a revision control system. I prefer the term graph
>> literal which is a more accurate description.
>
> I don't think the similarity you are drawing between "graph literal" and
> "patch" here is valid.  This does not agree with the extremely
> well-established meaning of "patch" (from the ubiquitous UNIX utility
> around since the 80's).  A patch (sometimes "diff") is an applicable
> description of changes between one thing and another.  The two documents
> you cite[1][2] use this meaning.  Another example is PROPPATCH from
> WebDAV[3].  There are many definitions depending on context, but "patch"
> invariably refers to a *change* somehow.
>
> A "commit" can be expressed as a patch relative to a previous commit.
>
> A "patch" for RDF is not the same as a graph literal, a patch for RDF
> would be vaguely similar to PROPPATCH and require at least *two* graphs:
> the set of triples removed, and the set of triples added.
>
> Git's internal model does not map well to the user perspective, in
> particular a "patch" is inherently a minimal description of changes
> between things, and not a complete description of the new version (e.g.
> a patch changing one triple in a billion triples graph would be a few
> triples large, not a billion triples).  I think you mean the latter,
> i.e. the complete description of the new version, which is fine, but you
> shouldn't equate that to "patch", it is very confusing to do so.

I agree that a patch/diff implies change whereas a graph literal does not. It is a good observation that a patch/diff must specify both additions
and deletions. I did not intend to equate the complete description of the new version with a patch. The analogy I was trying to draw was that both
exist at the level of serialized statements.

Perhaps a better comparison for g-text would be a file, with a patch (Diff) introduced as its own separate entity:

g-*     Git         RDFGM          Description
----------------------------------------------------------------------------
g-text  file        graph literal  serialized set of RDF statements, triples
g-snap  blob        Graph          set of RDF statements
        tree        Dataset        description of one or more graphs/datasets
        patch       Diff           a changeset
        commit      Assertion      provenance for a dataset/patch
g-box   branch      Branch         dataset of assertions / label for assertions
        repository  Repository     set of graphs and their metadata
        git         Store          an engine that provides access to repositories
...
Diffs allow small changes to be expressed easily. They describe three, possibly four, datasets/graphs: 1) a base dataset/graph, 2) additions, and
3) deletions. Optionally, a diff may also describe a fourth dataset/graph that is the result of the changes. An assertion can then refer to the diff
to describe the provenance for that limited set of changes, allowing more fine-grained tracking of provenance data. Diffs, like graphs and datasets,
are immutable.
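
To make the diff idea concrete, here is a rough Python sketch, assuming a graph is just a frozen set of (subject, predicate, object) string tuples. The names Diff, Assertion and make_diff, the author/note fields, and the ex:/foaf: values are my own illustrations, not anything defined by the WG or by Git:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Assumption: a triple is a plain (subject, predicate, object) string tuple.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


@dataclass(frozen=True)  # frozen: a diff cannot be modified after creation
class Diff:
    base: Graph       # 1) the graph the changes apply to
    additions: Graph  # 2) triples added
    deletions: Graph  # 3) triples removed

    def result(self) -> Graph:
        """Optional fourth graph: the base with the changes applied."""
        return (self.base - self.deletions) | self.additions


@dataclass(frozen=True)
class Assertion:
    """Provenance attached to one diff (fields are hypothetical)."""
    diff: Diff
    author: str
    note: str


def make_diff(old: Graph, new: Graph) -> Diff:
    """Compute the minimal additions and deletions that turn old into new."""
    return Diff(base=old, additions=new - old, deletions=old - new)


old = frozenset({
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:age", "30"),
})
new = frozenset({
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:age", "31"),
})

d = make_diff(old, new)
a = Assertion(diff=d, author="ex:michael", note="corrected age")
```

Here the diff carries only the one removed and the one added triple, however large the base graph is, and the assertion describes provenance for just that change.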
...

Thanks for pointing that out!


-Michael

Received on Friday, 3 February 2012 22:01:20 UTC