Re: [ANN] RDF Delta : change logging and dataset replication. from Andy Seaborne on 2018-06-18 (semantic-web@w3.org from June 2018)

From: Andy Seaborne <andy@seaborne.org>
Date: Mon, 18 Jun 2018 16:35:52 +0100
To: semantic-web@w3.org
Message-ID: <31dafa49-41b6-fbf5-c070-39e10d9408b7@seaborne.org>
The design goal for RDF Patch is to support database changes for any 
pattern of RDF data and any pattern of change to persistent data, via 
API or SPARQL or something else. The base system has to work at scale, 
being able to record changes as they happen and be efficient for 
applying changes.  The changes are not application specific nor 
application-writer provided.

To change the value of one item, the database changes might be two API 
calls, delete one triple and add another one, and that is all the 
information available.  It would take something else to add the context 
for a blank node subject there.
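
For illustration only, here is a minimal Python sketch of that case. The 
helper name and the blank node label _:b0 are made up for the example; the 
row keywords (TX, D, A, TC) are the ones in the RDF Patch document. Note 
that the two rows carry no context beyond the triples themselves.

```python
def change_value(subject, predicate, old, new):
    """Return RDF Patch-style rows for replacing one value.

    Hypothetical helper: the store only sees "delete this triple" and
    "add that triple", so that is all the patch records.
    """
    return [
        "TX .",                                 # begin transaction
        f"D {subject} {predicate} {old} .",     # delete the old triple
        f"A {subject} {predicate} {new} .",     # add the new triple
        "TC .",                                 # commit transaction
    ]


rows = change_value(
    "_:b0",                        # assumed system identifier for the bnode
    "<http://schema.org/name>",
    '"Fred\'s Fish House"',
    '"Fred\'s Soup House"',
)
print("\n".join(rows))
```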

Stian's idea of an extension (and see LDPatch) is interesting because, 
if not used, it adds no processing cost. Maybe it defines a subset 
of SPARQL Update that is idempotent.  Running a sequence of patches from 
a known state is idempotent, which makes recovery and catch-up much easier.
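
A rough sketch of why replay is safe, assuming a graph is modelled as a 
Python set of triples and a patch as a list of (op, triple) rows. This is 
not RDF Delta code, and the function names are hypothetical. With set 
semantics, the last A or D on a triple decides whether it is present, so 
replaying a log that has already been applied leaves the state unchanged.

```python
def apply_patch(triples, patch):
    """Apply (op, triple) rows to a copy of a set of triples."""
    result = set(triples)
    for op, triple in patch:
        if op == "A":
            result.add(triple)        # adding an existing triple is a no-op
        elif op == "D":
            result.discard(triple)    # deleting a missing triple is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


def replay(start, log):
    """Apply every patch in the log, in order, starting from a known state."""
    state = set(start)
    for patch in log:
        state = apply_patch(state, patch)
    return state


NAME = "<http://schema.org/name>"
start = {("_:b0", NAME, '"Fred\'s Fish House"')}
log = [
    [("D", ("_:b0", NAME, '"Fred\'s Fish House"')),
     ("A", ("_:b0", NAME, '"Fred\'s Soup House"'))],
    [("A", ("_:b0", "<http://schema.org/url>", "<http://example.com/org1>"))],
]

state = replay(start, log)
# Replaying from the known state, or again on top of the result,
# ends in the same place.
assert replay(start, log) == state
assert replay(state, log) == state
```

A replica that is unsure how far it got can therefore restart from the 
known state and run the whole log again.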

     Andy

On 18/06/18 13:46, Stian Soiland-Reyes wrote:
>  From what I get in 
> https://afs.github.io/rdf-delta/rdf-patch.html#blank-nodes it assumes a 
> “system identifier” that survives multiple patches. This is kind of like 
> a I-know-its-a-bnode-and-so-should-you skolemization (but where the end 
> result is still a bnode).
> 
> I see why you raise this, as there would be challenges if you had 
> federated systems that used RDF patches, as you would need to keep track 
> of which ‘system’ a patch picked its identifiers from. Yes, that could 
> go into “H id” as in https://afs.github.io/rdf-delta/rdf-patch-logs.html
> 
> I think a select-pattern-based system that would work with isomorphic 
> graphs would be more general (e.g. such patches could be applied to a 
> variety of stores), but probably harder for an RDF store to generate 
> from a simple transaction log. It could also be more computationally 
> expensive to apply.
> 
> As this is an RDF Patch update we don’t need any kind of selection, just 
> to deal with known triples separately.
> 
> Perhaps it could work by adding an S(elect) operation and E(xists) 
> within a transaction?
> 
> Suggested format:
> 
> TX .
> 
> S _:1 .
> 
> E <http://example.com/person1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
> 
> E <http://example.com/person1> <http://schema.org/affiliation> _:1 .
> 
> E _:1 <http://schema.org/url> <http://example.com/org1> .
> 
> D _:1 <http://schema.org/name> "Fred's Fish House" .
> 
> A _:1 <http://schema.org/name> "Fred's Soup House" .
> 
> TC .
> 
> (Using schema.org as example as it relies a lot on bnodes)
> 
> Here we (S)elect _:1 as a blank node ID to be bound within this 
> transaction (_:1 is no longer a system identifier).
> 
> To restrict which bnode we are talking about, the store would need to 
> match all of the E(xists) statements.  Any non-selected _: identifiers 
> there are NOT free, but are still interpreted as ‘system identifiers’, 
> but you can add multiple S(elections).
> 
> Here the transaction would fail if any of the E’s triples/quads fail to 
> exist, or give multiple bindings for _:1. I don’t think it would be 
> appropriate for RDF Patch format to do wildcard bnode selections, e.g. 
> “Delete all bnodes that are organizations..”.
> 
> It is not a requirement that every selected bnode is used in A/D, 
> although it would be silly if none of them were used. (This permits you 
> to use intermediate bnodes in the E selection)
> 
> It would have to be inside a transaction because such patches are not 
> necessarily idempotent, e.g. the A/D operations might be doing something 
> that breaks the E query and so you can’t run it again.
> 
> My proposal would presumably be fairly simple to translate to SPARQL 
> updates.
> 
> -- 
> Stian Soiland-Reyes, eScience Lab
> School of Computer Science, The University of Manchester
> http://orcid.org/0000-0001-9842-9718
> 
> From: Reto Gmür <reto@factsmission.com>
> Sent: 18 June 2018 06:55
> To: Andy Seaborne <andy@apache.org>; Semantic Web <semantic-web@w3.org>
> Subject: RE: [ANN] RDF Delta : change logging and dataset replication.
> 
> Hi Andy
> 
> I'm curious: does this system rely on persistent blanknode ids or can it 
> generate SPARQL Update statements that can be applied to any isomorphic 
> graph?
> 
> Cheers,
> Reto
> 
>> -----Original Message-----
>> From: Andy Seaborne <andy@apache.org>
>> Sent: Friday, June 15, 2018 6:36 PM
>> To: Semantic Web <semantic-web@w3.org>
>> Subject: [ANN] RDF Delta : change logging and dataset replication.
>> 
>> RDF Delta is a system for recording and publishing changes to RDF Datasets. It
>> can be used to create replicas.
>> 
>> https://afs.github.io/rdf-delta/
>> 
>> It is built on top of patches and logs which record the changes made to the data.
>> 
>> https://afs.github.io/rdf-delta/rdf-patch.html
>> 
>> One use case is running multiple sync'ed Apache Jena Fuseki servers, for high
>> availability or for a request-scalable publishing solution:
>> 
>> https://afs.github.io/rdf-delta/ha-fuseki.html
>> 
>> The current version is 0.4.0.
>> 
>>      Andy
>
Received on Monday, 18 June 2018 15:36:31 UTC