Scale issues: streaming, latency, whole-request validation

Streaming is valuable at scale because changes can start to be applied 
as the data arrives, rather than buffering the whole change until its 
end is seen and only then applying it.  For large changes, this also 
improves latency: doing some or all of the work as data arrives 
overlaps processing between sender and receiver.
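As a minimal sketch of the streaming case - assuming a hypothetical 
line-oriented patch format in which each line is a self-contained add 
("A ...") or delete ("D ...") operation, with placeholder store calls - 
each operation can be applied as soon as its line arrives:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class StreamingApply {
        public static void main(String[] args) throws IOException {
            BufferedReader in =
                new BufferedReader(new InputStreamReader(System.in));
            String line;
            // Nothing is buffered: memory use is independent of patch
            // size, and sender and receiver overlap their processing.
            while ((line = in.readLine()) != null) {
                if (line.startsWith("A "))
                    applyAdd(line.substring(2));      // hypothetical store call
                else if (line.startsWith("D "))
                    applyDelete(line.substring(2));   // hypothetical store call
            }
        }
        static void applyAdd(String t)    { /* e.g. store.add(parse(t)) */ }
        static void applyDelete(String t) { /* e.g. store.delete(parse(t)) */ }
    }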

Several proposals require the complete patch request to be received 
before processing can start.

Any format (e.g. Talis ChangeSets) that is serialized as RDF or TriG 
can't make any assumptions about the order of triples received. In 
practice, a changeset must be parsed into memory (standard parser), 
validated (patch-format-specific code) and applied (patch-format-specific 
code).  There is some reuse of a common parser, but validation has to 
be done on top.
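For contrast, a sketch of that buffered pipeline, assuming current 
Apache Jena APIs for the parsing step (the validate/apply bodies are 
placeholders for the format-specific code):

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.RDFDataMgr;

    public class BufferedApply {
        public static void main(String[] args) {
            // Phase 1: parse the whole request into memory (standard parser).
            Model changeset = RDFDataMgr.loadModel(args[0]);
            // Phase 2: validate (patch-format-specific code on top of the
            // common parser) - possible only once all triples are present,
            // since no ordering can be assumed.
            validate(changeset);
            // Phase 3: only now can the changes be applied.
            apply(changeset);
        }
        static void validate(Model m) { /* check the changeset structure */ }
        static void apply(Model m)    { /* walk additions/removals, update store */ }
    }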

These are limitations at scale, where scale means the change takes most 
of, or more than, the available RAM.

This may be acceptable - for any format that is a restriction of SPARQL, 
it may be desirable to check that the whole request is in the required 
subset before proceeding with changes (e.g. when no true transaction 
abort is available).
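For example, assuming the format is restricted to the INSERT DATA / 
DELETE DATA subset of SPARQL Update, the whole-request check can be a 
pre-pass over the parsed request before anything is applied (current 
Apache Jena class names; illustrative only):

    import java.util.List;
    import org.apache.jena.sparql.modify.request.UpdateDataDelete;
    import org.apache.jena.sparql.modify.request.UpdateDataInsert;
    import org.apache.jena.update.Update;
    import org.apache.jena.update.UpdateFactory;

    public class SubsetCheck {
        // Reject the request up front if any operation is outside the
        // subset, so a half-applied request never needs to be aborted.
        static boolean inSubset(String requestText) {
            List<Update> ops =
                UpdateFactory.create(requestText).getOperations();
            for (Update op : ops) {
                if (!(op instanceof UpdateDataInsert
                   || op instanceof UpdateDataDelete))
                    return false;
            }
            return true;
        }
    }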

The bnode issue and these scalability concerns are what motivated:

http://afs.github.io/rdf-patch/

 Andy
