distribution and federation

Hello everyone,

In the Solid/Specification Matrix room, elf Pavlik recently shared two 
links that could be of interest for the topic of CRDTs or, at least as I 
understand them, for the distribution and federation of RDF data.

https://treecg.github.io/specification/ 

https://tree.linkeddatafragments.org/linked-data-event-streams/ 

Below is my understanding of those two specs.

We are lucky to have among us here Hala Skaf-Molli, who took part 
in the two research papers I mention further down.

This "TREE" spec is amazing, and I have been looking for something like 
that for many years!

This spec, to my understanding, is about sharding and distributing 
data across a complex network of data repositories, with the ability to 
search datasets by some parameters. It is very useful when data is 
distributed. The Linked Data Event Streams spec (LDES), which is from the 
same people and relates to the TREE spec, supports an append-only 
collection of immutable records. We can see in the examples that they 
use the concept of `versions` of a record that supersede each other 
in the stream, if needed.
Also, if the stream definition itself (the shape by example) needs to 
change, a note in the specs says that the new shape should be backward 
compatible, or else that a fork is needed.
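To illustrate the idea of an append-only collection whose immutable members are versions of some underlying entity, here is a minimal Python sketch. It is not taken from the TREE or LDES specs: the class names, the field names (`member_id`, `is_version_of`, `created`) and the rule "the version with the latest timestamp is the current one" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a member cannot be modified once created
class Member:
    member_id: str       # hypothetical identifier of this immutable version
    is_version_of: str   # hypothetical identifier of the entity it describes
    created: datetime    # timestamp carried by this version
    payload: str


class AppendOnlyStream:
    """Hypothetical in-memory stand-in for an append-only event stream."""

    def __init__(self) -> None:
        self._members: list[Member] = []

    def append(self, member: Member) -> None:
        # Existing members are never replaced: a change is a new version.
        if any(m.member_id == member.member_id for m in self._members):
            raise ValueError(f"member {member.member_id} already exists")
        self._members.append(member)

    def latest_versions(self) -> dict[str, Member]:
        # Assumption: the version with the greatest timestamp supersedes
        # the earlier versions of the same entity.
        latest: dict[str, Member] = {}
        for m in self._members:
            current = latest.get(m.is_version_of)
            if current is None or m.created > current.created:
                latest[m.is_version_of] = m
        return latest


def at(hour: int) -> datetime:
    # Helper for the example: a fixed day, varying hour, in UTC.
    return datetime(2024, 11, 20, hour, tzinfo=timezone.utc)


stream = AppendOnlyStream()
stream.append(Member("sensor1#v1", "sensor1", at(9), "temp=18"))
stream.append(Member("sensor1#v2", "sensor1", at(11), "temp=21"))
stream.append(Member("sensor2#v1", "sensor2", at(10), "temp=5"))
current = stream.latest_versions()
```

A consumer that replays the whole stream sees every version; one that only wants the current state keeps the latest version per entity, as `latest_versions` does here.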

Nowhere in those 2 specs is the concept of "merging conflicts" present. 
They evade the question of conflicts and, I suppose, base their conflict 
resolution on the timestamps that I see everywhere in the given 
examples. That makes it LWW (last write wins), which is the poorest 
guarantee you can get and does not really qualify, in my opinion, as a 
CRDT.
Still, the spec is really interesting regarding sharding and distribution of data.
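To make the LWW concern concrete, here is a minimal Python sketch of a last-write-wins merge. It only illustrates the supposition above and is not something the specs define: the `Versioned` type, the integer timestamps and the replica-name tie-breaker are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # hypothetical clock value attached to the write
    replica: str    # hypothetical tie-breaker for equal timestamps


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    # Keep the write with the greater (timestamp, replica) pair;
    # the other write is discarded entirely.
    return a if (a.timestamp, a.replica) >= (b.timestamp, b.replica) else b


# Two replicas update the same record concurrently.
alice = Versioned("title: TREE notes", 10, "alice")
bob = Versioned("title: LDES notes", 11, "bob")

merged = lww_merge(alice, bob)
```

The merge gives the same result whichever order the two writes arrive in, so replicas converge, but Alice's update is silently dropped rather than combined with Bob's.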

In fact, it could complement the work done by Pascal and Hala Molli et 
al. on the problem of source selection and federated queries, which they 
addressed recently with DeKaloG 
https://hal.science/hal-03936036/document and FedUP 
https://hal.science/hal-04538238/document


Those topics are of high importance when we want to consider scalability 
and global search in a decentralized system.

CRDTs are about automatic conflict resolution, a topic that is related to 
federation and distribution but is also essentially different, as it 
concerns updates and their consistency, while what we see here is more 
concerned with read, search and discoverability patterns.

Received on Thursday, 21 November 2024 01:31:25 UTC