Introduction (was: Re: distribution and federation) from Pieter Colpaert on 2024-11-21 (public-crdt4rdf@w3.org from November 2024)

From: Pieter Colpaert <pieter.colpaert@ugent.be>
Date: Thu, 21 Nov 2024 09:55:13 +0100
To: public-crdt4rdf@w3.org
Message-ID: <1021ea61-f11d-444a-8c92-dcfe5761dd71@ugent.be>
Hi all!

I think this is the right moment then to introduce myself. I’ve been the 
editor of the W3C TREE CG draft reports for the past 3 years, and I’m 
the chair of the LDES working group at SEMIC (the same EU community that 
is behind DCAT-AP).

I think what has been said so far about TREE and LDES on the matrix 
channel and in the thread below is accurate: TREE has been built as a 
spec to index and search through entity descriptions. LDES then again is 
built on top of TREE as an append-only collection of items, which of 
course is important for live datasets. Thanks to this concept we can 
also introduce transparent retention policies: an EventStream will 
conceptually always have all data, but a view may only publish the 
latest version (which at this moment indeed means that the last write 
always wins) or may only publish “the last month” of updates.
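
To make the retention idea concrete, here is a minimal sketch in Python. 
It is not taken from the LDES spec: the Member class and the two view 
functions are hypothetical names, and a real LDES would publish RDF 
members rather than plain strings. It only illustrates how two views 
with different retention policies can sit on top of the same 
append-only stream.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Member:
    """Hypothetical stand-in for one immutable member of an event stream."""
    entity: str          # identifier of the entity this version describes
    timestamp: datetime  # when this version was published
    value: str           # placeholder for the actual entity description


def latest_version_view(stream):
    """Keep only the newest version per entity (last write wins)."""
    latest = {}
    for member in stream:
        current = latest.get(member.entity)
        if current is None or member.timestamp >= current.timestamp:
            latest[member.entity] = member
    return latest


def time_window_view(stream, now, window):
    """Keep only the members published within `window` before `now`."""
    return [m for m in stream if now - m.timestamp <= window]


# The stream itself is append-only and conceptually keeps everything.
stream = [
    Member("a", datetime(2024, 9, 1), "a-v1"),
    Member("b", datetime(2024, 11, 1), "b-v1"),
    Member("a", datetime(2024, 11, 15), "a-v2"),
]
now = datetime(2024, 11, 21)

print({k: m.value for k, m in latest_version_view(stream).items()})
print([m.value for m in time_window_view(stream, now, timedelta(days=30))])
```

Neither view changes the stream: each one only decides which members 
it publishes.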

The “last update always wins” approach has worked so far for LDES, 
because the main use case so far has been “base registries”. This means 
there is one authoritative source that publishes the truth 
incrementally, and we thus don’t need real conflict resolution. In this 
way, the event stream was also never “written” to: it was a view on top 
of an authoritative source, such as a database at a governmental 
institution.

In our research team at Ghent University (https://knows.idlab.ugent.be/) 
we’re however also involved in the Solid project, in which the use case 
is different: multiple agents can write something to the server. Instead 
of writing to symmetric resources, I believe most writes will need to 
have an effect on multiple resources and will have a flow from write to 
read. For each of those read resources on top of an inbox of writes 
(which is very similar to an LDES), we will need to do conflict 
resolution, and I think this should probably first be defined in this CG.

On the one hand, TREE is now looking for more participants to further 
evolve the spec, so if you’re enthused about the spec, I’d be very happy 
to have some more helping hands in the CG: 
https://github.com/TREEcg/specification?tab=readme-ov-file#the-w3c-community-group

On the other hand, SEMIC LDES at this moment does not define what an 
event should look like. The group will however start meeting again 
from March/April 2025 (preparations are currently ongoing), and a 
proposal will be launched for an optional LDES-native way to describe 
such updates. We will of course take a very close look at CRDT4RDF, and 
this is also the main reason why I was one of the first people to join here!

Kind regards,

Pieter

On 21/11/2024 02:39, niko@nextgraph.org wrote:
>
> on this topic, elf Pavlik reminded me that Pieter is one of the first 
> who joined this group, and that he will surely like to introduce LDES 
> before we dig further into this very interesting subject!
>
> On 21/11/24 3:30 am, Niko - NextGraph wrote:
>>
>> Hello everyone,
>>
>> In the matrix room of Solid/Specification, elf Pavlik recently shared 
>> 2 links that could be of interest to the topic of CRDT, or at least, 
>> to my understanding, to the distribution and federation of RDF data.
>>
>> https://treecg.github.io/specification/ 
>>
>> https://tree.linkeddatafragments.org/linked-data-event-streams/ 
>>
>> I'll give here below my understanding of those 2 specs.
>>
>> We have the chance to have among us here Hala Skaf-Molli who took 
>> part in the 2 research papers I mention further down.
>>
>> This "TREE" spec is amazing, and I have been looking for something 
>> like that for many years!
>>
>> This spec, to my understanding, is about sharding and distribution of 
>> data in a complex network of data repositories, with a capability to 
>> search datasets with some parameters. It is very useful when data is 
>> distributed. The Linked data event streams spec (LDES) which is from 
>> the same people and relates to the TREE spec, supports an append-only 
>> collection of immutable records. We can see in the examples that they 
>> use the concept of `versions` of the records that supersede each 
>> other in the stream, if needed.
>> Also, if the stream definition itself (the shape, for example) needs 
>> to change, the specs have a note saying that the new shape should be 
>> backward compatible, or that a fork is needed.
>>
>> Nowhere in those 2 specs is the concept of "merging conflicts" 
>> present. They sidestep the question of conflict and, I suppose, base 
>> their conflict resolution on the timestamps that I see everywhere in 
>> the given examples. That makes it LWW (last write wins), which is 
>> the poorest guarantee you can get and does not really qualify, in 
>> my opinion, as a CRDT.
>> But the spec is really interesting about sharding and distribution of 
>> data.
>>
>> In fact it could complement the work done by Pascal and Hala Molli et 
>> al., on the problem of source selection and federated queries, that 
>> they addressed recently with DeKaloG 
>> https://hal.science/hal-03936036/document and FedUP 
>> https://hal.science/hal-04538238/document
>>
>> Those topics are of high importance when we want to consider 
>> scalability and global search in a decentralized system.
>>
>> CRDT is about automatic conflict resolution, which is a related topic 
>> to federation and distribution, but is essentially different too, as 
>> it concerns updates and their consistency, while what we see here is 
>> more concerned with read, search and discoverability patterns.
>>
-- 
https://pietercolpaert.be
+32486747122
Received on Thursday, 21 November 2024 08:55:22 UTC