
Re: copernic.space

From: Amirouche Boubekki <amirouche.boubekki@gmail.com>
Date: Sun, 23 Feb 2020 21:25:45 +0100
Message-ID: <CAL7_Mo9S9EUZXuaKKdJn8sRduuMmonO42VKszcm_7UVacBoPdw@mail.gmail.com>
To: semantic-web <semantic-web@w3.org>
That was short on explanation.

On Thu, 20 Feb 2020 at 21:24, Amirouche Boubekki
<amirouche.boubekki@gmail.com> wrote:
>
> I ported the code of the versioned triple store from Scheme (Lisp) to
> Python/Django. It relies on FoundationDB to store triples.
>
> Check out: https://github.com/amirouche/copernic

This project is designed to allow cooperation at large scale, thanks to a
change-request mechanism similar to GitHub pull requests or GitLab merge
requests, on any structured data. That includes relational, graph-like and
tabular data. Unlike the previous iteration, it is not exactly
git-for-structured-data.

It is meant to scale both in terms of data size and number of contributions.

Like Wikipedia or Wikidata, there is a single version of the truth. In other
words, every user gets the same data; there are no per-user data repositories.
I dropped the git-like directed acyclic graph (DAG) of history branches because
it would be less scalable.

The drawback of the current approach is that it does not allow you to "fork" a
branch and keep different versions of the same database that go in different
directions, e.g. to elaborate some theories in different branches and then
merge only the good one. Barring bugs in the code, it is possible to query a
given stash of changes, but a stash only represents a single commit in git
parlance: it has no history.

In other words, change requests work in a way that is similar to git stash.
The diff, that is the additions and deletions of triples, is stored in the
5-tuple store like the rest of the data, until the change is applied by a
superuser. Until then, the data part of a change request is invisible outside
that change request, like a stash in git.
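To make the stash analogy concrete, here is a minimal in-memory sketch. It is not the copernic code (vnstore.py is the real thing); the class `ToyChangeRequests`, its methods, and the integer counter standing in for a timestamp are all hypothetical. Rows are 5-tuples `(uid, key, value, alive, changeid)`, and applying a change is modeled as swapping its `None` placeholder for a timestamp. A change whose timestamp is still `None` is pending, so its rows only show up when querying that very change request.

```python
import itertools
import uuid


class ToyChangeRequests:
    """Hypothetical in-memory model of stash-like change requests.

    Every row is a 5-tuple (uid, key, value, alive, changeid), where
    alive=True records an addition and alive=False a deletion.
    """

    def __init__(self):
        self.rows = []        # all 5-tuples, applied or pending
        self.changes = {}     # changeid -> timestamp, or None while pending
        self._clock = itertools.count(1)  # stand-in for a real timestamp

    def new_change(self):
        changeid = uuid.uuid4().hex
        self.changes[changeid] = None
        return changeid

    def add(self, changeid, uid, key, value):
        self.rows.append((uid, key, value, True, changeid))

    def remove(self, changeid, uid, key, value):
        self.rows.append((uid, key, value, False, changeid))

    def apply(self, changeid):
        # Applying only swaps the None placeholder for a timestamp.
        if self.changes[changeid] is not None:
            raise ValueError("change already applied")
        self.changes[changeid] = next(self._clock)

    def view(self, changeid=None):
        """Triples everybody sees, overlaid with the pending diff of `changeid`."""
        def visible(row):
            return self.changes[row[4]] is not None or row[4] == changeid

        def order(row):
            ts = self.changes[row[4]]
            # Applied changes first, by timestamp; the pending stash last.
            return (ts is None, 0 if ts is None else ts)

        triples = set()
        for uid, key, value, alive, _ in sorted(filter(visible, self.rows), key=order):
            if alive:
                triples.add((uid, key, value))
            else:
                triples.discard((uid, key, value))
        return triples
```

With this model, after adding rows to a pending change `c2`, `store.view()` still returns only applied data, while `store.view(c2)` overlays the pending additions and deletions on top of it.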

A 5-tuple store was said to be overkill, but as of today I do not know which
queries I want to execute against the history, so I index all the things,
which the Blazegraph code calls "perfect indices". The five-tuple, described
in [0], contains the original added or removed triple, plus a boolean denoting
whether it is an addition or a deletion (the deletion case is what PostgreSQL
MVCC calls a tombstone), and the change id, a unique identifier for the group
of additions and deletions, similar to a git commit hash.
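One way to read "index all the things" is to store every 5-tuple under several column orders, so that any query pattern, whatever subset of columns it binds, becomes a prefix range scan on one of those orders. The sketch below is a hypothetical helper, not necessarily how copernic or Blazegraph choose their orders: it builds such a covering set of permutations greedily. It always covers every pattern but is not guaranteed to be minimal; the theoretical minimum for n columns is C(n, floor(n/2)), i.e. 10 orders for five columns, versus 120 if every permutation were stored.

```python
from itertools import combinations


def covering_indices(n):
    """Hypothetical sketch: column orders (permutations of range(n)) such that,
    for every set of bound columns, some order starts with exactly those
    columns. Greedy chain construction; always covering, not always minimal.
    """
    chains = []
    for size in range(n + 1):
        for combo in combinations(range(n), size):
            subset = set(combo)
            if any(len(c) >= size and set(c[:size]) == subset for c in chains):
                continue  # already the prefix of some order
            # Extend a chain whose columns are exactly size-1 of these columns.
            for chain in chains:
                if len(chain) == size - 1 and set(chain) <= subset:
                    chain.extend(subset - set(chain))  # adds exactly one column
                    break
            else:
                chains.append(sorted(subset))
    # Complete each chain into a full order; appending keeps prefixes intact.
    return [tuple(c + [i for i in range(n) if i not in c]) for c in chains]
```

For a plain triple store, `covering_indices(3)` returns `(0, 1, 2)`, `(1, 2, 0)` and `(2, 0, 1)`, which are the classic SPO, POS and OSP indices; calling it with `n=5` gives orders for the 5-tuples.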

[0] https://github.com/amirouche/copernic/blob/master/copernic/vnstore.py#L59

Once a superuser applies the change, which merely swaps a single `None` value
for a timestamp, the history is properly serialized into a single-branch
history. See:

  https://github.com/amirouche/copernic/blob/5397fb7d1a4f28a2ba7ea8aac84d322d8d79c148/copernic/vnstore.py#L84-L87

In fact, a change request can be bigger than available memory. Unlike the
previous iteration, changes (or commits) do not map one-to-one to database
transactions. Hence, there might be integrity bugs.
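A change request that does not fit in memory, or in one FoundationDB transaction (FoundationDB bounds both the size and the duration of a transaction), has to be written in several pieces. The sketch below shows that idea; `write_in_batches` and its `commit` callback are hypothetical names, not copernic functions. It assumes, as described in this post, that rows are tagged with a change id whose timestamp is still `None`, so partially written data stays invisible until the change is applied.

```python
from itertools import islice


def write_in_batches(rows, commit, batch_size=1000):
    """Write a change request of any size as a series of small transactions.

    `rows` may be a generator of 5-tuples, so the whole change request never
    needs to fit in memory. `commit` is a hypothetical callback that writes one
    batch inside a single database transaction. Returns the number of batches.
    """
    iterator = iter(rows)
    batches = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return batches
        # Rows still carry a pending change id, so readers do not see them yet;
        # but a failure here leaves earlier batches behind, which is one way
        # integrity bugs can creep in.
        commit(batch)
        batches += 1
```

For instance, feeding it a generator of 2500 rows with `commit=committed.append` produces three batches of 1000, 1000 and 500 rows.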

It is possible to do time-traveling queries, like Freebase did. This is not
exposed in the web user interface.

There is always an up-to-date image of the latest data, to speed up queries,
while the history only stores the differences between successive versions.
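The combination of a latest-data image and a diff-only history can be sketched as follows. `ToyHistory`, `commit` and `as_of` are hypothetical names, and integer timestamps are an assumption; ordinary queries read `latest`, while a time-traveling query replays the diffs up to the requested point.

```python
class ToyHistory:
    """Hypothetical sketch: an up-to-date snapshot plus a log of diffs.

    Each log entry is (timestamp, uid, key, value, alive).
    """

    def __init__(self):
        self.latest = set()  # image of the latest data, for ordinary queries
        self.log = []        # only the differences between versions

    def commit(self, timestamp, additions=(), deletions=()):
        if self.log and timestamp < self.log[-1][0]:
            raise ValueError("timestamps must not go backwards")
        for triple in additions:
            self.log.append((timestamp, *triple, True))
            self.latest.add(tuple(triple))
        for triple in deletions:
            self.log.append((timestamp, *triple, False))
            self.latest.discard(tuple(triple))

    def as_of(self, timestamp):
        """Time-traveling query: replay the log up to and including `timestamp`."""
        state = set()
        for ts, uid, key, value, alive in self.log:
            if ts > timestamp:
                break  # the log is kept in timestamp order
            if alive:
                state.add((uid, key, value))
            else:
                state.discard((uid, key, value))
        return state
```

Replaying from the start is the simplest correct choice; a real store would more likely range-scan an index ordered by change or timestamp instead of walking the whole log.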

The main differences I see with existing RDF databases are:

- it is versioned with a single-branch history, with stashes of changes.
- it scales horizontally, thanks to FoundationDB.
- it does not support SPARQL yet.
- it does not support reasoning of any sort.

As of today, the only way to add or remove triples is through the user
interface, via the "make change-request" button. In particular, the import link
leads to a page, without explanation, with a box to upload a file that is
expected to contain JSON lines, that is, things like:

["w3c", "is-a", "Web standards organization"]
["truth", "is", 42]

where `subject`, `predicate` and `object` can be any simple JSON data type. I
call these columns `uid`, `key` and `value` respectively; it is easier for me
to think about them that way.
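Here is a sketch of how such an import file could be parsed and validated. `parse_json_lines` is a hypothetical helper, not copernic's import view, and it assumes that "simple" JSON types means strings, numbers and booleans (JSON `null` is rejected here as an assumption).

```python
import json

SIMPLE_TYPES = (str, int, float, bool)  # assumption: null is not accepted


def parse_json_lines(text):
    """Parse an import file where each non-empty line is [uid, key, value].

    Hypothetical helper. Raises ValueError (json.JSONDecodeError is a
    subclass) on malformed lines.
    """
    triples = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        item = json.loads(line)
        if not (isinstance(item, list) and len(item) == 3):
            raise ValueError(f"line {lineno}: expected [uid, key, value]")
        if not all(isinstance(x, SIMPLE_TYPES) for x in item):
            raise ValueError(f"line {lineno}: only simple JSON values are allowed")
        triples.append(tuple(item))
    return triples
```

Applied to the two example lines above, it returns `[("w3c", "is-a", "Web standards organization"), ("truth", "is", 42)]`.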

Here is an example query done via the web user interface:

  http://copernic.space/query/?uid0=0uid%3F&key0=title&value0=copernic&uid1=0uid%3F&key1=key%3F&value1=value%3F

The code tries to guess the type of the object: variable, UUID, number or
boolean, falling back to string.
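The query form and the type guessing can be approximated like this. Everything here is a hypothetical sketch: the trailing `?` marking a variable and the `uid0`/`key0`/`value0` field names are read off the example URL, guessing is applied to every field (the URL has variables in the `uid` and `key` positions too), and the naive nested-loop matcher stands in for whatever copernic actually does with its indices.

```python
import re
import uuid
from dataclasses import dataclass
from urllib.parse import parse_qs

NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][-+]?\d+)?")


@dataclass(frozen=True)
class Var:
    """A query variable, written with a trailing "?" in the form fields."""
    name: str


def guess(text):
    """Guess a field's type: variable, UUID, number, boolean, else string."""
    if text.endswith("?"):
        return Var(text[:-1])
    try:
        return uuid.UUID(text)
    except ValueError:
        pass
    if NUMBER.fullmatch(text):
        return float(text) if any(c in text for c in ".eE") else int(text)
    if text in ("true", "false"):
        return text == "true"
    return text


def patterns_from_query_string(qs):
    """Turn uid0/key0/value0, uid1/... form fields into (uid, key, value) patterns."""
    params = parse_qs(qs, keep_blank_values=True)
    patterns = []
    i = 0
    while f"uid{i}" in params:
        patterns.append(tuple(guess(params[f"{col}{i}"][0]) for col in ("uid", "key", "value")))
        i += 1
    return patterns


def same(a, b):
    # Compare types too, so that True does not match 1.
    return type(a) is type(b) and a == b


def query(patterns, triples):
    """Naive nested-loop matching; yields one dict of bindings per solution."""
    def solve(index, binding):
        if index == len(patterns):
            yield binding
            return
        for triple in triples:
            new = dict(binding)
            for term, value in zip(patterns[index], triple):
                if isinstance(term, Var):
                    if not same(new.setdefault(term.name, value), value):
                        break
                elif not same(term, value):
                    break
            else:
                yield from solve(index + 1, new)

    yield from solve(0, {})
```

On the example URL's query string, this yields the patterns `(Var("0uid"), "title", "copernic")` and `(Var("0uid"), Var("key"), Var("value"))`, i.e. "every key and value of the things whose title is copernic".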

The code is at: https://github.com/amirouche/copernic

A demo is available at: http://copernic.space/

The license is AGPLv3+

Enjoy!
Received on Sunday, 23 February 2020 20:26:10 UTC
