- From: Amirouche Boubekki <amirouche.boubekki@gmail.com>
- Date: Sun, 23 Feb 2020 21:25:45 +0100
- To: semantic-web <semantic-web@w3.org>
That is short on explanation.

On Thu, 20 Feb 2020 at 21:24, Amirouche Boubekki <amirouche.boubekki@gmail.com> wrote:
>
> I ported the code of the versioned triple store from Scheme LISP to
> Python Django. It relies on FoundationDB to store triples.
>
> Check out: https://github.com/amirouche/copernic

This project is designed to allow cooperation at large scale, thanks to a change-request mechanism similar to GitHub pull requests or GitLab merge requests, on any structured data. That includes relational data, graph-like data and tabular data.

It is not exactly git-for-structured-data (unlike the previous iteration). It is meant to scale both in terms of data size and number of contributions. Like Wikipedia or Wikidata, there is a single version of the truth: every user gets the same data, and there are no per-user data repositories.

I dropped the git-like Directed Acyclic Graph (DAG) history with branches because it would be less scalable. The drawback of the current approach is that it does not allow one to "fork" a branch and have different versions of the same database that go in different directions, e.g. elaborate some theories in different branches and then merge only the good one.

Barring bugs in the code, it is possible to query a given stash of changes, but the stash only represents a single commit in git parlance: it has no history. In other words, change requests work in a way that is similar to git stash. That is, the diff (the additions and deletions of triples) is stored in the 5-tuple store like the rest of the data until the change is applied by a superuser. Until then, the data of a change request is invisible outside of that change request, like a git stash.

A 5-tuple store was said to be overkill, but as of today I do not know which queries I will want to run against the history, so I index all the things, which the Blazegraph code calls "perfect indices".
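As an illustration of the "index all the things" idea (not copernic's actual code): a pattern whose bound columns are exactly the leading columns of some index can be answered with a single range scan, so the question is how few column orders cover every combination of bound columns. The sketch below builds such a set with de Bruijn's symmetric chain decomposition; for n columns it yields C(n, n // 2) orders, which is the minimum possible. For triples that gives three orders, matching the SPO, POS and OSP indices Blazegraph uses for triples; for 5-tuples it gives 10 instead of all 5! = 120 permutations. The function names are hypothetical.

```python
from itertools import combinations


def symmetric_chains(n):
    """de Bruijn's symmetric chain decomposition of the subsets of range(n)."""
    chains = [[frozenset()]]
    for element in range(n):
        extended = []
        for chain in chains:
            # The chain grows by one set at its top...
            extended.append(chain + [chain[-1] | {element}])
            # ...and, if long enough, a shorter copy of it gains the new element.
            if len(chain) > 1:
                extended.append([subset | {element} for subset in chain[:-1]])
        chains = extended
    return chains


def chain_to_index(chain, n):
    """Turn a chain of nested column sets into one column order (an index)."""
    order = sorted(chain[0])
    for smaller, larger in zip(chain, chain[1:]):
        (added,) = larger - smaller  # consecutive sets differ by one column
        order.append(added)
    order += [column for column in range(n) if column not in chain[-1]]
    return tuple(order)


def minimal_indices(n):
    """One column order per chain: every subset is a prefix of some order."""
    return [chain_to_index(chain, n) for chain in symmetric_chains(n)]


def pick_index(indices, bound):
    """Return an index whose leading columns are exactly the bound columns."""
    for index in indices:
        if set(index[: len(bound)]) == set(bound):
            return index
    return None


def covers(indices, n):
    """True if every combination of bound columns can be served by a prefix scan."""
    return all(
        pick_index(indices, bound) is not None
        for k in range(n + 1)
        for bound in combinations(range(n), k)
    )
```

For example, `minimal_indices(3)` returns `[(0, 1, 2), (2, 0, 1), (1, 2, 0)]`, and `pick_index(minimal_indices(3), (1,))` returns `(1, 2, 0)`: a lookup by predicate alone scans the POS-ordered index.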
The 5-tuple is described in [0]. It contains the original added or removed triple, plus a boolean denoting whether it is an addition or a deletion (this is called a tombstone in PostgreSQL MVCC), and the changeid, a unique identifier for the group of additions and deletions, similar to a git commit hash.

[0] https://github.com/amirouche/copernic/blob/master/copernic/vnstore.py#L59

Once a superuser applies the change, which merely swaps a single `None` value for a timestamp, the history is properly serialized into a single-branch history. See:

https://github.com/amirouche/copernic/blob/5397fb7d1a4f28a2ba7ea8aac84d322d8d79c148/copernic/vnstore.py#L84-L87

In fact, a change request can be bigger than available memory. Unlike in the previous iteration, changes (or commits) do not map one-to-one to database transactions. Hence, there might be integrity bugs.

It is possible to do time-traveling queries, like Freebase did, but this is not exposed in the web user interface. There is always an up-to-date image of the latest data to speed up queries, while the history only stores the differences between successive versions.

The main differences I see with existing RDF databases are:

- it is versioned with a single-branch history, with stashes of changes;
- it scales horizontally, thanks to FoundationDB;
- it does not support SPARQL yet;
- it does not support reasoning of any sort.

As of today, the only way to add or remove triples is through the web user interface, via the "make change-request" button. In particular, the import link leads to a page, without explanation, with a file input that expects JSON lines, that is, things like:

    ["w3c", "is-a", "Web standards organization"]
    ["truth", "is", 42]

where `subject`, `predicate` and `object` can be any simple JSON data type. Also, I call these columns `uid`, `key` and `value` respectively; I find that easier to think about.
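To make the versioning model concrete, here is a minimal in-memory sketch. It is not the copernic code, which stores tuples in FoundationDB and keeps an image of the latest data; `VersionedStore`, its methods, `load_jsonl` and the integer timestamps are hypothetical. Each stored tuple is `(uid, key, value, alive, changeid)`, a pending change request has a `None` timestamp, applying it swaps in a timestamp, and a time-traveling query replays the applied changes up to a given time. `load_jsonl` assumes, for illustration only, that an import becomes a pending change request like any other edit.

```python
import itertools
import json
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    # Hypothetical in-memory stand-in for the FoundationDB-backed store.
    tuples: list = field(default_factory=list)  # (uid, key, value, alive, changeid)
    changes: dict = field(default_factory=dict)  # changeid -> timestamp, or None while pending
    _ids: itertools.count = field(default_factory=itertools.count)

    def change_request(self):
        changeid = next(self._ids)
        self.changes[changeid] = None  # pending: only visible inside this change request
        return changeid

    def add(self, changeid, uid, key, value):
        self.tuples.append((uid, key, value, True, changeid))

    def remove(self, changeid, uid, key, value):
        # A deletion is a tombstone tuple, not a physical delete.
        self.tuples.append((uid, key, value, False, changeid))

    def apply(self, changeid, timestamp):
        # Applying a change only swaps its None for a timestamp.
        if self.changes[changeid] is not None:
            raise ValueError("change already applied")
        self.changes[changeid] = timestamp

    def snapshot(self, at=None, changeid=None):
        """Replay applied changes (up to `at`), then optionally one pending stash."""
        applied = sorted(
            (ts, cid)
            for cid, ts in self.changes.items()
            if ts is not None and (at is None or ts <= at)
        )
        order = [cid for _, cid in applied]
        if changeid is not None and self.changes[changeid] is None:
            order.append(changeid)
        triples = set()
        for cid in order:
            for uid, key, value, alive, tuple_cid in self.tuples:
                if tuple_cid != cid:
                    continue
                if alive:
                    triples.add((uid, key, value))
                else:
                    triples.discard((uid, key, value))
        return triples


def load_jsonl(store, changeid, text):
    # Hypothetical import: one JSON array [uid, key, value] per line.
    for line in text.splitlines():
        if line.strip():
            uid, key, value = json.loads(line)
            store.add(changeid, uid, key, value)
```

Replaying the whole history on every read is exactly what an up-to-date image of the latest data avoids; the sketch trades that for brevity.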
Here is an example query done via the web user interface:

http://copernic.space/query/?uid0=0uid%3F&key0=title&value0=copernic&uid1=0uid%3F&key1=key%3F&value1=value%3F

The code tries to guess the type of the object: variable, UUID, number or boolean, falling back to string.

The code is at: https://github.com/amirouche/copernic
A demo is available at: http://copernic.space/
The license is AGPLv3+.

Enjoy!
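The example query URL above appears to encode a conjunctive pattern: each `uidN`/`keyN`/`valueN` group of parameters is one pattern, and the shared `0uid?` joins the two. The sketch below is a guess at that format, not copernic's code: it assumes a trailing `?` marks a variable, applies the type guessing described above (variable, UUID, number, boolean, else string) to every field, and evaluates the patterns with a naive nested-loop join over an in-memory list of triples. `Var`, `guess`, `parse_patterns`, `unify` and `run` are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlsplit


@dataclass(frozen=True)
class Var:
    name: str


def guess(text):
    """Guess a term's type: variable, UUID, number, boolean, else string."""
    if text.endswith("?"):
        return Var(text[:-1])  # assumption: a trailing "?" marks a variable
    for convert in (uuid.UUID, int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    if text in ("true", "false"):
        return text == "true"
    return text  # fallback: plain string


def parse_patterns(url):
    """Group uidN/keyN/valueN query parameters into (uid, key, value) patterns."""
    params = dict(parse_qsl(urlsplit(url).query))
    patterns = []
    n = 0
    while f"uid{n}" in params:
        patterns.append(tuple(guess(params[f"{column}{n}"]) for column in ("uid", "key", "value")))
        n += 1
    return patterns


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    binding = dict(binding)
    for term, item in zip(pattern, triple):
        if isinstance(term, Var):
            if binding.setdefault(term.name, item) != item:
                return None  # variable already bound to something else
        elif term != item:
            return None
    return binding


def run(patterns, triples):
    """Naive nested-loop evaluation of a conjunctive query."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = unify(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings
```

A real store would answer each pattern with a range scan on a suitable index instead of looping over every triple.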
Received on Sunday, 23 February 2020 20:26:10 UTC