Re: Knowledge graph toolkit from Amirouche Boubekki on 2020-05-02 (semantic-web@w3.org from May 2020)

From: Amirouche Boubekki <amirouche.boubekki@gmail.com>
Date: Sat, 2 May 2020 10:34:36 +0200
To: Diego Torres <diego.torres@lifia.info.unlp.edu.ar>
Cc: Semantic Web <semantic-web@w3.org>
Message-ID: <CAL7_Mo_DBpx=+SRErqPi_SAjXGC7XJs=qd62edsddOE186thTg@mail.gmail.com>

Hello Diego!

On Fri, 1 May 2020 at 18:38, Diego Torres
<diego.torres@lifia.info.unlp.edu.ar> wrote:
>
> Dear all,
>
> I've noticed the recent appearance of knowledge graph as a keyword in the Semantic Web context. As I am from the old school, some of the programming technologies I work with are a little out of date.
>
> I would like to ask if some of you could recommend a basic toolkit for a knowledge graph PhD student. This includes everything from programming languages to frameworks, tools, and anything else important. For example, what could be used to store or manage a knowledge graph (neo4j was the last I have seen; I imagine there are new alternatives, most of them open source).
>
> Thanks in advance,
>
> Diego

Since the question is so general, I will take it as an opportunity to
argue that neo4j and so-called RDF datastores are very difficult to
scale, if not a dead end, in a realistic scenario. Let me explain:

Things like neo4j are really good for data that is relational and
that you need to query in deeply recursive ways. But first, neo4j is
difficult to scale in terms of data size. Second, not everything is
relational. When you need things like full-text search, typo
correction or keyword suggestion, you have to fall back to another
database, which puts a strain on both production and the developer
setup. That in turn makes it difficult to reproduce the setup both in
production and locally. And even when you do not need to scale beyond
a single box, it is still very difficult. A little tip: if it takes
more than one day to set up the whole system with a greenhorn on a
developer machine, or worse, you need to buy more cloud credits to set
up the dev environment => that is NOT a good system, that is not
future proof, that is not good. On top of that, the time-to-learn is
gigantic for a newbie: not only is there the whole setup, which works
like a house of cards, but there is also the time required to learn
the surface, aka the Domain Specific Languages that you need to know
just to be able to use them (Elasticsearch JSON mess, Redis Lua, SQL,
Cypher or GQL).

The problem is the same with RDF stores: they do not scale in terms of
features and use-cases, for the same reasons. This leads to an
unbounded microservices mess.

My recommendation is to learn about FoundationDB, which can scale down
to a single box and works in the large just as easily. You will need
to LEARN something new, BUT like LISP it is a programmable programming
system: you can do a lot more with an ordered key-value store (OKVS)
than with any other database system.
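
To give a rough idea of what "programmable" can mean for an ordered
key-value store (OKVS), here is a minimal sketch of a triple store
built on sorted tuple keys and prefix scans. It is an illustration
under assumptions, not the FoundationDB API: OrderedKV is a
hypothetical in-memory stand-in, and the names INDEXES, add_triple and
match are made up for this example. Python is used only for brevity;
any language with a sorted map would do.

```python
import bisect


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []    # tuple keys, kept sorted
        self._values = {}  # key -> value

    def set(self, key, value=b""):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range_prefix(self, prefix):
        """Yield every key that starts with `prefix`, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            yield key


# Each triple is stored three times, once per ordering, so that any
# combination of bound components is a key prefix in one of the indexes.
INDEXES = {
    "spo": (0, 1, 2),  # subject, predicate, object
    "pos": (1, 2, 0),  # predicate, object, subject
    "osp": (2, 0, 1),  # object, subject, predicate
}


def add_triple(db, s, p, o):
    triple = (s, p, o)
    for name, order in INDEXES.items():
        db.set((name,) + tuple(triple[i] for i in order))


def match(db, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    pattern = (s, p, o)
    # Pick the index with the longest run of bound leading components.
    best_name, best_order, best_bound = None, None, -1
    for name, order in INDEXES.items():
        bound = 0
        for i in order:
            if pattern[i] is None:
                break
            bound += 1
        if bound > best_bound:
            best_name, best_order, best_bound = name, order, bound
    prefix = (best_name,) + tuple(pattern[i] for i in best_order[:best_bound])
    results = []
    for key in db.range_prefix(prefix):
        # Undo the index ordering to get (s, p, o) back.
        triple = [None, None, None]
        for position, i in enumerate(best_order):
            triple[i] = key[1 + position]
        # Defensive filter for any component the prefix did not cover.
        if all(w is None or w == g for w, g in zip(pattern, triple)):
            results.append(tuple(triple))
    return results


db = OrderedKV()
add_triple(db, "alice", "knows", "bob")
add_triple(db, "bob", "knows", "carol")
add_triple(db, "alice", "likes", "rdf")
print(match(db, s="alice"))
print(match(db, p="knows", o="carol"))
```

The same pattern (one sorted keyspace, several key orderings, range
reads by prefix) is also how other features such as secondary indexes
can be layered on an OKVS without adding another database.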

And please try something other than Python, because the Global
Interpreter Lock is here to stay, and if you want to make your work
reproducible and accessible, multiprocessing (or worse:
microservices!) does not cut it. I know it is a lot of work, but I am
confident that Python (with the GIL) is holding back progress in
science.

NB: I did not say "RDF is useless".

>
> Dr. Diego Torres
> Centro de Investigación LIFIA
> Facultad de Informática - UNLP
> diego.torres[at]lifia.info.unlp.edu.ar
> http://www.lifia.info.unlp.edu.ar/lifia/en/files/diego-torres/

This is a redirection.

> Director de http://cientopolis.org

This link does not work (404).


-- 
Amirouche ~ https://hyper.dev

Received on Saturday, 2 May 2020 08:35:01 UTC