- From: Amirouche Boubekki <amirouche.boubekki@gmail.com>
- Date: Sat, 2 May 2020 10:34:36 +0200
- To: Diego Torres <diego.torres@lifia.info.unlp.edu.ar>
- Cc: Semantic Web <semantic-web@w3.org>
Hello Diego!

On Fri, 1 May 2020 at 18:38, Diego Torres <diego.torres@lifia.info.unlp.edu.ar> wrote:
>
> Dear all,
>
> I've noticed lately the appearance of "knowledge graph" as a keyword in the Semantic Web context. As I am from the old school, some of the programming technologies I use are a little out of date.
>
> I would like to ask if some of you could recommend a basic toolkit for a knowledge graph PhD student. This includes everything from programming languages to frameworks, tools, and any other important tool. For example, what could I use to store or manage a knowledge graph (neo4j was the last one I've seen; I imagine there are new alternatives, most of them open source)?
>
> Thanks in advance,
>
> Diego

Given such a general question, I take this as an opportunity to try to explain that neo4j or so-called RDF datastores are very difficult to scale, if not a dead end, in a realistic scenario. Let me explain.

Things like neo4j are really good for data that is relational, against which you need to run queries that are deeply recursive. To begin with, neo4j is difficult to scale in terms of data size. Second, not everything is relational. When you need to do things like full-text search, typo correction, or keyword suggestion, you have to fall back to another database, which puts a strain both on production and on the developer setup. That in turn makes it difficult to reproduce the setup, both in production and locally. And even when you do not need to scale beyond a single box, it is still very difficult.

A little tip: if it takes more than one day for a greenhorn to set up the whole system on a developer machine, or worse, you need to buy more cloud credits just to set up the dev environment, then that is NOT a good system. It is not future-proof; it is not good.

On top of that, the time-to-learn is gigantic for a newbie. Not only is there the whole setup, which stands like a house of cards, but there is also the time required to learn the surface, that is, the Domain Specific Languages you need to know just to be able to use these systems (the ElasticSearch JSON mess, Redis Lua, SQL, Cypher or GQL).

The problem is the same with RDF stores: they do not scale in terms of features and use cases, for the same reasons. This leads to an unbounded microservices mess.

My recommendation is to learn about FoundationDB, which can scale down to a single box and works in the large just as easily. You will need to LEARN something new, BUT like LISP it is a programmable programming system: you can do a lot more with an OKVS (ordered key-value store) than with any other database system.

And please try something other than Python, because the Global Interpreter Lock is here to stay, and if you want to make your work reproducible and accessible, multiprocessing (or worse: microservices!) does not cut it. I know it is a lot of work, but I am confident that Python (with the GIL) is holding back progress in science.

NB: I did not say "RDF is useless".

> Dr. Diego Torres
> Centro de Investigación LIFIA
> Facultad de Informática - UNLP
> diego.torres[at]lifia.info.unlp.edu.ar
> http://www.lifia.info.unlp.edu.ar/lifia/en/files/diego-torres/

This is a redirect.

> Director of http://cientopolis.org

This link does not work (404).

--
Amirouche ~ https://hyper.dev
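As a rough illustration of the OKVS point above (one ordered key space hosting several data models, so a graph index and a full-text index need not live in separate databases), here is a minimal sketch. Everything in it is an assumption for illustration: `MemoryOKVS`, `TripleStore`, and `TextIndex` are hypothetical names, the in-memory sorted list merely stands in for a real OKVS such as FoundationDB, and none of it is FoundationDB's API or the author's own code.

```python
import bisect
from itertools import takewhile
from typing import Dict, Iterator, List, Tuple

Key = Tuple[str, ...]


class MemoryOKVS:
    """Hypothetical toy ordered key-value store: keys are kept sorted, so a
    prefix query is a contiguous range read. An in-memory stand-in for a real
    OKVS such as FoundationDB; this is NOT FoundationDB's API."""

    def __init__(self) -> None:
        self._keys: List[Key] = []
        self._values: Dict[Key, str] = {}

    def set(self, key: Key, value: str = "") -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def range_from(self, start: Key) -> Iterator[Key]:
        # Yield every key >= start, in sorted order.
        for i in range(bisect.bisect_left(self._keys, start), len(self._keys)):
            yield self._keys[i]

    def scan(self, prefix: Key) -> Iterator[Key]:
        # Keys sharing a prefix are contiguous, so stop at the first mismatch.
        return takewhile(lambda k: k[: len(prefix)] == prefix, self.range_from(prefix))


class TripleStore:
    """Hypothetical graph layer: each triple is stored under three orderings
    (spo, pos, osp) so a pattern with bound leading terms is one prefix scan."""

    def __init__(self, kv: MemoryOKVS) -> None:
        self.kv = kv

    def add(self, s: str, p: str, o: str) -> None:
        self.kv.set(("spo", s, p, o))
        self.kv.set(("pos", p, o, s))
        self.kv.set(("osp", o, s, p))

    def objects(self, s: str, p: str) -> List[str]:
        return [k[3] for k in self.kv.scan(("spo", s, p))]

    def subjects(self, p: str, o: str) -> List[str]:
        return [k[3] for k in self.kv.scan(("pos", p, o))]


class TextIndex:
    """Hypothetical full-text layer: an inverted index (word -> document ids)
    living in the same key space as the graph, under an "fts" prefix."""

    def __init__(self, kv: MemoryOKVS) -> None:
        self.kv = kv

    def index(self, doc_id: str, text: str) -> None:
        for word in set(text.lower().split()):  # naive whitespace tokenizer
            self.kv.set(("fts", word, doc_id))

    def search(self, word: str) -> List[str]:
        return [k[2] for k in self.kv.scan(("fts", word.lower()))]

    def suggest(self, partial: str) -> List[str]:
        # Keyword suggestion: all indexed words that start with `partial`.
        partial = partial.lower()
        keys = takewhile(
            lambda k: k[0] == "fts" and k[1].startswith(partial),
            self.kv.range_from(("fts", partial)),
        )
        return sorted({k[1] for k in keys})


# Usage: one store, two data models.
kv = MemoryOKVS()
graph = TripleStore(kv)
text = TextIndex(kv)
graph.add("alice", "knows", "bob")
graph.add("alice", "knows", "carol")
graph.add("dave", "knows", "bob")
text.index("alice", "Alice studies knowledge graphs")
text.index("bob", "Bob knows key value stores")
print(graph.objects("alice", "knows"))  # ['bob', 'carol']
print(graph.subjects("knows", "bob"))   # ['alice', 'dave']
print(text.search("knowledge"))         # ['alice']
print(text.suggest("kn"))               # ['knowledge', 'knows']
```

The design choice that matters is the key layout: because keys sharing a prefix are contiguous in sorted order, each lookup (objects of a subject, subjects of an object, documents containing a word, words starting with a prefix) becomes a single range read. A real OKVS would additionally provide persistence and transactions spanning several keys, which this toy omits.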
Received on Saturday, 2 May 2020 08:35:01 UTC