
Re: Knowledge graph toolkit

From: Adrian Gschwend <ml-ktk@netlabs.org>
Date: Sun, 3 May 2020 11:51:16 +0200
Cc: semantic-web <semantic-web@w3.org>
Message-ID: <3ed56d07-22cd-63a0-5e16-bdec530a7090@netlabs.org>
On 03.05.20 08:43, Amirouche Boubekki wrote:

[...]
> offer. What is required is indeed a relational database like RDF
> describes. But more than that, a modern AI system has to tackle
> heterogeneous data types that do not blend nicely into the RDF
> framework. I forgot to mention geometric data. I forgot to mention
> strong ACID guarantees.
I would say there is no other data model out there that unifies
heterogeneous data types better than RDF. What, in your opinion, does "not
blend nicely into the RDF framework"?

> It has to do with the fact that people spread the idea that the
> RDF framework is a go-to solution to do semantic work. Except it does
> not provide a solution for:
> 
> - full text search

Nonsense. There is no standard API, but pretty much every triple store I
know provides full-text search; see

https://github.com/w3c/sparql-12/issues/40

Just because it's not in the current SPARQL spec does not mean it's not
there at all. Also, we are working on SPARQL 1.2; that's the beauty of
open standards.
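As a sketch of what this looks like in practice, here is a full-text
lookup using Jena's vendor-specific text index syntax, assuming a Lucene
index has been configured over rdfs:label (other stores offer similar
extensions with their own vocabularies):

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Find resources whose indexed label matches the Lucene query
SELECT ?s ?label WHERE {
  ?s text:query (rdfs:label "knowledge graph") ;
     rdfs:label ?label .
}
```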

> - geometric search

https://www.ogc.org/standards/geosparql/

The spec is not particularly well written, but it has existed since 2011
and various stores implement it, for example Jena:

https://jena.apache.org/documentation/geosparql/
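A minimal GeoSPARQL sketch, assuming features carry WKT geometries (the
example polygon is made up):

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

# Find features whose geometry lies within a bounding polygon
SELECT ?feature WHERE {
  ?feature geo:hasGeometry/geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt,
    "POLYGON((7.0 46.0, 8.0 46.0, 8.0 47.0, 7.0 47.0, 7.0 46.0))"^^geo:wktLiteral))
}
```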

> - keyword suggestion (approximate string matching)

See the Lucene-based full-text search implementations mentioned above.
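Approximate matching comes with Lucene for free via its fuzzy operator,
which the text extensions pass through. A sketch in Jena's syntax (the
misspelled term is deliberate):

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# "knowlege~" uses Lucene fuzzy matching to catch misspellings
SELECT ?s ?label WHERE {
  ?s text:query (rdfs:label "knowlege~") ;
     rdfs:label ?label .
}
```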

> - historisation

There are a whole bunch of papers about versioning RDF from a research
POV, and I know that at least Stardog implements it in their product.

My colleague recently wrote a versioned RDF store for distributed IoT
devices, so it is surely a solvable problem.

While I always thought I absolutely needed versioning, I have noticed
that in reality this is far less the case: I often model the versions
directly in the RDF data, so there is no need for it at the store level.
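A sketch of what modeling versions in the data can look like, using the
Dublin Core terms vocabulary and a hypothetical example.org dataset:

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX ex:  <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Each version is its own resource, linked to the abstract dataset
# and to its predecessor; no store-level versioning required
INSERT DATA {
  ex:dataset-v2 dct:isVersionOf ex:dataset ;
                dct:replaces    ex:dataset-v1 ;
                dct:issued      "2020-05-01"^^xsd:date .
}
```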

> - ACID guarantees

Again, solvable. Stardog does this for example.
(https://stardog.docs.apiary.io/#reference/managing-transactions)

Open-source stacks have implementations as well, and there are
discussions about transactions in the SPARQL 1.2 CWG:
https://github.com/w3c/sparql-12/issues/83
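Also worth remembering: the SPARQL 1.1 Update spec already says a single
update request should be treated atomically, so bundling operations into
one request gets you a long way. A sketch with hypothetical example.org
data (the exact atomicity guarantee is implementation-defined):

```sparql
PREFIX ex: <http://example.org/>

# Both operations succeed or fail together in stores that execute
# an update request atomically
DELETE DATA { ex:account1 ex:balance 100 } ;
INSERT DATA { ex:account1 ex:balance  90 .
              ex:account2 ex:balance  10 }
```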

> And probably others that I forget.

You seem to have decided that RDF is not for you, and that is totally fine.

But YMMV; I think RDF is *the* stack to build KGs on, and I have not been
disappointed so far. If something is missing, we try to add it to the stack.

> Two things:
> 
> 1) For the record: money is not Science. Profitable does not
> necessarily mean a Good Thing.

No disagreement here but how is that related to the scaling remark?

> 2) There is not publicly available project using publicly available
> software that scale beyond 1TB.

What you mean to say is "I am not aware of a publicly available project
using publicly available software that scales beyond 1 TB". Also, sorry
to disappoint you:

https://de.slideshare.net/jervenbolleman/sparqluniprotorg-in-production-poster

That was in 2017, and Uniprot has grown since then; the latest number I
have in mind is well above 50 billion triples.

For larger-scale Open Source RDF implementations you might want to consider:

https://cm-well.github.io/CM-Well/index.html

See for example the high-level architecture here:

https://cm-well.github.io/CM-Well/Introduction/Intro.CM-WellHigh-LevelArchitecture.html

If you think this is too complicated, please remember that Uniprot runs
on a single machine using Virtuoso.

There are a few other large-scale stores, like Apache Rya, but I have
not tried them yet.

> Indeed, when one asks my advice about a _basic_ toolkit to do KG, I
> recommend FDB, because it can handle all the cases previously
> mentioned. And I do not forget to mention that it is a long
> journey, especially if you want to be valid with regard to the RDF
> standard.

That is a tooling question, and tooling has gotten a lot better over the
past years. There is still work to do for sure, and we are working on that.

> As far as I am concerned RDF offers good guiding principles, but it
> requires decades of study (much like compiler work) to grasp,
> which is a bummer. It ought to be simpler, much simpler, and that is
> what I am doing in my projects: taking the best of RDF and leaving
> aside what is not necessary.

I disagree here, and I speak from experience. I do a lot of RDF teaching,
and once people understand the basics, they can be extremely productive
with RDF.

> exists.  But I will not forsake advancement and innovation for the
> purpose of backward compatibility with something that is so gigantic,
> especially when something easier is possible.

Again, that is fine if your use cases are limited. We leverage the power
of the RDF stack, so "something easier" means "something less powerful".

regards

Adrian
Received on Sunday, 3 May 2020 09:51:37 UTC
