Re: Knowledge graph toolkit from Amirouche Boubekki on 2020-05-04 (semantic-web@w3.org from May 2020)

From: Amirouche Boubekki <amirouche.boubekki@gmail.com>
Date: Mon, 4 May 2020 08:26:37 +0200
To: Adrian Gschwend <ml-ktk@netlabs.org>
Cc: semantic-web <semantic-web@w3.org>
Message-ID: <CAL7_Mo8MHyOZ0nPTAYzBJB=3kz6+THbMLpTt330O+NSGWTGQRg@mail.gmail.com>
Hello Adrian!

First, thanks for the replies.

On Sun, 3 May 2020 at 11:58, Adrian Gschwend <ml-ktk@netlabs.org> wrote:
>
> On 03.05.20 08:43, Amirouche Boubekki wrote:
>
> [...]
> > offer. What is required is indeed a relational database like RDF
> > describes. But more than that, a modern AI system has to tackle
> > heterogeneous data types that do not blend nicely into the RDF
> > framework. I forgot to mention geometric data. I forgot to mention
> > strong ACID guarantees.
> I would say there is no other data model out there which can unify
> heterogeneous data types better than RDF. What does in your opinion "not
> blend nicely into the RDF framework"?

Not sure anymore, I need to study the links you provided. I have sent
several emails to the list about full-text search (FTS) and geometric
queries, with no response so far.

> > It has to do with RDF with the fact that people spread the idea that
> > RDF framework is a go to solution to do semantic work. Except, it does
> > not provide a solution for:
> >
> > - full text search
>
> nonsense, there is no standard API but pretty much every triplestore I
> know provides that, see
>
> https://github.com/w3c/sparql-12/issues/40
>

Thanks!

> Just because it's not in current SPARQL spec does not mean it's not
> there at all. Also we do work on SPARQL 1.2, that's the beauty of open
> standards.
>
> > - geometric search
>
> https://www.ogc.org/standards/geosparql/
>

Thanks!

> It's a not really well written spec but it's there since 2011 and
> various stores implement that, for example Jena:
>
> https://jena.apache.org/documentation/geosparql/
>
> > - keyword suggestion (approximate string matching)
>
> see all lucene based fulltext-search implementations above
>

Nothing based on Lucene will be part of my system, because it breaks
ACID guarantees.  People have tried to adapt Lucene to FoundationDB
(FDB) [0] without much success.

[0] https://forums.foundationdb.org/t/lucene-layer-on-foundationdb/1229

> > - historisation
>
> There are a whole bunch of papers about versioning RDF from a research
> POV,

The review I read about it was not great, and so far I have not found
a solution that works the way I expect (except mine).

> I know that at least Stardog implements that in their product.

Similarly, I asked the question and nobody has responded so far. I
will look into Stardog in particular.

> My colleague just recently wrote a versioned RDF store for distributed
> IoT devices so that's surely a solvable problem.

FOSS or it does not exist ;-)

> While I always thought I absolutely need versioning I noticed that in
> reality this is far less the case, because I often model the data
> versioned in RDF directly so no need to get that on store level.

That is a possible approach.
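
To make the idea concrete, here is a minimal sketch of one way to model
versions in the data itself rather than in the store. It is only my
reading of the approach, not Adrian's actual model: the `ex:` names, the
sample data, and the `value_as_of` helper are all hypothetical, and
plain Python tuples stand in for RDF triples.

```python
# Hypothetical vocabulary (ex:...) and data, for illustration only.
# Each version of a fact is its own node carrying a value and a start date.
TRIPLES = {
    ("ex:price-v1", "ex:of", "ex:widget"),
    ("ex:price-v1", "ex:value", "10"),
    ("ex:price-v1", "ex:validFrom", "2020-01-01"),
    ("ex:price-v2", "ex:of", "ex:widget"),
    ("ex:price-v2", "ex:value", "12"),
    ("ex:price-v2", "ex:validFrom", "2020-04-01"),
}


def objects(triples, subject, predicate):
    """Return every object of (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def value_as_of(triples, item, when):
    """Return the value of the latest version of `item` valid at `when`.

    Dates are ISO 8601 strings, so string comparison orders them correctly.
    """
    versions = [s for s, p, o in triples if p == "ex:of" and o == item]
    dated = []
    for version in versions:
        for valid_from in objects(triples, version, "ex:validFrom"):
            if valid_from <= when:
                dated.append((valid_from, version))
    if not dated:
        return None  # no version existed yet at that date
    _, latest = max(dated)
    values = objects(triples, latest, "ex:value")
    return values[0] if values else None


print(value_as_of(TRIPLES, "ex:widget", "2020-03-15"))
```

The store never needs to know about history here: old versions are
ordinary triples, and "time travel" is just a query over them.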

>
> > - ACID guarantees
>
> Again, solvable. Stardog does this for example.
> (https://stardog.docs.apiary.io/#reference/managing-transactions)

I will look into it.

>
> OSS stacks have implementations as well, also there are discussions
> around transactions in the SPARQL 1.2 CWG:
> https://github.com/w3c/sparql-12/issues/83
>

I am watching the repository (again) and will go through the
documentation.

> > And probably others that I forget.
>
> you seem to have decided that RDF is not for you, this is totally fine.

For the record, I came to RDF from graphdb lands, and after careful
consideration and study, a subset of RDF makes sense _to me_.  Like I
wrote previously, what the W3C is doing with RDF is awesome.  I wish
it were more widely recognized and easier to grasp.

>
> But YMMV, I think RDF is *the* stack to build KGs on and I have not been
> disappointed so far. If we miss something, we try to add it to the stack.
>

>
> > 2) There is not publicly available project using publicly available
> > software that scale beyond 1TB.
>
> What you want to say is "I am not aware of a publicly available project
> using publicly available software that scale beyond 1TB". Also, sorry to
> disappoint you:
>
> https://de.slideshare.net/jervenbolleman/sparqluniprotorg-in-production-poster
>
> That was 2017, Uniprot again grew since then, latest number I have in
> mind is well above 50 billion triples.
>
> For larger-scale Open Source RDF implementations you might want to consider:
>
> https://cm-well.github.io/CM-Well/index.html
>
> See for example the high-level architecture here:
>
> https://cm-well.github.io/CM-Well/Introduction/Intro.CM-WellHigh-LevelArchitecture.html
>
> If you think this is too complicated please remember that Uniprot runs
> on a single machine using Virtuoso.

As far as I know it is not the Open Source version of Virtuoso.

>
> There are a few other large-scale stores like Apache Rya but I did not
> try those yet.
>
> > Indeed, when one asks me my advice about a _basic_ toolkit to do KG, I
> > recommend FDB, because it can handle all the cases previously
> > mentioned. And also I do not forget to mention that it is a long
> > journey, especially if you want to be valid in the regard of RDF
> > standard.
>
> That is a tooling question and that got a lot better the past years. But
> still work to do for sure and we work on that.
>
> > As far as I am concerned RDF offers good guiding principles, but it
> > requires decades long of study (much like compiler work) to grasp
> > which is a bummer. It ought to be simpler, much simpler and that is
> > what I am doing in my projects: taking the best of RDF and leaving
> > aside what is not necessary.
>
> I disagree here and I talk from experience. I do a lot of RDF teaching
> and once people understand the basics, they can be extremely productive
> with RDF.
>

That material might be lost on the Internet, because any time I look
for material I end up on the W3C standards pages.

> > exists.  But I will not forsake advancement and innovation for the
> > purpose of backward compatibility with something that is so gigantic,
> > especially when something easier is possible.
>
> so "something easier" means "something less powerful".

Not necessarily. That is the point of an Ordered Key-Value Store
(OKVS), and of FDB in particular. The API surface is limited, but it
provides an abstraction that allows fractal design (like having a
triplestore in a triplestore).
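
As a rough illustration of what I mean, here is a minimal in-memory
sketch of a triplestore layered on an ordered key-value store. It does
not use the FoundationDB API: the `OKVS` and `TripleStore` classes, the
three index names, and the key prefixes are all assumptions made up for
this example. Every triple is written under three key orderings so that
any pattern becomes a single prefix scan, and a key prefix lets several
stores, including a nested one, share the same key space.

```python
import bisect


class OKVS:
    """Toy ordered key-value store: tuple keys kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of tuple keys
        self._values = {}  # key -> value

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, prefix):
        """Yield (key, value) for keys starting with `prefix`, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            yield key, self._values[key]


class TripleStore:
    """Triples stored under three orderings; every lookup is a prefix scan."""

    INDEXES = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, okvs, prefix=()):
        self.okvs = okvs
        self.prefix = tuple(prefix)  # isolates this store inside the OKVS

    def add(self, s, p, o):
        triple = (s, p, o)
        for name, order in self.INDEXES.items():
            key = self.prefix + (name,) + tuple(triple[i] for i in order)
            self.okvs.set(key, b"")

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        # Pick the index whose leading positions are exactly the bound terms.
        for name, order in self.INDEXES.items():
            permuted = [pattern[i] for i in order]
            bound = sum(term is not None for term in permuted)
            if all(term is not None for term in permuted[:bound]):
                break
        key_prefix = self.prefix + (name,) + tuple(permuted[:bound])
        skip = len(self.prefix) + 1
        for key, _ in self.okvs.scan(key_prefix):
            triple = [None] * 3
            for position, index in enumerate(order):
                triple[index] = key[skip + position]
            yield tuple(triple)


kv = OKVS()
outer = TripleStore(kv, prefix=("outer",))
nested = TripleStore(kv, prefix=("outer", "nested"))  # a store inside a store
outer.add("alice", "knows", "bob")
nested.add("bob", "knows", "dave")
print(list(outer.match(o="bob")), list(nested.match()))
```

The whole thing needs only `set` and an ordered prefix scan from the
underlying store, which is the small API surface I am talking about.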

Sorry, I sounded rude.  I am not affiliated with Apple, FDB, or any
other OKVS vendor.  I have been looking for an on-disk persistence
solution to achieve my dreams; part of the solution is RDF on top of
an OKVS. Please do not read my post(s) as coming from an angry person,
but rather from an enthusiastic person who wants to share their joy.


>
> regards
>
> Adrian
>


--
Amirouche ~ https://hyper.dev
Received on Monday, 4 May 2020 06:27:01 UTC