Re: TPF and DBMSes (was Re: Hydra and Shapes)

On Sat, Nov 22, 2014 at 9:46 PM, Kjetil Kjernsmo <kjetil@kjernsmo.net>
wrote:
...


>
> But I think that misses the crucial point, which is how things happen
> behind that. Strictly, Ruben is right; you can basically materialize all
> possible triple patterns, with pages, and store them in a file system. In
> that case, it is correct that no DBMS is involved.
>
> However, I would claim that this is not practical in almost all cases. I
> have myself worked on a system that did something similar, in a fairly
> high-traffic but slow-update setting, which is the case where it possibly
> makes the most sense, but I still shiver at the thought of the horror of
> the code we needed to manage the updates! :-) Essentially, it amounts to
> writing yet another materialization framework.
>
> I think you'd much rather like to ensure that you have a good DBMS at the
> bottom, then a layer that serves TPFs, and makes sure you have cache
> headers, and then a reverse caching proxy facing the network (which is
> about the architecture that I'm running on my endpoint).
>
>
> However, this places three requirements on the DBMS that aren't
> necessarily easy to meet. The first, which I know you already do well,
> Kingsley, is paging. The second is that it must be much cheaper to
> compute the cardinality (or at least an estimate) of any triple pattern
> result than to compute the result of the same triple pattern. Can you do
> that with Virtuoso? If yes, how do we access it?
>
> The third requirement is that it must be much cheaper to compute the last
> modified time of any triple pattern result than to compute the result
> itself, so that the reverse proxy can do its job effectively. Again, can
> Virtuoso do that, and if so, how can we access that time?
>
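To make the second and third requirements concrete, here is a minimal sketch
(in Python, with an invented in-memory store; none of this is Virtuoso's or
any TPF server's actual API) of how a fragment server could answer requests
when per-pattern counts and last-modified timestamps are maintained on write,
so that both are cheap lookups rather than full scans, and a reverse proxy
can revalidate with If-Modified-Since:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical in-memory triple store. Per-pattern statistics are updated
# on every write, so cardinality and last-modified are O(1) lookups.
class ToyStore:
    def __init__(self):
        self.triples = []
        self.stats = {}  # pattern -> (count, last_modified)

    def add(self, s, p, o, when):
        self.triples.append((s, p, o))
        # Update stats for every generalization of the triple
        # (None acts as a wildcard in a pattern).
        for pattern in [(s2, p2, o2)
                        for s2 in (s, None)
                        for p2 in (p, None)
                        for o2 in (o, None)]:
            count, _ = self.stats.get(pattern, (0, when))
            self.stats[pattern] = (count + 1, when)

    def cardinality(self, pattern):    # requirement 2: cheap (estimated) count
        return self.stats.get(pattern, (0, None))[0]

    def last_modified(self, pattern):  # requirement 3: cheap timestamp
        return self.stats.get(pattern, (0, None))[1]

def serve_fragment(store, pattern, if_modified_since=None):
    """Answer a fragment request; 304 if the cached copy is still fresh."""
    lm = store.last_modified(pattern)
    if if_modified_since is not None and lm is not None and lm <= if_modified_since:
        return 304, {}, None  # the reverse proxy keeps serving its cached copy
    matches = [t for t in store.triples
               if all(w is None or w == v for w, v in zip(pattern, t))]
    headers = {"Last-Modified": format_datetime(lm) if lm else "",
               "X-Estimated-Count": str(store.cardinality(pattern))}
    return 200, headers, matches
```

The point of the sketch is only the cost asymmetry: the 304 path and the
count header never touch the triples themselves, which is what lets the
caching proxy do its job.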


Maybe this problem (a distributed data set, where you need to know what
changed recently on the other side in order to maintain your local
materialized view) could be addressed properly with the "Kappa"
architecture (http://youtu.be/fU9hR3kiOK0?t=22m01s).

This would allow distributed caching (e.g. with Kafka + Samza) and
knowledge of:
* when the most recent update was made to e.g. a TPF or another "subset"
(LDF) that is relevant for your use case
* which new data you need in order to update your local copy of that
subset.
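The core Kappa idea above can be sketched in a few lines: the publisher
exposes an append-only change log, and a consumer maintains its local
materialized view by replaying only the entries past its last known offset.
The log and record shapes here are invented for illustration; in practice
the log would live in Kafka and the consumer would be e.g. a Samza job:

```python
def apply_changes(view, log, offset):
    """Fold log entries after `offset` into `view`; return the new offset."""
    for op, triple in log[offset:]:
        if op == "add":
            view.add(triple)
        elif op == "delete":
            view.discard(triple)
    return len(log)

# Initial sync: replay the whole log from offset 0.
log = [("add", ("s1", "p1", "o1")),
       ("add", ("s2", "p1", "o2"))]
view, offset = set(), 0
offset = apply_changes(view, log, offset)

# Upstream change arrives; incremental catch-up reads only the new entry.
log.append(("delete", ("s1", "p1", "o1")))
offset = apply_changes(view, log, offset)
```

Because the log is the source of truth, the consumer needs only its offset
to answer both questions above: anything past the offset is "what changed",
and the offset itself tells you how current your copy is.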

HTH,

Peter

-- 
Peter Vandenabeele
http://www.allthingsdata.io
http://www.linkedin.com/in/petervandenabeele
https://twitter.com/peter_v
gsm: +32-478-27.40.69
e-mail: peter@vandenabeele.com
skype: peter_v_be

Received on Sunday, 23 November 2014 12:57:24 UTC