- From: Peter Vandenabeele <peter@vandenabeele.com>
- Date: Sun, 23 Nov 2014 13:56:57 +0100
- To: Kjetil Kjernsmo <kjetil@kjernsmo.net>
- Cc: "<public-hydra@w3.org>" <public-hydra@w3.org>, Kingsley Idehen <kidehen@openlinksw.com>
- Message-ID: <CAC969dLi7a2z0iv6x0Z+jLtGhf-Dzf2kJdSfJSxxmNVf2oFvrA@mail.gmail.com>
On Sat, Nov 22, 2014 at 9:46 PM, Kjetil Kjernsmo <kjetil@kjernsmo.net> wrote:

...

> But I think that misses the crucial point, which is how things happen
> behind that. Strictly, Ruben is right; you can basically materialize all
> possible triple patterns, with pages, and store them in a file system. In
> that case, it is correct that no DBMS is involved.
>
> However, I would claim that this is not practical in almost all cases. I
> have myself been working on a system that did something similar, in a
> fairly high-traffic, but slow-update case, which is the case where it
> possibly makes most sense, but the horror of the code we had to manage the
> updates, I still shiver at the thought! :-) Essentially, it is writing YA
> materialization framework.
>
> I think you'd much rather like to ensure that you have a good DBMS at the
> bottom, then a layer that serves TPFs, and makes sure you have cache
> headers, and then a reverse caching proxy facing the network (which is
> about the architecture that I'm running on my endpoint).
>
> However, this places three requirements that aren't necessarily easy on
> the DBMS. The first, that I know that you do well, already, Kingsley, is
> paging. The next is that it must be much cheaper to compute the cardinality
> (or at least an estimate) of any triple pattern result than computing the
> result of the same triple pattern. Can you do that with Virtuoso? If yes,
> how do we access it?
>
> The third requirement is that it must be much cheaper to compute the last
> modified time of any triple pattern result than to compute the result
> itself, so that the reverse proxy can do its job effectively. Again, can
> Virtuoso do that, and if so, how can we access that time?
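[Editorial note: the "cheap cardinality" requirement quoted above can be illustrated with a minimal in-memory sketch. This is not Virtuoso's actual mechanism; `TinyTripleStore` and its pattern-key scheme are hypothetical. The idea is simply that a store can maintain per-pattern counters alongside its indexes, so answering "how many triples match ?s :knows ?o" is a lookup, while enumerating the matches remains the expensive path.]

```python
# Hypothetical sketch: keep a counter for every triple-pattern key so
# cardinality is O(1), independent of materializing the result set.
from collections import defaultdict
from itertools import product


class TinyTripleStore:
    def __init__(self):
        self.triples = set()
        self.counts = defaultdict(int)  # pattern key -> cardinality

    @staticmethod
    def _keys(s, p, o):
        # The 8 pattern keys this triple matches (None = wildcard).
        return product((s, None), (p, None), (o, None))

    def add(self, s, p, o):
        if (s, p, o) in self.triples:
            return
        self.triples.add((s, p, o))
        for key in self._keys(s, p, o):
            self.counts[key] += 1

    def cardinality(self, s=None, p=None, o=None):
        # Cheap path: a single dictionary lookup, no scan.
        return self.counts[(s, p, o)]

    def matches(self, s=None, p=None, o=None):
        # Expensive path: actually enumerate the matching triples.
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]
```

In a real DBMS the counter maintenance would live in the index layer (or be replaced by estimates from index statistics), but the asymmetry is the same: the count is available without touching the result.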
Maybe this problem (of a distributed data set and knowing what changed
recently at the other side to be able to maintain your local materialized
view) could be addressed properly with the "Kappa" architecture
(http://youtu.be/fU9hR3kiOK0?t=22m01s). This would allow distributed caching
(e.g. with Kafka + Samza) and knowledge of:

* when the most recent update was made to e.g. a TPF or other "subset" (LDF)
  that is relevant for your use case
* which new data you need to update your local copy of that subset

HTH,

Peter

--
Peter Vandenabeele
http://www.allthingsdata.io
http://www.linkedin.com/in/petervandenabeele
https://twitter.com/peter_v

gsm:    +32-478-27.40.69
e-mail: peter@vandenabeele.com
skype:  peter_v_be
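[Editorial note: the Kappa idea Peter refers to can be sketched without Kafka or Samza. The classes below are hypothetical stand-ins: `UpdateLog` plays the role of an append-only Kafka topic, and `FragmentView` is a consumer that replays only the entries appended since its last offset to keep a local copy of one triple-pattern "fragment" current. The log position then answers both of Peter's bullets: the last update timestamp for the fragment, and exactly which new data the local copy still needs.]

```python
# Hedged sketch of the Kappa architecture: all changes go through an
# append-only log; consumers derive materialized views by replaying it.

class UpdateLog:
    """Stand-in for a Kafka topic: append-only (timestamp, triple, op)."""
    def __init__(self):
        self.entries = []  # each entry: (ts, (s, p, o), "add" | "del")

    def append(self, ts, triple, op="add"):
        self.entries.append((ts, triple, op))


class FragmentView:
    """Local materialized view of one triple-pattern fragment."""
    def __init__(self, log, predicate):
        self.log = log
        self.predicate = predicate   # the pattern this view covers
        self.offset = 0              # how far we have consumed the log
        self.triples = set()
        self.last_modified = None    # ts of the last relevant update

    def catch_up(self):
        # Replay only entries appended since our last offset; entries
        # for other predicates are skipped but still advance the offset.
        for ts, (s, p, o), op in self.log.entries[self.offset:]:
            if p == self.predicate:
                if op == "add":
                    self.triples.add((s, p, o))
                else:
                    self.triples.discard((s, p, o))
                self.last_modified = ts
        self.offset = len(self.log.entries)
```

With Kafka + Samza the log would be partitioned and durable, but the contract is the same: `last_modified` gives the reverse proxy its validator cheaply, and `offset` tells a client exactly which updates it has not yet applied.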
Received on Sunday, 23 November 2014 12:57:24 UTC