- From: Kjetil Kjernsmo <kjetil@kjernsmo.net>
- Date: Mon, 24 Nov 2014 09:54:25 +0100
- To: public-hydra@w3.org
- Message-ID: <3077414.jpBD7z2ARa@owl>
On Sunday 23. November 2014 10.56.24 Ruben Verborgh wrote:
> It really depends on the update frequency of the datasets.
> Some of the most referenced datasets in the SemWeb are static,
> like the various DBpedia versions we all know very well,
> and those never change once created (not talking about Live).
> Hence, we are able to host them through the HDT compressed triple
> format, which gives excellent performance for those cases,
> far better than what I've seen any DBMS do.

Well, while you are right in that you post some impressive figures, I'm not
so sure that's not mainly an artifact of the benchmarks. :-) You'd expect
HDT to be fast for unbound subjects.

Fast backends are nice, but SPARQLES now sports 80 ms response times for
its SPARQL queries against my pure Perl endpoint:

http://sparqles.okfn.org/endpoint?uri=http%3A%2F%2Fdata.lenka.no%2Fsparql

but that's obviously because the query never goes all the way down to the
SPARQL endpoint; it is served from DDR2 main memory by my Celeron-powered
Varnish cache. :-) I suspect you would need materialization to achieve
similar numbers, which divides the problem space into two parts: the
queries that you know people will run, and the ones that you don't.

Update frequencies, if we are talking several seconds, are not all that
important: if you get thousands of identical queries within 10 seconds,
caching is still of paramount importance.

So, the question is how you do materialization for the queries that you do
know people will run. In the vast majority of cases, the way you solve this
is by having a list of URLs on a host outside of your cache, and signalling
wget to visit this list when you update. Perhaps you need something
slightly more sophisticated if you update just parts of the dataset, but
this is two one-liners; it couldn't be done any simpler (see the sketch
below my signature). In this case, the backend performance isn't all that
important, as the above example shows; so, you might claim that you can do
it faster than any DBMS, but that'd be premature optimization, IMHO.

Responding to arbitrary queries that are hard to predict is a different
story, but there I'm not quite ready to concede that the DBMS isn't nice
to have just yet, also for performance reasons. :-)

Cheers,

Kjetil
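PS: For concreteness, a minimal sketch of what the two one-liners could
look like, assuming a Varnish cache in front of the endpoint, a plain-text
list of URLs (urls.txt is a hypothetical name, one URL per line), and a
VCL that accepts the PURGE method:

  # 1) Purge stale entries -- only needed if parts of the dataset changed;
  #    assumes your Varnish VCL handles PURGE requests:
  while read url; do curl -s -X PURGE "$url" > /dev/null; done < urls.txt

  # 2) Re-warm the cache by visiting every URL in the list:
  wget -q --input-file=urls.txt --output-document=/dev/null

Run from cron or a post-update hook on a host outside the cache, so the
requests actually pass through Varnish and repopulate it.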
Received on Monday, 24 November 2014 08:55:43 UTC