- From: Kjetil Kjernsmo <kjetil@kjernsmo.net>
- Date: Tue, 14 Oct 2014 21:49:12 +0200
- To: public-hydra@w3.org
On Tuesday 14. October 2014 17.18.57 Ruben Verborgh wrote:
> Reasonably accurate sounds fine indeed.

Actually, I'm -1 to that :-)

> But that of course depends on how "reasonably accurate" hydra:totalItems
> is defined. As far as it is implemented now, the best possible estimate
> is used for void:triples, and I don't see any reason to do otherwise.
> Is that best possible estimate good enough for hydra:totalItems?

I'm thinking in terms of statistics, and I also note that we do not have any way to express uncertainty. "Best possible" is a very inaccurate term. :-)

You could envision a system with a sampling algorithm, where you set your sample size based on the confidence level. If you want a high confidence level, then the cost is higher, because you need a larger sample.

And "best possible" means you are quite free to choose a confidence level based on the cost you, as the server owner, are prepared to pay for estimating the number of triples. The confidence level needs to be pretty high, but as long as it is not specified, I think you'd be fine choosing something good enough, as you can say "well, it is the best I found I could defend paying for".

Exact, OTOH, means you have to count them all. Period. It may be outdated the next second, true, but you have to count them. IMHO.

The distinction shouldn't be the changing server state; it is whether you are allowed to use sampling to derive the triple count. The influence of that on the cost model, and thus on query execution, may be quite substantial.

Actually, I think we need to start working on cost models where uncertainty plays a role. Anybody want to join me in such an effort?

Cheers,
Kjetil
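
To make the sampling argument above concrete, here is a minimal sketch of how the sample size grows with the confidence level. It is only an illustration under stated assumptions, not something any server discussed in this thread is known to implement: it assumes simple random sampling from a known number of candidate items, uses the textbook normal approximation for a proportion, and the function names (required_sample_size, estimate_count) are made up for the example.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion within +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    p = 0.5 is the worst case (largest variance), assumed by default.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def estimate_count(total: int, hits: int, sample_size: int, confidence: float):
    """Scale a sampled hit rate up to an estimated count with an interval.

    total       -- number of candidate items the sample was drawn from
    hits        -- sampled items that matched
    sample_size -- number of items sampled
    Returns (estimate, lower bound, upper bound) as floats.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = hits / sample_size
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    low = max(0.0, p_hat - half_width)
    high = min(1.0, p_hat + half_width)
    return total * p_hat, total * low, total * high


if __name__ == "__main__":
    # Higher confidence at the same margin means a larger (costlier) sample.
    for level in (0.90, 0.95, 0.99):
        print(level, required_sample_size(level, margin=0.05))
    print(estimate_count(1_000_000, hits=100, sample_size=400, confidence=0.95))
```

With a 5% margin, going from 95% to 99% confidence raises the required sample from 385 to 664 items, which is the cost trade-off described above. An exact count has no such knob: every item has to be counted.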
Received on Tuesday, 14 October 2014 19:49:48 UTC