- From: Markus Lanthaler <markus.lanthaler@gmx.net>
- Date: Mon, 27 Oct 2014 15:12:56 +0100
- To: <public-hydra@w3.org>
- Cc: "'Kjetil Kjernsmo'" <kjetil@kjernsmo.net>
On 14 Oct 2014 at 21:49, Kjetil Kjernsmo wrote:
> On Tuesday 14. October 2014 17.18.57 Ruben Verborgh wrote:
>> Reasonably accurate sounds fine indeed.
>
> Actually, I'm -1 to that :-)

Why is that?

>> But that of course depends on how "reasonably accurate" hydra:totalItems
>> is defined. As far as it is implemented now, the best possible estimate
>> is used for void:triples, and I don't see any reason to do otherwise.
>> Is that best possible estimate good enough for hydra:totalItems?
>
> I'm thinking in terms of statistics, and I also note that we do not have any
> way to express uncertainty.
>
> Best possible is a very inaccurate term. :-) You could envision a system

Right

> with a sampling algorithm, and then, you set your sample size based on the
> confidence level. If you want a high confidence level, then the cost is
> higher, because you need a larger sample. And "best possible" means you are
> quite free to choose a confidence level based on the cost you, as the server
> owner, is prepared to pay for estimating the number of triples. The
> confidence level needs to be pretty high, but as long as it is not specified,
> I think you'd be fine choosing something good enough, as you can say "well,
> it is the best I found I could defend paying for".
>
> Exact, OTOH, means you have to count them all. Period. It may be outdated
> the next second, true, but you have to count them. IMHO. It isn't the
> changing server state that should be the distinction, it is whether you are
> allowed to use sampling to derive the triple count.

What are the practical consequences of this? I think it boils down to the
question of what people will use hydra:totalItems for. Do you have an
application that requires hydra:totalItems to be 100% accurate?

--
Markus Lanthaler
@markuslanthaler
Received on Monday, 27 October 2014 14:13:34 UTC