RE: totalItems vs void:triples

On 14 Oct 2014 at 21:49, Kjetil Kjernsmo wrote:
> On Tuesday 14. October 2014 17.18.57 Ruben Verborgh wrote:
>> Reasonably accurate sounds fine indeed.
> 
> Actually, I'm -1 to that :-)

Why is that?


>> But that of course depends on how "reasonably accurate" hydra:totalItems
>> is defined. As far as it is implemented now, the best possible estimate
>> is used for void:triples, and I don't see any reason to do otherwise.
>> Is that best possible estimate good enough for hydra:totalItems?
> 
> I'm thinking in terms of statistics, and I also note that we do not have any
> way to express uncertainty.
> 
> Best possible is a very inaccurate term. :-) You could envision a system

Right


> with a sampling algorithm, and then you set your sample size based on the
> confidence level. If you want a high confidence level, then the cost is
> higher, because you need a larger sample. And "best possible" means you are
> quite free to choose a confidence level based on the cost you, as the server
> owner, are prepared to pay for estimating the number of triples. The
> confidence level needs to be pretty high, but as long as it is not specified,
> I think you'd be fine choosing something good enough, as you can say "well,
> it is the best I found I could defend paying for".
> 
> Exact, OTOH, means you have to count them all. Period. It may be outdated
> the next second, true, but you have to count them. IMHO. It isn't the
> changing server state that should be the distinction; it is whether you are
> allowed to use sampling to derive the triple count.

What are the practical consequences of this? I think it boils down to the
question of what people will use hydra:totalItems for. Do you have an
application that requires hydra:totalItems to be 100% accurate?
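To make the sampling trade-off Kjetil describes concrete, here is a minimal Python sketch. It is not how any existing server computes void:triples. It assumes a hypothetical store whose triples are split into blocks that can each be counted at some cost. It estimates the total from a simple random sample of blocks, with a normal-approximation confidence interval, and picks the sample size from a target confidence level, an assumed coefficient of variation of block sizes (`cv`) and a relative margin of error. The names `estimate_total`, `required_sample_size` and `z_score` are illustrative only.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def z_score(confidence):
    """Two-sided standard normal critical value (about 1.96 for 0.95)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def required_sample_size(num_blocks, cv, rel_margin, confidence=0.95):
    """Number of blocks to count so that the estimated total lies within
    +/- rel_margin (relative) at the given confidence, assuming block sizes
    have coefficient of variation cv (an assumption the server must guess).
    Higher confidence -> larger sample -> higher cost."""
    n0 = (z_score(confidence) * cv / rel_margin) ** 2
    n = math.ceil(n0 / (1 + n0 / num_blocks))  # finite population correction
    return max(2, min(num_blocks, n))


def estimate_total(block_counts, n, confidence=0.95, seed=None):
    """Estimate sum(block_counts) by counting only n randomly chosen blocks.

    Returns (estimate, low, high). With n == len(block_counts) every block
    is counted, the interval collapses to a point, and the result is the
    exact count.
    """
    num_blocks = len(block_counts)
    if not 2 <= n <= num_blocks:
        raise ValueError("need 2 <= n <= number of blocks")
    # Simple random sample without replacement.
    sample = random.Random(seed).sample(block_counts, n)
    estimate = num_blocks * mean(sample)
    # Standard error of the estimated total, with finite population correction.
    se = num_blocks * stdev(sample) / math.sqrt(n) * math.sqrt(1 - n / num_blocks)
    z = z_score(confidence)
    return estimate, estimate - z * se, estimate + z * se


if __name__ == "__main__":
    # Hypothetical store: 10,000 blocks holding 1,000 to 1,300 triples each.
    rng = random.Random(42)
    blocks = [rng.randint(1000, 1300) for _ in range(10_000)]
    for conf in (0.90, 0.99, 0.999):
        n = required_sample_size(len(blocks), cv=0.1, rel_margin=0.01, confidence=conf)
        est, low, high = estimate_total(blocks, n, confidence=conf, seed=1)
        print(f"confidence {conf}: count {n} blocks, ~{est:,.0f} [{low:,.0f}, {high:,.0f}]")
    print("exact:", sum(blocks))
```

Raising the confidence level raises the number of blocks that must be counted, which is the cost Kjetil refers to. Counting every block shrinks the interval to zero width and yields the exact count, which is the "count them all" case.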



--
Markus Lanthaler
@markuslanthaler

Received on Monday, 27 October 2014 14:13:34 UTC