Re: Size a linked open data set

Many thanks, John, for the elegant solution.

As I understand it,
select count(distinct ?r) where { ?r ?p ?l }
is semantically equivalent to
select (count(?s) as ?c) where { select distinct ?s where { ?s ?p []} }
Both give the number of distinct subjects in the graph, so the difference
(a timeout versus a result) comes only from the internal implementation.
It seems, then, that one needs to know a lot about the implementation to
know how to get the result.
Am I wrong?
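
Because both formulations bind only the subject position, an IRI that
appears only as an object (for example, the target of a link that has no
triples of its own) is not counted. If "distinct resources" should also
cover such nodes, a minimal sketch under that assumption is to take the
union of the subject and object positions, keep only IRIs, and deduplicate
in a subquery as in John's reformulation. This is an untested sketch: it
touches more data than the subject-only count, so on an endpoint the size
of fr.dbpedia.org it may well hit the same timeout.

```sparql
# Count distinct IRIs that occur as the subject or the object of a triple.
# isIRI() drops literals and blank nodes; predicates are not counted.
SELECT (COUNT(?r) AS ?c)
WHERE {
  {
    SELECT DISTINCT ?r
    WHERE {
      { ?r ?p1 ?o }      # ?r in subject position
      UNION
      { ?s ?p2 ?r }      # ?r in object position
      FILTER(isIRI(?r))
    }
  }
}
```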



--
Jean-Claude Moissinac


2016-07-06 15:55 GMT+02:00 John Walker <john.walker@semaku.com>:

> How about reformulating as:
>
> select (count(?s) as ?c) where { select distinct ?s where { ?s ?p []} }
>
> This gives a result of 10515620 resources [1].
>
> Regards,
> John
>
> [1]
> http://fr.dbpedia.org/sparql?default-graph-uri=&query=select+%28count%28%3Fs%29+as+%3Fc%29+where+%7B+select+distinct+%3Fs+where+%7B+%3Fs+%3Fp+%5B%5D%7D+%7D&format=text%2Fhtml&timeout=0&debug=on
>
>
> -----Original Message-----
> From: Hugh Williams [mailto:hwilliams@openlinksw.com]
> Sent: Wednesday, July 06, 2016 3:15 PM
> To: Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr>
> Cc: public-lod <public-lod@w3.org>
> Subject: Re: Size a linked open data set
>
> Hi Jean-Claude,
>
> The "select count(distinct ?r) where { ?r ?p ?l }" query is expensive in
> terms of database resources: servicing it requires building a huge hash
> table, which causes it to time out under the limits configured on this
> instance by whoever maintains it.
>
> On http://dbpedia.org/sparql, the original canonical English DBpedia
> endpoint hosted by OpenLink Software, we provide preloaded VoID datasets
> so that these statistics don't have to be queried each time (see
> http://dbpedia.org/void/Dataset). The French DBpedia instance does not
> appear to have this (compare http://fr.dbpedia.org/void/Dataset).
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>
> > On 6 Jul 2016, at 12:49, Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr> wrote:
> >
> > Hello
> >
> > In my work, I need to know the number of distinct resources in a dataset.
> > For example, with dbpedia-fr, I'm trying
> > select count(distinct ?r) where { ?r ?p ?l }
> >
> > and I always get a timeout error message,
> > while with
> > select count(?r) where { ?r ?p ?l }
> > I get
> > 185404575
> >
> > Is this a good way to find out such a size?
> >
> > --
> > Jean-Claude Moissinac
> >
>
>

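Where an endpoint does load a VoID description of its dataset, as Hugh
describes for http://dbpedia.org/sparql, the size figures can be read
rather than recomputed. A minimal sketch, assuming the description is
queryable on the same endpoint and uses the standard VoID statistics
properties; which of these properties a given instance actually publishes
is an assumption, hence the OPTIONAL patterns. If present,
void:distinctSubjects should correspond to the figure John's query
computes, and void:triples to the plain count(?r) figure, which counts
triples rather than distinct resources.

```sparql
PREFIX void: <http://rdfs.org/ns/void#>

# Read precomputed dataset statistics instead of scanning every triple.
SELECT ?dataset ?triples ?distinctSubjects ?entities
WHERE {
  ?dataset a void:Dataset .
  OPTIONAL { ?dataset void:triples ?triples }
  OPTIONAL { ?dataset void:distinctSubjects ?distinctSubjects }
  OPTIONAL { ?dataset void:entities ?entities }
}
```
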
Received on Wednesday, 13 July 2016 16:06:00 UTC