Re: Size a linked open data set

Hi

You may be interested in the rich dataset statistics that are reported as part of the Health Care and Life Sciences Community Profile for dataset descriptions; these extend the properties given in the VoID vocabulary.
https://www.w3.org/TR/hcls-dataset/#s6_6

The linked section describes each reported statistic and gives the SPARQL query used to generate its value.

Best regards,

Alasdair

On 13 Jul 2016, at 17:05, Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr> wrote:

Many thanks John for the elegant solution.

My perception is that
select count(distinct ?r) where { ?r ?p ?l }
is semantically equivalent to
select (count(?s) as ?c) where { select distinct ?s where { ?s ?p []} }
Both give the count of distinct subjects in the graph, so the difference can only be a result of the internal implementation. It therefore seems necessary to know a lot about the implementation to know how to get the result.
Am I wrong?
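
A minimal sketch of the equivalence claimed above, assuming a small in-memory list of triples (the ex: names and both function names are made up for illustration, not taken from DBpedia or any SPARQL engine). It mirrors the two formulations in Python and shows that both count distinct subjects, while a node that appears only as an object is not counted.

```python
# Hypothetical toy data; "ex:d" appears only in object position.
TRIPLES = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:a", "ex:q", "ex:c"),
    ("ex:b", "ex:p", "ex:c"),
    ("ex:c", "ex:p", "ex:a"),
    ("ex:c", "ex:p", "ex:d"),
]


def count_distinct_direct(triples):
    """Mirrors: select count(distinct ?r) where { ?r ?p ?l }"""
    return len({s for s, _p, _o in triples})


def count_distinct_subquery(triples):
    """Mirrors: select (count(?s) as ?c) where { select distinct ?s where { ?s ?p [] } }"""
    # Inner query: one row per distinct subject.
    distinct_subjects = list(dict.fromkeys(s for s, _p, _o in triples))
    # Outer query: count the rows produced by the inner query.
    return sum(1 for _ in distinct_subjects)


def count_all(triples):
    """Mirrors: select count(?r) where { ?r ?p ?l } (one row per triple)."""
    return len(triples)


if __name__ == "__main__":
    print(count_distinct_direct(TRIPLES))    # distinct subjects
    print(count_distinct_subquery(TRIPLES))  # same value, computed in two steps
    print(count_all(TRIPLES))                # number of triples
```

The two distinct counts agree on any input, so any difference in behaviour on a real endpoint would come from how the engine plans and executes each form, not from what the queries mean.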



--
Jean-Claude Moissinac


2016-07-06 15:55 GMT+02:00 John Walker <john.walker@semaku.com>:
How about reformulating as:

select (count(?s) as ?c) where { select distinct ?s where { ?s ?p []} }

Which gives a result of 10515620 resources [1].

Regards,
John

[1] http://fr.dbpedia.org/sparql?default-graph-uri=&query=select+%28count%28%3Fs%29+as+%3Fc%29+where+%7B+select+distinct+%3Fs+where+%7B+%3Fs+%3Fp+%5B%5D%7D+%7D&format=text%2Fhtml&timeout=0&debug=on



-----Original Message-----
From: Hugh Williams [mailto:hwilliams@openlinksw.com]
Sent: Wednesday, July 06, 2016 3:15 PM
To: Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr>
Cc: public-lod <public-lod@w3.org>
Subject: Re: Size a linked open data set

Hi Jean-Claude,

The "select count(distinct ?r) where { ?r ?p ?l }" query is expensive in terms of database resources: servicing it requires a huge hash table to be created, which causes it to time out under the settings chosen by whoever maintains the instance.
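
A rough illustration of the cost difference, not a description of how Virtuoso or any other engine actually executes the query: the sketch below assumes a simple streaming model in which a plain count keeps one counter, while a distinct count has to remember every distinct value seen so far. All names and the synthetic data are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


def synthetic_triples(subjects: int, per_subject: int) -> Iterator[Triple]:
    """Generate made-up triples: `per_subject` triples for each subject."""
    for i in range(subjects):
        for j in range(per_subject):
            yield (f"ex:s{i}", f"ex:p{j}", f"ex:o{j}")


def plain_count(triples: Iterable[Triple]) -> Tuple[int, int]:
    """Count rows; returns (count, number of values held in memory)."""
    n = 0
    for _ in triples:
        n += 1
    return n, 0  # only a single counter is needed


def distinct_subject_count(triples: Iterable[Triple]) -> Tuple[int, int]:
    """Count distinct subjects; returns (count, number of values held in memory)."""
    seen = set()  # hash-based set: grows with the number of distinct subjects
    for s, _p, _o in triples:
        seen.add(s)
    return len(seen), len(seen)


if __name__ == "__main__":
    print(plain_count(synthetic_triples(1000, 5)))
    print(distinct_subject_count(synthetic_triples(1000, 5)))
```

In this model the memory held by the distinct count grows with the number of distinct subjects, which for a dataset with millions of resources is consistent with the large hash table mentioned above.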

On http://dbpedia.org/sparql , the original canonical English DBpedia endpoint that OpenLink Software hosts, we provide preloaded VoID datasets so that these statistics don't have to be computed by a query each time; see http://dbpedia.org/void/Dataset . The French DBpedia instance does not appear to have this, i.e. http://fr.dbpedia.org/void/Dataset


Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.      //              http://www.openlinksw.com/


> On 6 Jul 2016, at 12:49, Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr> wrote:
>
> Hello
>
> In my work, I need to know the number of distinct resources in a dataset.
> For example, with dbpedia-fr, I'm trying
> select count(distinct ?r) where { ?r ?p ?l }
>
> And I'm always getting a timeout error message
> While with
> select count(?r) where { ?r ?p ?l }
> I'm getting
> 185404575
>
> Is it a good way to know about such size?
>
> --
> Jean-Claude Moissinac
>



Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33

ORCID: http://orcid.org/0000-0002-5711-4872

Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair

Received on Tuesday, 19 July 2016 12:40:50 UTC