- From: Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
- Date: Tue, 19 Jul 2016 12:40:15 +0000
- To: Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr>
- CC: John Walker <john.walker@semaku.com>, Hugh Williams <hwilliams@openlinksw.com>, public-lod <public-lod@w3.org>
- Message-ID: <93364D71-DEE3-4B12-B1CE-0733E5F126AD@hw.ac.uk>
Hi

You may be interested in the rich dataset statistics that are reported as part of the Health Care and Life Sciences Community Profile for dataset descriptions; these extend the properties given in the VoID vocabulary.

https://www.w3.org/TR/hcls-dataset/#s6_6

The linked section describes each statistic reported and gives the SPARQL query that is used to generate its value.

Best regards,

Alasdair

On 13 Jul 2016, at 17:05, Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr> wrote:

Many thanks John for the elegant solution.

My perception is that

select count(distinct ?r) where { ?r ?p ?l }

is semantically equivalent to

select (count(?s) as ?c) where { select distinct ?s where { ?s ?p []} }

It gives the count of distinct nodes in the graph, so the difference is only a result of the internal implementation. So it seems necessary to know a lot about the implementation to know how to get the result. Am I wrong?

--
Jean-Claude Moissinac

2016-07-06 15:55 GMT+02:00 John Walker <john.walker@semaku.com>:

How about reformulating as:

select (count(?s) as ?c) where { select distinct ?s where { ?s ?p []} }

This gives a result of 10515620 resources [1].
Regards,
John

[1] http://fr.dbpedia.org/sparql?default-graph-uri=&query=select+%28count%28%3Fs%29+as+%3Fc%29+where+%7B+select+distinct+%3Fs+where+%7B+%3Fs+%3Fp+%5B%5D%7D+%7D&format=text%2Fhtml&timeout=0&debug=on

-----Original Message-----
From: Hugh Williams [mailto:hwilliams@openlinksw.com]
Sent: Wednesday, July 06, 2016 3:15 PM
To: Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr>
Cc: public-lod <public-lod@w3.org>
Subject: Re: Size a linked open data set

Hi Jean-Claude,

The "select count(distinct ?r) where { ?r ?p ?l }" query is expensive in terms of database resources: servicing it would require creating a huge hash table, which is why it times out under the settings chosen for the instance by whoever maintains it.

On http://dbpedia.org/sparql, the original canonical English DBpedia endpoint that OpenLink Software hosts, we provide preloaded VoID datasets so that they don't have to be queried each time; see http://dbpedia.org/void/Dataset. The French DBpedia instance does not appear to have this, i.e. http://fr.dbpedia.org/void/Dataset

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc. // http://www.openlinksw.com/

> On 6 Jul 2016, at 12:49, Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr> wrote:
>
> Hello
>
> In my work, I need to know the number of distinct resources in a dataset.
> For example, with dbpedia-fr, I'm trying
> select count(distinct ?r) where { ?r ?p ?l }
>
> And I'm always getting a timeout error message.
> While with
> select count(?r) where { ?r ?p ?l }
> I'm getting
> 185404575
>
> Is this a good way to find out such a size?
>
> --
> Jean-Claude Moissinac

Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.
Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/0000-0002-5711-4872
Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair
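The difference between the three queries in this thread can be sketched on a tiny in-memory triple list. This is a minimal illustration in Python, assuming triples are plain (subject, predicate, object) string tuples; the `ex:` data and the function names are hypothetical, and it says nothing about how Virtuoso actually evaluates SPARQL. `count(?r)` yields one row per triple, whereas both the `count(distinct ?r)` query and John's nested `select distinct` form count each subject once. That requires remembering every subject already seen, which is the set (hash table) Hugh refers to. Note that only nodes in subject position are counted.

```python
from typing import Iterable, List, Tuple

# A triple is (subject, predicate, object). Plain strings stand in for IRIs and
# literals; the "ex:" names below are hypothetical example data.
Triple = Tuple[str, str, str]

TOY_TRIPLES: List[Triple] = [
    ("ex:Paris", "rdf:type", "ex:City"),
    ("ex:Paris", "rdfs:label", '"Paris"'),
    ("ex:Lyon", "rdf:type", "ex:City"),
    ("ex:France", "ex:capital", "ex:Paris"),
]


def count_rows(triples: Iterable[Triple]) -> int:
    """Like `select count(?r) where { ?r ?p ?l }`: one solution per triple,
    so a subject is counted again for every triple it appears in."""
    return sum(1 for _ in triples)


def count_distinct_subjects(triples: Iterable[Triple]) -> int:
    """Like `select count(distinct ?r) where { ?r ?p ?l }`: every subject seen
    so far must be remembered, so memory grows with the number of distinct subjects."""
    seen = set()  # in-memory stand-in for the hash table a DISTINCT needs
    for subject, _predicate, _object in triples:
        seen.add(subject)
    return len(seen)


def count_distinct_subquery(triples: Iterable[Triple]) -> int:
    """Like `select (count(?s) as ?c) where { select distinct ?s where { ?s ?p [] } }`:
    the inner query yields each subject once and the outer query counts those rows."""
    # dict.fromkeys keeps first-seen order while dropping duplicate subjects.
    inner_rows = list(dict.fromkeys(subject for subject, _p, _o in triples))
    return len(inner_rows)


if __name__ == "__main__":
    # ex:City occurs only as an object, so none of these counts include it.
    print("count(?r):         ", count_rows(TOY_TRIPLES))
    print("count(distinct ?r):", count_distinct_subjects(TOY_TRIPLES))
    print("distinct subquery: ", count_distinct_subquery(TOY_TRIPLES))
```

On this data the script prints 4, 3 and 3: the two distinct forms agree, as Jean-Claude expects, and differ only in how an engine may choose to evaluate them. Precomputed VoID statistics such as `void:triples` and `void:distinctSubjects` store these two numbers so they need not be recomputed at query time.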
Received on Tuesday, 19 July 2016 12:40:50 UTC