Re: Size a linked open data set

Dear Jean-Claude,

I'm not sure exactly what you mean by the "number of distinct resources in
a dataset". Is it the total number of distinct subjects, including both
IRIs and blank nodes? Your first query counts that. Your second query
counts the number of triples in the dataset. You can also count the total
number of distinct resources or IRIs across the subject, predicate, and
object positions of all triples. The VoID vocabulary defines some of those
statistics: https://www.w3.org/TR/void/#statistics
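
To make the distinction concrete, here is a rough sketch of the three counts
in SPARQL 1.1. The variable names (?subjects, ?triples, ?resources) are my
own choice, and I have not run these against the dbpedia-fr endpoint; the
DISTINCT variants may well hit the same timeout you saw.

```sparql
# 1) Distinct subjects (IRIs and blank nodes); same idea as your first query.
SELECT (COUNT(DISTINCT ?s) AS ?subjects)
WHERE { ?s ?p ?o }

# 2) Number of triples; same idea as your second query.
SELECT (COUNT(*) AS ?triples)
WHERE { ?s ?p ?o }

# 3) Distinct IRIs appearing in any position (subject, predicate, or object).
SELECT (COUNT(DISTINCT ?r) AS ?resources)
WHERE {
  { ?r ?p ?o } UNION { ?s ?r ?o } UNION { ?s ?p ?r }
  FILTER(isIRI(?r))
}
```

Each query has to be sent separately. Removing the FILTER from the third
one would also count blank nodes and literals.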

Loupe, a tool that we built to explore datasets, provides some of those
statistics for the DBpedia (FR) 2015-04 dataset:
http://loupe.linkeddata.es/loupe/summary.jsp?dataset=frdb

At the moment, we are creating a new version with the DBpedia 2015-10
datasets, and we will be happy to share those statistics with you in
advance. Please feel free to contact us if you don't find the information
you need in the current online version.

Best Regards,
Nandana

Ontology Engineering Group (OEG)
Universidad Politécnica de Madrid
Madrid, Spain

On Wed, Jul 6, 2016 at 1:49 PM, Jean-Claude Moissinac <
jean-claude.moissinac@telecom-paristech.fr> wrote:

> Hello
>
> In my work, I need to know the number of distinct resources in a dataset.
> For example, with dbpedia-fr, I'm trying
> select count(distinct ?r) where { ?r ?p ?l }
>
> And I'm always getting a timeout error message
> While with
> select count(?r) where { ?r ?p ?l }
> I'm getting
> 185404575
>
> Is it a good way to know about such size?
>
> --
> Jean-Claude Moissinac
>
>

Received on Wednesday, 6 July 2016 12:49:14 UTC