RE: Benchmarking with Named Graphs from Orri Erling on 2011-11-30 (public-lod@w3.org from November 2011)

From: Orri Erling <erling@xs4all.nl>
Date: Wed, 30 Nov 2011 18:26:12 +0100
To: "'Marcus Cobden'" <lists@marcuscobden.co.uk>, <public-lod@w3.org>
Message-ID: <015c01ccaf85$28b7b120$7a271360$@xs4all.nl>

Hi

The Berlin SPARQL Benchmark (BSBM) generator can, I think, make many
graphs, split by type of entity.
All the Billion Triple Challenge data sets consist of a ton of graphs with
10-1000 triples per graph.
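To make "split by type of entity" concrete, here is a minimal sketch of turning a plain triple set into named graphs keyed on each subject's rdf:type and writing the result as N-Quads. This is not the BSBM generator's code; the graph IRI prefix `http://example.org/graphs/`, the "first type wins" rule, the `untyped` fallback graph and the helper names are all hypothetical, and literals are not handled.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GRAPH_BASE = "http://example.org/graphs/"  # hypothetical graph IRI prefix


def split_by_type(triples):
    """Assign each (s, p, o) triple to a named graph derived from its subject's type.

    Assumptions: all terms are IRIs, the first rdf:type seen for a subject wins,
    and subjects with no type go to a fallback 'untyped' graph.
    Returns a list of (s, p, o, g) quads in input order.
    """
    triples = list(triples)
    subject_type = {}
    for s, p, o in triples:
        if p == RDF_TYPE and s not in subject_type:
            subject_type[s] = o
    quads = []
    for s, p, o in triples:
        t = subject_type.get(s)
        # Use the local name of the type IRI (after the last '/' or '#').
        local = re.split(r"[/#]", t)[-1] if t else "untyped"
        quads.append((s, p, o, GRAPH_BASE + local))
    return quads


def to_nquads(quads):
    """Serialize IRI-only quads as N-Quads lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> <{g}> ." for s, p, o, g in quads)
```

With a real entity-typed dataset this yields one graph per type, i.e. few large graphs; the Billion Triple Challenge data is the opposite shape, with many small graphs.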

So to benchmark with many graphs, the Billion Triple Challenge sets are best;
they also contain every aberration and abuse of diverse vocabularies and
syntax, which is good for their intended purpose.

Apart from the case where the graph marks provenance, there are not very many
use cases with a lot of graphs. Web crawls, where one makes a graph per page,
are different. Even in these cases the more selective key is still the s or
the o, not the g, so for query optimization the large number of graphs does
not make a big difference. Having a lot of different values for g will make
quads take more space, since g will no longer compress away; apart from this,
little difference is expected.
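Why g "compresses away" can be illustrated with a small sketch: if quads are kept sorted with g as the last key column, a run-length encoder only stores one entry per run of equal g values. The (s, p, o, g) sort order and the synthetic data below are assumptions for illustration, not any particular store's index layout.

```python
def count_runs(column):
    """Number of runs of equal adjacent values, i.e. what run-length encoding stores."""
    runs = 0
    prev = object()  # sentinel that compares unequal to any real value
    for v in column:
        if v != prev:
            runs += 1
            prev = v
    return runs


def g_runs_in_spo_order(quads):
    """Sort (s, p, o, g) quads by s, p, o, g and count the runs in the g column."""
    ordered = sorted(quads, key=lambda q: (q[0], q[1], q[2], q[3]))
    return count_runs(q[3] for q in ordered)


# Synthetic data: 100 subjects ("pages") with 10 triples each.
one_graph = [(f"s{i}", f"p{j}", f"o{j}", "g0") for i in range(100) for j in range(10)]
graph_per_page = [(f"s{i}", f"p{j}", f"o{j}", f"g{i}") for i in range(100) for j in range(10)]
```

Here `g_runs_in_spo_order(one_graph)` is 1 while `g_runs_in_spo_order(graph_per_page)` is 100: with one graph the g column costs almost nothing, while with a graph per page it grows with the number of pages, which is the space difference described above.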



Orri


-----Original Message-----
From: Marcus Cobden [mailto:lists@marcuscobden.co.uk] 
Sent: Wednesday, November 30, 2011 2:34 PM
To: public-lod@w3.org
Subject: Benchmarking with Named Graphs

Does anyone know of any multi-graph benchmarking datasets?

So rather than a dataset being just one big bag of triples to test with,
they're split into multiple named graphs.

Thanks,
Marcus Cobden

Received on Wednesday, 30 November 2011 17:27:23 UTC