Re: Benchmarking with Named Graphs from Olaf Hartig on 2011-11-30 (public-lod@w3.org from November 2011)

From: Olaf Hartig <hartig@informatik.hu-berlin.de>
Date: Wed, 30 Nov 2011 23:16:52 +0100
To: public-lod@w3.org
Message-Id: <201111302316.54541.hartig@informatik.hu-berlin.de>

Hey,

On Wednesday 30 November 2011 18:26:12 Orri Erling wrote:
> Hi
> 
> The Berlin SPARQL Benchmark (BSBM) generator, I think, can make many
> graphs, split by type of entity.

Correct.

@Marcus: If you are interested in a data generator that splits a BSBM dataset 
in a way typical of Linked Data (i.e. lots of small RDF graphs, one for each 
entity), I have developed such a thing. You can find more information, 
including the source code, here:

http://sourceforge.net/apps/wordpress/squin/2009/04/15/a-data-generator-for-bsbm-that-provides-linked-data-characteristics/
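
To illustrate the general idea only (this is not the code of the generator linked above), here is a minimal Python sketch that groups N-Triples statements by subject and writes them out as N-Quads, one named graph per entity. The function names, the example.org IRIs, and the choice of reusing the subject IRI as the graph name are all assumptions made for this example; it also assumes one statement per line.

```python
from collections import defaultdict


def split_by_subject(ntriples_lines):
    """Group N-Triples statements into one bucket per subject term."""
    graphs = defaultdict(list)
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # The subject is the first whitespace-delimited term of a statement.
        subject = line.split(None, 1)[0]
        graphs[subject].append(line)
    return graphs


def to_nquads(graphs):
    """Yield N-Quads lines; the subject term doubles as the graph name
    (an assumption of this sketch, not a rule of BSBM or Linked Data)."""
    for graph_name, statements in graphs.items():
        for stmt in statements:
            # Strip the terminating '.' and re-add it after the graph term.
            body = stmt.rstrip()[:-1].rstrip()
            yield f"{body} {graph_name} ."


# Hypothetical sample data, loosely in the spirit of BSBM products/producers.
sample = [
    '<http://example.org/product1> <http://www.w3.org/2000/01/rdf-schema#label> "Product 1" .',
    '<http://example.org/product1> <http://example.org/producer> <http://example.org/producer1> .',
    '<http://example.org/producer1> <http://www.w3.org/2000/01/rdf-schema#label> "Producer 1" .',
]

graphs = split_by_subject(sample)
quads = list(to_nquads(graphs))
for quad in quads:
    print(quad)
```

With the sample above, the three triples end up in two named graphs, one for the product and one for the producer.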

Cheers,
Olaf


> All the Billion Triple Challenge data sets consist of a ton of graphs with
> 10-1000 triples per graph.
> 
> So to benchmark with many graphs, the Billion Triple Challenge sets are
> best. They also contain every aberration and abuse of diverse vocabularies
> and syntax, which is good for their intended purpose.
> 
> Aside from the case where the graph marks provenance, there are not very
> many use cases with a lot of graphs.  For web crawls, where one makes a
> graph per page, this is different.  For these cases, the more selective key
> is still the s or the o and not the g.  So for query optimization the large
> number of graphs does not make a big difference.  Having a lot of different
> values for g will cause quads to take more space, since g will no longer
> compress away; aside from this, little difference is expected.
> 
> 
> 
> Orri
> 
> 
> -----Original Message-----
> From: Marcus Cobden [mailto:lists@marcuscobden.co.uk]
> Sent: Wednesday, November 30, 2011 2:34 PM
> To: public-lod@w3.org
> Subject: Benchmarking with Named Graphs
> 
> Does anyone know of any multi-graph benchmarking datasets?
> 
> So rather than a dataset being just one big bag of triples to test with,
> it's split into multiple named graphs.
> 
> Thanks,
> Marcus Cobden

Received on Wednesday, 30 November 2011 22:17:39 UTC