Re: Benchmarking with Named Graphs from Marcus Cobden on 2011-12-01 (public-lod@w3.org from December 2011)

From: Marcus Cobden <lists@marcuscobden.co.uk>
Date: Thu, 01 Dec 2011 07:50:19 +0000
To: public-lod@w3.org
Message-ID: <4ED731BB.5070808@marcuscobden.co.uk>

Thanks Orri, Olaf.
The information and code will be a great help :)


On 30/11/2011 22:16, Olaf Hartig wrote:
> Hey,
>
> On Wednesday 30 November 2011 18:26:12 Orri Erling wrote:
>> Hi
>>
>> The Berlin SPARQL Benchmark (BSBM) generator, I think, can make many
>> graphs, split by type of entity.
>
> Correct.
>
> @Marcus: If you are interested in a data generator that splits a BSBM dataset
> in a Linked Data typical way (i.e. lots of small RDF graphs, one for each
> entity), I have developed such a thing. Find more information, including the
> source code, here:
>
> http://sourceforge.net/apps/wordpress/squin/2009/04/15/a-data-generator-for-bsbm-that-provides-linked-data-characteristics/
>
> Cheers,
> Olaf
>
>
>> All the Billion Triple Challenge datasets consist of a ton of graphs with
>> 10-1000 triples per graph.
>>
>> So to benchmark with many graphs, the Billion Triple Challenge sets are
>> best. They also contain every aberration and abuse of diverse vocabularies
>> and syntax, which is good for their intended purpose.
>>
>> Aside from the case where the graph marks provenance, there are not very
>> many use cases with a lot of graphs. Web crawls, where one makes a graph
>> per page, are different. Even in these cases, the more selective key is
>> still the s or the o, not the g, so for query optimization the large
>> number of graphs does not make a big difference. Having a lot of different
>> values for g will cause quads to take more space, since g will no longer
>> compress away; aside from this, little difference is expected.
>>
>>
>>
>> Orri
>>
>>
>> -----Original Message-----
>> From: Marcus Cobden [mailto:lists@marcuscobden.co.uk]
>> Sent: Wednesday, November 30, 2011 2:34 PM
>> To: public-lod@w3.org
>> Subject: Benchmarking with Named Graphs
>>
>> Does anyone know of any multi-graph benchmarking datasets?
>>
>> So rather than a dataset being just one big bag of triples to test with,
>> one that is split into multiple named graphs.
>>
>> Thanks,
>> Marcus Cobden
>
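For readers who want to try the per-entity layout Olaf describes before fetching his generator, here is a minimal sketch in Python. It is not Olaf's tool or the BSBM generator. The example.org IRIs, the sample triples, and the `<subject>/graph` naming scheme are all hypothetical. It turns one bag of triples into quads, either all in one named graph or in one graph per subject entity, writes them as N-Quads, and counts the runs in the g column of an S,P,O,G-sorted index. That run count is only a rough proxy for Orri's remark that many distinct g values stop g from compressing away.

```python
from itertools import groupby

EX = "http://example.org/"  # hypothetical namespace for the sample data
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical sample: one "big bag" of triples about three entities.
TRIPLES = [
    (EX + "product1", RDF_TYPE, EX + "Product"),
    (EX + "product1", EX + "producer", EX + "producer1"),
    (EX + "product2", RDF_TYPE, EX + "Product"),
    (EX + "producer1", RDF_TYPE, EX + "Producer"),
    (EX + "producer1", EX + "country", EX + "GB"),
]


def one_graph(triples, graph=EX + "graph/all"):
    """Put every triple into a single named graph."""
    return [(s, p, o, graph) for s, p, o in triples]


def graph_per_entity(triples):
    """Put each triple into a graph named after its subject (assumed naming scheme)."""
    return [(s, p, o, s + "/graph") for s, p, o in triples]


def to_nquads(quads):
    """Serialise quads whose terms are all IRIs as N-Quads lines."""
    return [f"<{s}> <{p}> <{o}> <{g}> ." for s, p, o, g in quads]


def g_runs(quads):
    """Run-length-encoded size of the g column in an S,P,O,G-sorted index."""
    ordered = sorted(quads)  # tuples are (s, p, o, g), so this sorts in SPOG order
    return sum(1 for _ in groupby(q[3] for q in ordered))


if __name__ == "__main__":
    for name, quads in [("one graph", one_graph(TRIPLES)),
                        ("graph per entity", graph_per_entity(TRIPLES))]:
        print(name, "g runs:", g_runs(quads))
    print("\n".join(to_nquads(graph_per_entity(TRIPLES))))
```

With this sample, the single-graph layout gives 1 run in the g column and the per-entity layout gives 3, one per subject. Because g follows s here, an s-leading index still narrows lookups by subject exactly as before. Only the g column grows.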

Received on Thursday, 1 December 2011 07:50:53 UTC