- From: Olaf Hartig <hartig@informatik.hu-berlin.de>
- Date: Wed, 30 Nov 2011 23:16:52 +0100
- To: public-lod@w3.org
Hey,

On Wednesday 30 November 2011 18:26:12 Orri Erling wrote:
> Hi
>
> The Berlin SPARQL Benchmark (BSBM) generator, I think, can make many
> graphs, split by type of entity.

Correct.

@Marcus: If you are interested in a data generator that splits a BSBM dataset
in a way typical of Linked Data (i.e. lots of small RDF graphs, one for each
entity), I have developed such a thing. Find more information, including the
source code, here:

http://sourceforge.net/apps/wordpress/squin/2009/04/15/a-data-generator-for-bsbm-that-provides-linked-data-characteristics/

Cheers,
Olaf

> All the billion triple challenge data sets consist of a ton of graphs with
> 10-1000 triples per graph.
>
> So to benchmark with many graphs the billion triples sets are best; they
> also contain every aberration and abuse of diverse vocabularies and syntax,
> which is good for their intended purpose.
>
> Aside from the case where the graph marks provenance, there are not very
> many use cases with a lot of graphs. For web crawls where one makes a graph
> per page this is different. For these cases, the more selective key is still
> the s or the o and not the g. So for query optimization the large number
> of graphs does not make a big difference. Having a lot of different
> values for g will cause quads to take more space since g will no longer
> compress away; aside from this, little difference is expected.
>
> Orri
>
> -----Original Message-----
> From: Marcus Cobden [mailto:lists@marcuscobden.co.uk]
> Sent: Wednesday, November 30, 2011 2:34 PM
> To: public-lod@w3.org
> Subject: Benchmarking with Named Graphs
>
> Does anyone know of any multi-graph benchmarking datasets?
>
> So rather than a dataset being just one big bag of triples to test with,
> they're split into multiple named graphs.
>
> Thanks,
> Marcus Cobden
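
For illustration only, the "one small RDF graph per entity" split mentioned in this thread could be sketched roughly as below. This is not the code of the generator linked above. It assumes that triples are already available as N-Triples term strings, that every subject is an IRI, and that the subject IRI doubles as the graph name. The function names `split_by_entity` and `to_nquads` are hypothetical.

```python
from collections import defaultdict


def split_by_entity(triples):
    """Group (s, p, o) term strings into one graph per subject.

    Assumption: the subject term itself (e.g. '<http://example.org/product1>')
    is used as the graph name.
    """
    graphs = defaultdict(list)
    for s, p, o in triples:
        graphs[s].append((s, p, o))
    return dict(graphs)


def to_nquads(graphs):
    """Serialize the grouped triples as N-Quads lines (s p o g .)."""
    lines = []
    for graph_name, triples in graphs.items():
        for s, p, o in triples:
            lines.append(f"{s} {p} {o} {graph_name} .")
    return lines


if __name__ == "__main__":
    # Made-up sample data, not taken from BSBM.
    sample = [
        ("<http://example.org/product1>",
         "<http://www.w3.org/2000/01/rdf-schema#label>", '"Widget"'),
        ("<http://example.org/product1>",
         "<http://example.org/vocab#producer>", "<http://example.org/producer1>"),
        ("<http://example.org/producer1>",
         "<http://www.w3.org/2000/01/rdf-schema#label>", '"Acme"'),
    ]
    for line in to_nquads(split_by_entity(sample)):
        print(line)
```

With such a split, the graph component g takes as many distinct values as there are entities, which is the situation Orri describes: s and o stay the more selective keys, while g no longer compresses away.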
Received on Wednesday, 30 November 2011 22:17:39 UTC