Re: SPARQL performance for ORDER BY on large datasets from Steve Harris on 2009-10-05 (public-sparql-dev@w3.org from October to December 2009)

From: Steve Harris <swh@ecs.soton.ac.uk>
Date: Mon, 5 Oct 2009 22:28:24 +0100
To: public-sparql-dev@w3.org
Message-ID: <EMEW3|6a5cba2540471339bb13d60f67a893bcl94MSa03swh|ecs.soton.ac.uk|FF0-4F72-9439>

On 5 Oct 2009, at 19:15, Richard Newman wrote:
>>> My personal opinion: the BSBM serves a limited purpose for people
>>> evaluating triple stores, but strikes me as very SQL-ey in style: the
>>> data are the opposite of sparse, and it's not a network. Relational
>>> databases are a much, much better fit for this problem, and thus it's
>>> not very interesting. It's a little like benchmarking how well an Excel
>>> spreadsheet can do pixel animation: sure, you can do it, but there are
>>> other tools which are both mature and more suitable, so why bother?
>>
>> Wasn't the original point of BSBM to compare RDF stores with
>> RDF-to-RDB and native SQL for a common application?  If so, the fact
>> the RDF forms match SQL-style is necessary.
>
> That might be the case, but the simple fact that it exists means  
> that people use it as a broad benchmark for query performance. Even  
> triple store implementations that don't do RDB mapping are expected  
> to compete. It's certainly not phrased as "ignore this benchmark  
> unless X, Y, Z".
>
> Even so, it's benchmarking something that's (broadly speaking) only  
> interesting to implementors of such triple stores. Users don't (or  
> shouldn't) care how well their graph store can emulate a traditional  
> SQL DB.

Yes, I agree. It's not really a criticism of BSBM; it's more that
there's no really appropriate benchmark for RDF yet.

We have some internal stuff that we use, but I wouldn't really call it
"representative": it's just some real-world data and queries, and RDF
can have so much variety that a representative benchmark is hard to
define.

- Steve

Received on Monday, 5 October 2009 21:29:06 UTC