
[JSON] beating MongoDB

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 23 Mar 2011 13:24:56 -0400
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-ID: <1300901096.3138.1572.camel@waldron>

[ this is somewhat off-topic, I know. ]

On Wed, 2011-03-23 at 12:50 -0400, Manu Sporny wrote:
> 
> Fundamentally, until there is a free, open source, GPLed triple store
> that is performant, scales to billions of triples and provides an easy
> to use API - RDF and SPARQL are going to stay roughly as popular as
> they are right now. Until there is something to replace the 'M' in the
> LAMP stack for RDF applications, we're not going to see a change in
> the way Web developers develop.
> 
> For example, our company needs to store roughly 100 billion+ triples
> per year of financial transaction data. We're currently using a
> home-built MySQL solution for our storage mechanism, we will probably
> migrate to MongoDB in time. We have no free, open source choice for
> storing this information - nobody does. So the idea that the average
> web developer is backed by a triple store is a terrible assumption to
> make. The only thing that even remotely comes close to scaling for us
> is MongoDB and MongoDB speaks JSON (specifically, BSON).

I don't think we're likely to beat MongoDB at its own game.  The
problems it has to solve are somewhat simpler.  If someone just needs a
closed database backend, and they don't need it to do joins, they can
and probably should just use MongoDB.  (Or Redis or whatever; I'm
just using MongoDB as an example non-RDF store because you mentioned it
and I'm familiar with it.)

I mean, by all means, they should try one of the faster quadstores [1],
some of which are open source if they need that, but I wouldn't be at
all surprised if they preferred MongoDB, and I wouldn't try to talk them
out of it.  

Instead, I think the advantages of RDF show up on a different set of
problems.  They show up mostly when you need to merge data from multiple
independent sources.  Once people see that working, they'll see plenty
of reasons to use RDF.
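
To make the merging point concrete, here is a minimal sketch in plain
Python, not tied to any particular store.  It assumes each source is a
set of (subject, predicate, object) triples that contain only URIs and
literals, so a merge is just a set union.  (Blank nodes would need to be
renamed apart first, which this sketch does not attempt.)  The
example.org URIs and the helper names merge and describe are made up for
illustration.

```python
# Hypothetical data: two independent sources describing the same resource.
# The example.org URIs are invented for illustration.
ALICE = "http://example.org/people/alice"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
EMPLOYER = "http://example.org/vocab/employer"

source_a = {
    (ALICE, FOAF_NAME, "Alice"),
    (ALICE, FOAF_MBOX, "mailto:alice@example.org"),
}

source_b = {
    (ALICE, EMPLOYER, "http://example.org/orgs/acme"),
    (ALICE, FOAF_NAME, "Alice"),  # same statement as in source_a
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse into one.

    Assumes no blank nodes, so no renaming is needed before the union.
    """
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return a {predicate: sorted list of objects} map for one subject."""
    result = {}
    for s, p, o in graph:
        if s == subject:
            result.setdefault(p, []).append(o)
    return {p: sorted(objs) for p, objs in result.items()}


merged = merge(source_a, source_b)
alice = describe(merged, ALICE)
```

Because both sources use the same URI for the same person, the merged
graph answers questions that neither source could answer alone (name,
mailbox and employer together), with no schema mapping step in between.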

   -- Sandro

[1] http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/
Received on Wednesday, 23 March 2011 17:25:08 GMT