- From: Paul Houle <ontology2@gmail.com>
- Date: Fri, 2 Jul 2010 12:55:32 -0400
- To: Henry Story <henry.story@gmail.com>
- Cc: Linked Data community <public-lod@w3.org>
- Message-ID: <AANLkTimWron5JhxEzj0L5FLEBj-POEuV0j0T9GPbJZP6@mail.gmail.com>
On Fri, Jul 2, 2010 at 11:20 AM, Henry Story <henry.story@gmail.com> wrote:

> So similarly with RDF stores. Is it not feasible that one may come up with
> just in time storage mechanisms, where the triple store could start analysing
> how the data was used in order then to optimise the layout of the data on disk?
> Perhaps it could end up being a lot more efficient than what a human DB
> engineer could do in that case.

That's a nice research project, and it could be a very useful one if it's perfected. Salesforce.com has a patent on something that's pretty similar:

http://www.faqs.org/patents/app/20090276395

I attended a talk at Dreamforce last year where they described how their system works. To a developer, Salesforce.com offers something that looks a lot like a relational database. Their customers are spread across about 10 distinct Oracle 10g clusters; each of these has a central "fact" table which is essentially a triple/quad store. "Rows" seen from the customer's perspective are actually atomized into individual triples... the core table, however, has additional tags which identify each triple as belonging to a particular Salesforce.com customer. This way there might be 10,000-100,000 customers that share an 'instance' of the Salesforce.com system.

Now, to supplement this, Salesforce.com creates additional relational tables in Oracle that speed up particular queries. It uses automatic profiling to decide when to create these tables, build indexes, etc. It's pretty amazing to watch.

I've built a system that communicates with Salesforce.com via the API. The first time I run it against a Salesforce instance, one of the queries it runs times out. If I run it again immediately, it times out again. If I come back in ten minutes, it works O.K., because the system has analyzed my query and built the structures that make the query efficient.

That said, Salesforce.com is designed for OLTP applications and sucks for analytical work.
You're only allowed to get information in limited-size chunks; until very recently there wasn't anything like GROUP BY. More to the point, Salesforce.com charges about $1500/month/GB of storage. That's affordable for OLTP work, but the semantic work I do involves so much data that I couldn't possibly afford it.
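To make the layout concrete, here is a toy Python sketch of the two ideas described above: a single shared fact table in which each customer's "rows" are atomized into tenant-tagged quads, and a profiler that builds an index for a (tenant, column) pair once lookups on it recur. This is not Salesforce.com's implementation. The class name `MultiTenantQuadStore`, the `index_threshold` parameter and the in-memory lists are all hypothetical, and the index here is built synchronously on the Nth query rather than in the background, which is how the real system appears to behave.

```python
from collections import Counter, defaultdict


class MultiTenantQuadStore:
    """Toy multi-tenant store: one shared fact table of (tenant, subject, predicate, object) quads.

    Hypothetical sketch only; not Salesforce.com's actual design or API.
    """

    def __init__(self, index_threshold=2):
        self.quads = []                 # the central "fact" table shared by all tenants
        self.query_counts = Counter()   # profiling data: unindexed lookups per (tenant, predicate)
        self.indexes = {}               # (tenant, predicate) -> {object value: [subjects]}
        self.index_threshold = index_threshold  # hypothetical trigger for building an index

    def insert_row(self, tenant, row_id, row):
        """Atomize one relational-looking row into tenant-tagged quads."""
        for column, value in row.items():
            self.quads.append((tenant, row_id, column, value))
            # Keep any index that already exists for this tenant/column up to date.
            index = self.indexes.get((tenant, column))
            if index is not None:
                index[value].append(row_id)

    def find(self, tenant, predicate, value):
        """Return sorted row ids for this tenant whose `predicate` column equals `value`."""
        key = (tenant, predicate)
        if key in self.indexes:
            # Fast path: use the structure the profiler built earlier.
            return sorted(self.indexes[key].get(value, []))
        # Slow path: scan the whole shared table, filtering on the tenant tag.
        self.query_counts[key] += 1
        result = sorted(s for t, s, p, o in self.quads
                        if t == tenant and p == predicate and o == value)
        if self.query_counts[key] >= self.index_threshold:
            self._build_index(key)
        return result

    def _build_index(self, key):
        """Materialize a value -> subjects index for one tenant's column."""
        tenant, predicate = key
        index = defaultdict(list)
        for t, s, p, o in self.quads:
            if t == tenant and p == predicate:
                index[o].append(s)
        self.indexes[key] = index


if __name__ == "__main__":
    store = MultiTenantQuadStore(index_threshold=2)
    store.insert_row("acme", "row1", {"Name": "Widget", "Status": "Open"})
    store.insert_row("acme", "row2", {"Name": "Gadget", "Status": "Closed"})
    store.insert_row("globex", "row1", {"Name": "Sprocket", "Status": "Open"})
    print(store.find("acme", "Status", "Open"))  # ['row1'] (full scan)
    print(store.find("acme", "Status", "Open"))  # ['row1'] (scan; index built afterwards)
    print(("acme", "Status") in store.indexes)   # True
```

The tenant tag is what lets two customers reuse the same row id ("row1" above) in one shared table without ever seeing each other's data.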
Received on Friday, 2 July 2010 16:56:05 UTC