- From: Renato Golin <renato@ebi.ac.uk>
- Date: Mon, 07 Jan 2008 16:41:02 +0000
- To: Sampo Syreeni <decoy@iki.fi>
- CC: tim.glover@bt.com, garret@globalmentor.com, andrewfnewman@gmail.com, fmanola@acm.org, semantic-web@w3.org
Sampo Syreeni wrote:
> As somebody who comes from a relational background, I'd want to add that
> this is not quite the whole story. In relational circles, this sort of
> design is called an entity-attribute-value, or EAV, model, and it's a
> uniformly contentious design choice. The reason why it has been chosen
> for RDF and why it regularly comes up in relational schemata is that
> it's completely general, so that it enables us to handle semi-structured
> data whose precise structure we do not know beforehand. This is for
> example what enables RDF to be merged. But the design also exacts a
> price in performance, integrity and semantic precision.

Hi Sampo,

Totally agree. EAV is a dirty hack to make generic things work on relational data models, and the performance is horrible whenever you want anything more than retrieving a value by key. A very simple example is looking for the shortest path between two keys/values (very common in semantic queries): you can't even do it using standard SQL, so you throw away any optimization the DB engine could do for you.

> Under RM, those would be expressed as data dependencies, leading to
> integrity constraints. Most often they take the form of inclusion
> dependencies, which are then implemented as foreign key constraints.
> Existing RDBMSes tend to be a bit limited in how far their constraint
> mechanisms carry for this sort of thing, but that's already separate
> from theory.

Constraints in DBs are simple because the performance is, again, rubbish. Foreign keys are fast when you have one or two, but when you cascade them from one table to another they can turn even the simplest inserts/updates into a nightmare. Triggers are also very difficult to implement efficiently: some triggers on our database account for up to 85% of processing time in Oracle, and they're far simpler than we would want them to be, for the sole reason of reducing runtime.
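To make the shortest-path point concrete, here is a minimal sketch of what the application typically ends up doing itself once the data sits in EAV/triple form: pull the rows out and run a breadth-first search outside the database. The rows in `TRIPLES`, the attribute names, and the `build_adjacency` and `shortest_path` helpers are all made up for illustration; they are not taken from any real schema or library.

```python
from collections import deque

# Hypothetical EAV rows: (entity, attribute, value). Illustrative data only.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "knows", "erin"),
    ("erin", "knows", "dave"),
    ("dave", "name", "Dave"),
]


def build_adjacency(triples):
    """Map each entity to the values it points at, ignoring the attribute."""
    adjacency = {}
    for entity, _attribute, value in triples:
        adjacency.setdefault(entity, []).append(value)
    return adjacency


def shortest_path(triples, start, goal):
    """Breadth-first search over directed entity -> value edges.

    Returns the list of nodes from start to goal, or None if goal
    cannot be reached from start.
    """
    adjacency = build_adjacency(triples)
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(shortest_path(TRIPLES, "alice", "dave"))
```

The whole edge set has to be loaded before the search can start, which is exactly the work a query optimizer never gets a chance to help with.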

Indexes can also become a problem if you have too many of them on the same table. Updating all the indexes can make inserts and updates slower than most selects on that table. Every new constraint you add is a performance penalty for maintaining the data, and the fewer constraints you have, the greater the penalty for retrieving the data. You just need to decide which matters most to you and balance accordingly.

With RDF there are no constraints, so inserting is always lightning fast and retrieving is sometimes NP-complete... ;)

cheers,
--renato

--
Reclaim your digital rights, eliminate DRM, learn more at http://www.defectivebydesign.org/what_is_drm
Received on Monday, 7 January 2008 16:41:28 UTC