Re: plural vs singular properties (a proposal) from Renato Golin on 2008-01-07 (semantic-web@w3.org from January 2008)

From: Renato Golin <renato@ebi.ac.uk>
Date: Mon, 07 Jan 2008 16:41:02 +0000
To: Sampo Syreeni <decoy@iki.fi>
CC: tim.glover@bt.com, garret@globalmentor.com, andrewfnewman@gmail.com, fmanola@acm.org, semantic-web@w3.org
Message-ID: <4782561E.40702@ebi.ac.uk>

Sampo Syreeni wrote:
> As somebody who comes from a relational background, I'd want to add that
> this is not quite the whole story. In relational circles, this sort of
> design is called an entity-attribute-value, or EAV, model, and it's a
> uniformly contentious design choice. The reason why it has been chosen
> for RDF and why it regularly comes up in relational schemata is that
> it's completely general, so that it enables us to handle semi-structured
> data whose precise structure we do not know beforehand. This is for
> example what enables RDF to be merged. But the design also exacts a
> price in performance, integrity and semantic precision.

Hi Sampo,

Totally agree. EAV is a dirty hack to make generic things work on
relational data models, and the performance is horrible whenever you
want something more than just retrieving a value by key.
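
A minimal sketch of that difference, using Python's sqlite3 module. The
table name `eav`, its columns and the sample rows are made up for
illustration; nothing here comes from a real schema. Looking up a value
by key is a single simple query, while filtering on two attributes
already needs one self-join per attribute.

```python
import sqlite3

# Hypothetical EAV table; names and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        ("p1", "name", "Alice"), ("p1", "city", "Hinxton"), ("p1", "role", "curator"),
        ("p2", "name", "Bob"), ("p2", "city", "Hinxton"), ("p2", "role", "developer"),
        ("p3", "name", "Carol"), ("p3", "city", "London"), ("p3", "role", "curator"),
    ],
)

# Retrieve a value by key: one straightforward lookup.
name = conn.execute(
    "SELECT value FROM eav WHERE entity = ? AND attribute = ?",
    ("p1", "name"),
).fetchone()[0]

# Filter on two attributes and return a third: the same table is joined
# to itself once per attribute involved.
curators_in_hinxton = [
    row[0]
    for row in conn.execute(
        """
        SELECT n.value
        FROM eav AS n
        JOIN eav AS c ON c.entity = n.entity
        JOIN eav AS r ON r.entity = n.entity
        WHERE n.attribute = 'name'
          AND c.attribute = 'city' AND c.value = 'Hinxton'
          AND r.attribute = 'role' AND r.value = 'curator'
        """
    )
]
```

In a conventional schema with `name`, `city` and `role` columns the
second query would be a plain single-table SELECT.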

A very simple example is looking for the shortest path between two
keys/values (very common in semantic queries): you can't even do it
using standard SQL, so you throw away any optimization the DB engine
could do for you.
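
As a rough illustration of what ends up happening instead, here is a
sketch of a breadth-first search done in application code over a
hypothetical `triples` table (table, column names and data are
assumptions for the example). The database only ever sees one small
lookup per visited node, so it has no chance to plan the traversal as a
whole.

```python
import sqlite3
from collections import deque

# Hypothetical triple store in a single table; data is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("a", "knows", "b"), ("b", "knows", "c"), ("c", "knows", "d"),
        ("a", "knows", "e"), ("e", "knows", "d"),
    ],
)

def shortest_path(conn, start, goal):
    """Breadth-first search following subject -> object edges.

    Returns the list of nodes from start to goal, or None if the goal
    is not reachable.
    """
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        # One round trip to the database per visited node.
        rows = conn.execute(
            "SELECT object FROM triples WHERE subject = ?", (node,)
        ).fetchall()
        for (neighbour,) in rows:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

path = shortest_path(conn, "a", "d")
```

With the sample rows above, `path` is the two-hop route through "e"
rather than the three-hop route through "b" and "c".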

> Under RM, those would be expressed as data dependencies, leading to
> integrity constraints. Most often they take the form of inclusion
> dependencies, which are then implemented as foreign key constraints.
> Existing RDBMSes tend to be a bit limited in how far their constraint
> mechanisms carry for this sort of thing, but that's already separate
> from theory.

Constraints in DBs are kept simple because the performance is, again,
rubbish.

Foreign keys are fast when you have one or two, but when you cascade
them from one table to another they can become a nightmare for even the
simplest inserts/updates.
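
A small sketch of how a cascade fans out, again with Python's sqlite3
module. The three tables (`project`, `sample`, `reading`) and their
contents are invented for the example, and SQLite stands in for
whatever engine is actually in use. A delete that names a single row
ends up touching every dependent row down the chain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical three-level chain of cascading foreign keys.
conn.executescript(
    """
    CREATE TABLE project (id INTEGER PRIMARY KEY);
    CREATE TABLE sample (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL REFERENCES project(id) ON DELETE CASCADE
    );
    CREATE TABLE reading (
        id INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL REFERENCES sample(id) ON DELETE CASCADE
    );
    """
)

conn.execute("INSERT INTO project (id) VALUES (1)")
conn.executemany(
    "INSERT INTO sample (id, project_id) VALUES (?, 1)",
    [(s,) for s in range(1, 4)],  # 3 samples
)
conn.executemany(
    "INSERT INTO reading (sample_id) VALUES (?)",
    [(s,) for s in range(1, 4) for _ in range(4)],  # 4 readings per sample
)

def count(table):
    # Table names come from the fixed set above, not from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

before = (count("sample"), count("reading"))

# The statement names one row, but the cascade removes its dependents too.
conn.execute("DELETE FROM project WHERE id = 1")

after = (count("sample"), count("reading"))
```

Here one DELETE on `project` removes 3 `sample` rows and 12 `reading`
rows; on real tables each of those levels also has its own indexes and
checks to maintain.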

Triggers are also very difficult to implement efficiently. There are
some triggers on our database that account for up to 85% of processing
time in Oracle, and they're far simpler than we would want them to be,
for the sole reason of reducing runtime.

Indexes can also become a problem if you have too many of them in the
same table. Updating all indexes can make inserts and updates slower
than most selects on that table.

With every new constraint you add, there is a performance penalty for
maintaining the data, and the fewer constraints you have, the more the
penalty falls on retrieving the data. You just need to balance what's
most important for you and work it out.

With RDF there are no constraints, so inserting is always lightning fast
and retrieving is sometimes NP-complete... ;)

cheers,
--renato

-- 
Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

Received on Monday, 7 January 2008 16:41:28 UTC