- From: Ivan Mikhailov <imikhailov@openlinksw.com>
- Date: Thu, 14 Oct 2010 01:49:24 +0700
- To: Aldo Bucchi <aldo.bucchi@gmail.com>
- Cc: Mirko <idonthaveenoughinformation@googlemail.com>, public-lod@w3.org, Virtuoso Users <virtuoso-users@lists.sourceforge.net>
Hello Aldo,

I'd recommend keeping RDF_QUAD unchanged and using RDF Views to keep n-ary things in separate tables. The reason is that access to RDF_QUAD is heavily optimized; we've never "polished" any other table to such a degree (and I hope we never will :), and any changes may result in severe scalability penalties.

Triggers should be possible as well, but we haven't tried them, because it is relatively cheap to "redirect" data manipulations to other tables. Both the file loader and the SPARUL internals are flexible enough that writes can be directed to different tables depending on parameters: the loader can call arbitrary callback functions for each parsed triple, and SPARUL manipulations are configurable via the "define output:route" pragma at the beginning of the query. In this case there is no need to write special SQL to "triplify" data from those "wide" tables, because RDF Views will do that automatically.

Moreover, it's possible to generate triggers automatically from RDF Views that will materialize changes in the "wide" tables into RDF_QUAD (say, if you need inference). So instead of editing RDF_QUAD and letting triggers on RDF_QUAD reproduce the changes in the wide tables, you may edit the wide tables and let triggers reproduce the changes in RDF_QUAD. The second approach is much more flexible, and it promises better performance due to much less activity in the triggers. For a cluster, I'd say the second variant is the only feasible one, because fast manipulations on RDF_QUAD are _really_ complicated there.

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

On Wed, 2010-10-13 at 12:57 -0300, Aldo Bucchi wrote:
> Hi Mirko,
>
> Here's a tip that is a bit software-bound, but it may prove useful to
> keep in mind.
>
> Virtuoso's Quad Store is implemented atop an RDF_QUAD table with 4
> columns (g, s, p, o). This is very straightforward. It may even seem
> naive at first glance. ( a table!!? )
>
> Now, the great part is that the architecture is very open. You can
> actually modify the table via SQL statements directly: insert, delete,
> update, etc. You can even add columns and triggers to it.
>
> Some ideas:
> * Keep track of n-ary relations in the same table by using accessory
> columns ( time, author, etc ).
> * Add a trigger and log each add/delete to a separate table where you
> also store more data.
> * When consuming this data, you can use SQL, or you can run a SPARQL
> construct based on a SQL query, so as to "triplify" the n-tuple as you
> wish.
>
> The bottom-line suggestion here is: take a look at what's possible when
> you escape "SPARQL only" and start working in a hybrid environment
> ( SQL + SPARQL ).
> Also note that the "self-contained" nature of RDF assertions ( facts,
> statements ) makes it possible to do all sorts of tricks by taking
> them into 3+ tuple structures.
>
> My coolest experiment so far is a time machine. I log adds and deletes
> and can recreate the state of the system ( Quad Store ) as of any
> point in time.
>
> Imagine a queue management system where you can "replay" the state of
> the system, for example.
>
> Regards,
> A
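
To make Ivan's RDF Views suggestion above concrete: below is a rough sketch of a Linked Data View over a hypothetical wide table DB.DBA.STATEMENT_LOG. Everything here (the table, the column names, the ex: namespace, the IRI patterns) is invented for illustration, and the DDL is reconstructed from memory of the Virtuoso "Mapping SQL Data to RDF" documentation, so treat it as a starting point rather than working code:

    -- a hypothetical "wide" table carrying n-ary statement data
    create table DB.DBA.STATEMENT_LOG (
      ID     integer primary key,
      AUTHOR varchar,            -- extra n-ary columns...
      TS     datetime );

    -- an IRI class turns a key column into a subject IRI
    SPARQL
    prefix ex: <http://example.com/schema#>
    create iri class ex:stmt_iri "http://example.com/stmt/%d"
      (in id integer not null) . ;

    -- map the table into the default quad storage as a graph
    SPARQL
    prefix ex: <http://example.com/schema#>
    alter quad storage virtrdf:DefaultQuadStorage
    from DB.DBA.STATEMENT_LOG as stmt
    {
      create virtrdf:StatementMap as graph iri ("http://example.com/statements")
      {
        ex:stmt_iri (stmt.ID) a ex:Statement ;
          ex:author stmt.AUTHOR ;
          ex:recordedAt stmt.TS .
      }
    } . ;

With such a view in place, SPARQL queries against the graph <http://example.com/statements> see the wide table as triples, with no hand-written "triplification" SQL.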
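
For the direction Ivan actually recommends (edit the wide tables, let triggers reproduce the changes in RDF_QUAD), a sketch of such a trigger follows. DB.DBA.RDF_QUAD_URI and DB.DBA.RDF_QUAD_URI_L are documented Virtuoso helpers that insert one quad given IRI strings (the _L variant takes a literal object); the table and IRIs are the same invented ones as above, and the sketch is untested:

    -- materialize each new wide-table row as quads
    create trigger STATEMENT_LOG_I after insert on DB.DBA.STATEMENT_LOG
    {
      -- in a Virtuoso insert trigger the new row's columns
      -- are visible by name (ID, AUTHOR, ...)
      DB.DBA.RDF_QUAD_URI_L (
        'http://example.com/statements',               -- graph
        sprintf ('http://example.com/stmt/%d', ID),    -- subject
        'http://example.com/schema#author',            -- predicate
        AUTHOR);                                       -- literal object
    };

A corresponding delete trigger would remove the quads again; as Ivan notes, keeping the trigger bodies this small is what makes the wide-table variant cheap.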
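
On Aldo's point about direct SQL access: the G, S, and P columns of RDF_QUAD hold internal IRI_ID values, and the documented functions iri_to_id()/id_to_iri() convert between those and IRI strings, so plain SQL like the following works (graph IRI invented; note that O holds literals in an internal "RDF box" representation, so typed literals may need extra rendering):

    -- read one graph straight from the quad table
    select id_to_iri (S), id_to_iri (P), O
      from DB.DBA.RDF_QUAD
     where G = iri_to_id ('http://example.com/statements');

    -- or drop the whole graph with a plain delete
    delete from DB.DBA.RDF_QUAD
     where G = iri_to_id ('http://example.com/statements');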
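
Finally, a minimal sketch of the add/delete log behind Aldo's "time machine" experiment (audit table and trigger names invented, untested; and bear in mind Ivan's caveat that triggers directly on RDF_QUAD may cost scalability):

    create table DB.DBA.RDF_QUAD_AUDIT (
      OP char (1),                       -- 'I' = insert, 'D' = delete
      G  IRI_ID, S IRI_ID, P IRI_ID, O any,
      TS datetime );

    create trigger RDF_QUAD_AUDIT_I after insert on DB.DBA.RDF_QUAD
    {
      insert into DB.DBA.RDF_QUAD_AUDIT (OP, G, S, P, O, TS)
        values ('I', G, S, P, O, now ());
    };

    create trigger RDF_QUAD_AUDIT_D after delete on DB.DBA.RDF_QUAD
    {
      insert into DB.DBA.RDF_QUAD_AUDIT (OP, G, S, P, O, TS)
        values ('D', G, S, P, O, now ());
    };

Replaying the store to a point in time is then a matter of applying the logged operations in TS order up to the chosen moment.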
Received on Wednesday, 13 October 2010 18:49:56 UTC