
Re: [Virtuoso-users] Reification alternative

From: Aldo Bucchi <aldo.bucchi@gmail.com>
Date: Wed, 13 Oct 2010 16:02:13 -0300
Message-ID: <AANLkTinsxuypv=DwRx7CUd9w3+MaMK92Gu7Y7n9c1TgR@mail.gmail.com>
To: Ivan Mikhailov <imikhailov@openlinksw.com>
Cc: Mirko <idonthaveenoughinformation@googlemail.com>, public-lod@w3.org, Virtuoso Users <virtuoso-users@lists.sourceforge.net>

Hi Ivan,

Hehe, I knew you were going to jump in, that's why I CC'd this to
virtuoso-users ;)

Before getting into the content of your response, let me just say this:

I think Mirko's example is actually really common. Every application
that I have built needs to keep track of ( at least ) two other
dimensions beyond the "core data model/state":
* Time ( Be it audit trail or just timestamp )
* Author

You provide some really valuable tips in your reply as to how you can
tune your Virtuoso installation to actually accomplish this.

On Wed, Oct 13, 2010 at 3:49 PM, Ivan Mikhailov
<imikhailov@openlinksw.com> wrote:
> Hello Aldo,
>
> I'd recommend keeping RDF_QUAD unchanged and using RDF Views to keep n-ary
> things in separate tables. The reason is that the access to RDF_QUAD is
> heavily optimized, we've never "polished" any other table to such a
> degree (and I hope we will not :), and any changes may result in severe
> penalties in scalability. Triggers should be possible as well, but we
> haven't tried them, because it is relatively cheap to "redirect" data
> manipulations to other tables. Both the loader of files and SPARUL
> internals are flexible enough so it may be more convenient to change
> different tables depending on parameters: the loader can call arbitrary
> callback functions for each parsed triple and SPARUL manipulations are
> configurable via "define output:route" pragma at the beginning of the
> query.

Interesting! ;)
From the docs:

"output:route: works only for SPARUL operators and tells the SPARQL
compiler to generate procedure names that differ from default. As a
result, the effect of operator will depend on application. That is for
tricks. E.g., consider an application that extracts metadata from DAV
resources stored in the Virtuoso and put them to RDF storage to make
visible from outside. When a web application has permissions and
credentials to execute a SPARUL query, the changed metadata can be
written to the DAV resource (and after that the trigger will update
them in the RDF storage), transparently for all other parts of
application."

Where can I find more docs on this feature?
( I don't actually need this, just asking )

>
> In this case there will be no need to write special SQL to "triplify"
> data from those "wide" tables because RDF Views will do that
> automatically. Moreover, it's possible to automatically create triggers
> by RDF Views that will materialize changes in "wide" tables in RDF_QUAD
> (say, if you need inference). So instead of editing RDF_QUAD and letting
> triggers on RDF_QUAD reproduce the changes in wide tables, you may edit
> wide tables and let triggers reproduce the changes in RDF_QUAD. The
> second approach is much more flexible and it promises better performance
> due to much smaller activity in triggers. For cluster, I'd say that the
> second variant is the only possible thing, because fast manipulations
> with RDF_QUAD are _really_ complicated there.

Great to know all this!
Again, I think the possibility to mix and match SPARQL + SQL via RDF
Views, triggers, output:route, etc is a really good solution for >4ary
relations.
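
To make the "edit wide tables and let triggers reproduce the changes"
direction concrete, here is a minimal sketch. It is only an illustration:
SQLite (via Python) stands in for Virtuoso, and the quad and ticket tables,
the trigger names and the urn: identifiers are all hypothetical. It is not
how RDF Views generate their triggers.

```python
import sqlite3

# Assumption: SQLite stands in for Virtuoso, and every table, trigger and
# URI name below is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quad (g TEXT, s TEXT, p TEXT, o TEXT);

-- The "wide" table keeps the n-ary fact: core state plus author and time.
CREATE TABLE ticket (
    id TEXT PRIMARY KEY,
    status TEXT,
    author TEXT,
    changed_at TEXT
);

-- Triggers on the wide table keep the quad table in sync.
CREATE TRIGGER ticket_ins AFTER INSERT ON ticket BEGIN
    INSERT INTO quad VALUES ('urn:g:tickets', NEW.id, 'urn:p:status', NEW.status);
END;

CREATE TRIGGER ticket_upd AFTER UPDATE OF status ON ticket BEGIN
    DELETE FROM quad
     WHERE g = 'urn:g:tickets' AND s = OLD.id AND p = 'urn:p:status';
    INSERT INTO quad VALUES ('urn:g:tickets', NEW.id, 'urn:p:status', NEW.status);
END;

CREATE TRIGGER ticket_del AFTER DELETE ON ticket BEGIN
    DELETE FROM quad WHERE g = 'urn:g:tickets' AND s = OLD.id;
END;
""")

# The application only ever touches the wide table.
conn.execute(
    "INSERT INTO ticket VALUES ('urn:t:1', 'open', 'aldo', '2010-10-13T16:02:13')"
)
conn.execute(
    "UPDATE ticket SET status = 'closed', changed_at = '2010-10-13T19:03:06' "
    "WHERE id = 'urn:t:1'"
)

quads = conn.execute("SELECT g, s, p, o FROM quad").fetchall()
print(quads)
```

The author and timestamp stay in the wide table, while the quad table only
ever sees the current triple-shaped view of each row.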

A built-in time dimension is something I am looking forward to
implementing in some of my applications, as it provides enormous
business value.
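
As a rough sketch of what a time dimension buys you, in the spirit of the
"time machine" from my earlier message quoted below: if every add and delete
is logged with a timestamp, the state at any moment can be rebuilt by
replaying the log. The log layout, the state_at helper and the sample quads
are assumptions made up for this example.

```python
from datetime import datetime

# Hypothetical log entries: (timestamp, operation, quad), oldest first.
log = [
    (datetime(2010, 10, 13, 9, 0), "add", ("g1", "ticket1", "status", "waiting")),
    (datetime(2010, 10, 13, 9, 5), "del", ("g1", "ticket1", "status", "waiting")),
    (datetime(2010, 10, 13, 9, 5), "add", ("g1", "ticket1", "status", "served")),
]


def state_at(log, when):
    """Replay adds and deletes up to and including `when`."""
    quads = set()
    # sorted() is stable, so entries sharing a timestamp keep their log order.
    for ts, op, quad in sorted(log, key=lambda entry: entry[0]):
        if ts > when:
            break
        if op == "add":
            quads.add(quad)
        else:
            quads.discard(quad)
    return quads


print(state_at(log, datetime(2010, 10, 13, 9, 2)))
print(state_at(log, datetime(2010, 10, 13, 9, 10)))
```

Replaying from the start is the simplest option; periodic snapshots would
shorten the replay for long logs.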

Thanks,
A

>
> Best Regards,
>
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
>
>
> On Wed, 2010-10-13 at 12:57 -0300, Aldo Bucchi wrote:
>> Hi Mirko,
>>
>> Here's a tip that is a bit software bound but it may prove useful to
>> keep it in mind.
>>
>> Virtuoso's Quad Store is implemented atop an RDF_QUAD table with 4
>> columns (g, s, p, o). This is very straightforward. It may even seem
>> naive at first glance. ( a table!!? ).
>>
>> Now, the great part is that the architecture is very open. You can
>> actually modify the table via SQL statements directly: insert, delete,
>> update, etc. You can even add columns and triggers to it.
>>
>> Some ideas:
>> * Keep track of n-ary relations in the same table by using accessory
>> columns ( time, author, etc ).
>> * Add a trigger and log each add/delete to a separate table where you
>> also store more data
>> * When consuming this data, you can use SQL or you can run a SPARQL
>> construct based on a SQL query, so as to "triplify" the n-tuple as you
>> wish.
>>
>> The bottom line here is: Take a look at what's possible when you
>> escape "SPARQL only" and start working in a hybrid environment ( SQL +
>> SPARQL ).
>> Also note that the "self-contained" nature of RDF assertions ( facts,
>> statements ) makes it possible to do all sorts of tricks by taking
>> them into 3+ tuple structures.
>>
>> My coolest experiment so far is a time machine. I log adds and deletes
>> and can recreate the state of the system ( Quad Store ) up to any
>> point in time.
>>
>> Imagine a Queue management system where you can "replay" the state of
>> the system, for example.
>>
>> Regards,
>> A

-- 
Aldo Bucchi
@aldonline
skype:aldo.bucchi
http://aldobucchi.com/
Received on Wednesday, 13 October 2010 19:03:06 UTC
