Re: Reification - whats best practice? from Leo Sauermann on 2004-08-30 (www-rdf-interest@w3.org from August 2004)

From: Leo Sauermann <leo@gnowsis.com>
Date: Mon, 30 Aug 2004 10:33:28 +0200
To: Eric Jain <Eric.Jain@isb-sib.ch>
CC: www-rdf-interest@w3.org
Message-ID: <4132E658.30908@gnowsis.com>

Hi all,

OK, I will move this from a discussion in prose to a discussion of 
actual triples. I hope this works...

> Leo Sauermann wrote:
>
>> reification syntax is "not practical" as we see above in the thread.
>
>
> Please - as previously pointed out [1] quads are not always a suitable 
> replacement for reification.


I do not think that the quote [1] points this out. [1] just contains 
a theoretical assumption about "bloating" that I will falsify below.

OK, the original quote was:

[1]: http://lists.w3.org/Archives/Public/www-rdf-interest/2004Aug/0178.html

>Consider the following example:
>
>   s1 p1 o1 : backed by a1 and a2
>   s1 p2 o2 : backed by a1 and a3
>
>If we were to use contexts for expressing this, there would have to be 
>three different contexts (for statements backed a1, a2 and a3), and both 
>statements would have to be duplicated into two different contexts. 
>Correct? I imagine this approach would bloat the data far more than 
>normal reification would...


I assume that bloating means "too many triples, they look ugly".

OK, I want to see if this assumption is true (theoretically it sounds good,
but I want to move from theory to reality. This is not much work, actually).

EXAMPLE 1 with reification

a1 rdf:type example:source // just to have a triple
a2 rdf:type example:source // just to have a triple
a3 rdf:type example:source // just to have a triple
st1 rdf:type rdf:Statement // spec needs this (or?)
st1 rdf:subject s1
st1 rdf:predicate p1
st1 rdf:object o1
st2 rdf:type rdf:Statement // spec needs this (or?)
st2 rdf:subject s1
st2 rdf:predicate p2
st2 rdf:object o2
st1 example:backedBy a1
st1 example:backedBy a2
st2 example:backedBy a1
st2 example:backedBy a3
== 15 not easily readable statements 

EXAMPLE 2 with quads

a1 rdf:type example:source // just to have a triple
a2 rdf:type example:source // just to have a triple
a3 rdf:type example:source // just to have a triple
sta1 s1 p1 o1
sta2 s1 p1 o1
sta1 s1 p2 o2
sta3 s1 p2 o2
sta1 example:backedBy a1
sta2 example:backedBy a2
sta3 example:backedBy a3

== 10 easily readable statements (6 triples and 4 quads)
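
The two counts above (15 vs 10) can be checked with a small sketch. It assumes plain Python tuples as stand-ins for triples and quads, with no RDF library involved; the names `SOURCES`, `BACKED`, `reified` and `quads` are hypothetical and only serve the illustration.

```python
# Hypothetical sketch: plain tuples stand in for triples and quads.
SOURCES = ["a1", "a2", "a3"]

# Each statement from the quoted example and the sources backing it.
BACKED = {
    ("s1", "p1", "o1"): ["a1", "a2"],
    ("s1", "p2", "o2"): ["a1", "a3"],
}


def source_triples():
    """One typing triple per source, as in both examples."""
    return [(a, "rdf:type", "example:source") for a in SOURCES]


def reified():
    """EXAMPLE 1: four reification triples per statement plus backedBy."""
    out = source_triples()
    for i, ((s, p, o), backers) in enumerate(BACKED.items(), start=1):
        st = f"st{i}"
        out += [
            (st, "rdf:type", "rdf:Statement"),
            (st, "rdf:subject", s),
            (st, "rdf:predicate", p),
            (st, "rdf:object", o),
        ]
        out += [(st, "example:backedBy", a) for a in backers]
    return out


def quads():
    """EXAMPLE 2: one quad per (statement, source), one backedBy per context."""
    out = source_triples()
    contexts = {}  # context name -> source, in first-seen order
    for (s, p, o), backers in BACKED.items():
        for a in backers:
            ctx = "st" + a  # sta1, sta2, sta3
            contexts[ctx] = a
            out.append((ctx, s, p, o))
    out += [(ctx, "example:backedBy", a) for ctx, a in contexts.items()]
    return out


print(len(reified()), len(quads()))  # prints: 15 10
```

Under these assumptions, reification costs 4 + k entries for a statement with k backers, while quads cost k entries per statement plus one backedBy triple per context.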

So let us look at the assumption again:

>I imagine this approach would bloat the data far more than 
>normal reification would...
>
I think that this assumption does not hold for the real-life 
triples above.
In theory, it sounds like bloating. But when you write down the triples 
(or quads) of the example, I think the picture changes.
I think the quad version is much more readable.

Don't tell me that "implementations hide these many triples away from me".
No, they do not. When creating the triples, you have to use a 
reification API, and when querying, you have to use the reification 
API again. And reification APIs demand that you write code.

So the number of triples above (10 vs 15) is *related* to the number 
of lines of code you have to write when using an RDF API.

Another BIG advantage is deleting:

== Deleting Problem ==
When I say: OK, I think s1 p1 o1 is not needed anymore, because context 
a1 falls away, I can run a
"delete (all where reified by a1)"
on my big graph.
But this will also delete the triple even though it is still backed by 
a2. I actually had exactly this problem last week, using Jena. That is 
why I started the thread, anyhow.
So if you want to deny this problem, show me your code.
(I pasted mine here:)
http://lists.w3.org/Archives/Public/www-rdf-interest/2004Aug/0255.html

When I have quads I can say:
"delete (* where quad has context X, X example:backedBy a1)"

This will not delete triples held in other contexts. Reification APIs 
are not that strict.
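
To make the difference concrete, here is a minimal sketch in plain Python. It is not Jena or RDF-Gateway code: the data structures and the helpers `delete_naive_reified` and `delete_by_context` are hypothetical, and the naive variant is only an assumption about how a "delete all where reified by a1" can go wrong.

```python
def build_reified():
    """Reification model: asserted triples plus statements describing them."""
    asserted = {("s1", "p1", "o1"), ("s1", "p2", "o2")}
    # statement id -> (described triple, set of backing sources)
    statements = {
        "st1": (("s1", "p1", "o1"), {"a1", "a2"}),
        "st2": (("s1", "p2", "o2"), {"a1", "a3"}),
    }
    return asserted, statements


def delete_naive_reified(asserted, statements, source):
    """Delete every triple reified by a statement that `source` backs.

    This ignores the other backers, which is the problem described above.
    """
    for st, (triple, backers) in list(statements.items()):
        if source in backers:
            asserted.discard(triple)
            del statements[st]


def build_quads():
    """Quad model: the same triple may live in several contexts."""
    quads = {
        ("sta1", "s1", "p1", "o1"),
        ("sta2", "s1", "p1", "o1"),
        ("sta1", "s1", "p2", "o2"),
        ("sta3", "s1", "p2", "o2"),
    }
    backed_by = {"sta1": "a1", "sta2": "a2", "sta3": "a3"}
    return quads, backed_by


def delete_by_context(quads, backed_by, source):
    """Delete only the quads whose context is backed by `source`."""
    doomed = {ctx for ctx, a in backed_by.items() if a == source}
    quads.difference_update({q for q in quads if q[0] in doomed})


asserted, statements = build_reified()
delete_naive_reified(asserted, statements, "a1")
print(sorted(asserted))  # both triples are gone

quads, backed_by = build_quads()
delete_by_context(quads, backed_by, "a1")
print(sorted(quads))  # the copies in sta2 and sta3 survive
```

In the quad version each triple survives as long as at least one context still holds it, without any extra bookkeeping in the delete itself.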

I think that there may be a mistake in my assumptions above and my 
conclusions, but I am quite sure that the above examples will run in 
RDF-Gateway. And I experienced the problems using Jena.

I see this from a real-life implementor's view: when I have to debug 
things, reification is not readable. When I code stuff, reification 
creates more triples and is harder to handle.

So I would like to code using quads, but I miss the tools and APIs to do 
it (especially, I miss the Jena implementation of quads :-)
As a detail on the side: Jena has quads somewhere. But the Triple class 
does not have them, so they are not at the core of Jena.
http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/graph/Triple.html

I think reification is a clean approach. It is totally triple-based and 
gets 100% thumbs up for theoretical coolness in the RDF triple world. 
But my practical problems could be solved much more easily with quads. :-)

cheers
Leo


A note about the deleting problem:
I usually keep all my triples from all the different sources in one big 
graph. This is more real life: you want to search for information that 
can be anywhere in your knowledge, you do not want to search different 
graphs. You want to have your query engine run on all data. Query 
engines do not usually do "over-more-than-one-graph" queries, and if 
they do, they may return different results than on the integrated data.
And: YES, it is possible to build an "aggregated graph" that may contain 
"thousands of graphs", but NO, the indexing in database-backed graphs 
does not work then, so please don't suggest this to me.
My local database should be One-Big-Graph, containing triples with 
contexts/quads :-) That eases things.
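
A rough sketch of what One-Big-Graph with contexts could look like, assuming a plain Python list of (context, subject, predicate, object) tuples. `BIG_GRAPH` and `match` are hypothetical names, not part of any existing API.

```python
# Hypothetical one-big-graph: a flat list of (context, s, p, o) quads.
BIG_GRAPH = [
    ("sta1", "s1", "p1", "o1"),
    ("sta2", "s1", "p1", "o1"),
    ("sta1", "s1", "p2", "o2"),
    ("sta3", "s1", "p2", "o2"),
]


def match(graph, s=None, p=None, o=None):
    """Match a triple pattern over all data; None acts as a wildcard.

    Returns each distinct matching triple with the set of contexts holding it.
    """
    hits = {}
    for ctx, qs, qp, qo in graph:
        if s in (None, qs) and p in (None, qp) and o in (None, qo):
            hits.setdefault((qs, qp, qo), set()).add(ctx)
    return hits


result = match(BIG_GRAPH, p="p1")
```

The query runs once over all data, and the contexts come back alongside each triple instead of requiring one query per graph.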
Received on Monday, 30 August 2004 08:33:36 UTC