Re:Re: Use cases for Reification in RDF Triple stores from Bob MacGregor on 2003-01-06 (www-rdf-interest@w3.org from January 2003)

From: Bob MacGregor <macgregor@ISI.EDU>
Date: Mon, 06 Jan 2003 10:58:29 -0800
To: Dave Reynolds <der@hplb.hpl.hp.com>
Cc: www-rdf-interest@w3.org, jena-dev <jena-dev@yahoogroups.com>
Message-Id: <5.1.1.6.0.20030106100816.00b56d20@tnt.isi.edu>
Dave,
You made several points (as did I).  Below is a partial response:

 >On Monday, January 6, 2003, at 04:24 AM, Dave Reynolds wrote:

 >The first point to clarify is that what you are talking about here is the
 >Jena-specific reification "shortcut". This attempts to give you some of the
 >features of reification, without the overhead of asserting four triples for
 >every reified statement. However, there is nothing to stop you using the full
 >official RDF reification approach with Jena - assert all four reification
 >triples and manipulate them accordingly, ignore the shortcut stuff like
 >"isReified".
Using full reification is not an option, unless we reified EVERY statement
in an application. That's because we cannot decide a priori which statements
will get annotated (with probabilities, trust, temporal modification, etc.).
Every statement is fair game.

When you fully reify, the entire form of a statement changes, and queries
that retrieve non-reified statements will not also retrieve reified
statements. So converting from non-reified to fully-reified "on-the-fly" is
not a viable option.
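
A minimal sketch of the point above, in Python rather than Jena. The toy
store and the helper names (reify, match) are hypothetical; only the
rdf:Statement / rdf:subject / rdf:predicate / rdf:object vocabulary comes
from RDF itself. It shows that a pattern query which finds the plain triple
finds nothing once the same statement exists only in its four-triple
reified form.

```python
# Toy triple store: a graph is a set of (subject, predicate, object) tuples.
# This is an illustration only, not the Jena API.

def reify(stmt_id, s, p, o):
    """Return the four triples of the standard RDF reification of <s,p,o>."""
    return {
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    }

def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }

plain = {("a", "b", "c")}               # the statement, asserted directly
reified = reify("_:s1", "a", "b", "c")  # the same statement, reified only

# The same query behaves differently against the two forms.
found_plain = match(plain, s="a", p="b")
found_reified = match(reified, s="a", p="b")
```

Here found_plain contains <a,b,c> while found_reified is empty, because in
the reified graph "a" only ever appears as the object of rdf:subject.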

What Jena calls a "shortcut" we consider to be the right way to reify
(except that, technically, it's not reification at all; it's just allowing
statements to be arguments). Logic-based KR systems (MRS, Epikit, Epilog,
Cyc, PowerLoom, SNePS, etc.) have been using the "shortcut" style of
reification for a very long time. It's not a curiosity; it's part of the
fabric. For example, to represent a disjunction of two triples, you create
a triple that nests two (or more) other triples:
<<a,b,c>, OR, <d,e,f>>
Note that with this kind of usage, it's critical that the nested triples not
be considered as asserted in the model.
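
As a rough illustration of statements-as-arguments, here is a sketch in
Python. The Triple type and is_asserted helper are hypothetical and are not
taken from Jena or from any of the KR systems named above. A model is just
the set of triples that were explicitly added, so a nested triple is not
asserted unless someone adds it separately.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A triple whose subject or object may itself be a Triple."""
    s: object
    p: object
    o: object

def is_asserted(model, triple):
    """A triple is asserted only if it was added to the model directly."""
    return triple in model

abc = Triple("a", "b", "c")
d_e_f = Triple("d", "e", "f")

# <<a,b,c>, OR, <d,e,f>>: the two inner triples are arguments, nothing more.
disjunction = Triple(abc, "OR", d_e_f)

model = {disjunction}  # only the compound triple is added
```

With this model, is_asserted(model, disjunction) holds, while neither abc
nor d_e_f is asserted, which is what a disjunction requires.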

 >I believe that would allow you to implement a non-quadratic deleteResource.
No it wouldn't, because 'deleteResource' (and 'renameResource') are
intended to be generic operations that work on all RDF graphs/models.
Following your last comment would mean legislating away the use of
(short-cut style) reification.

     >>Secondly, there should be a 'bit' that API users can use to mark
     >>statements as true or not. However, it really should be 'wider' than a
     >>single 'bit'. Give us enough bits (e.g., make it a resource), and we can
     >>use such an attachment to build our own context mechanisms.
 >Not sure about this. In RDF, statements are only asserted. The semantics of an
 >RDF graph is just the conjunction of the individual statements. There is no
 >notion of a not-asserted statement.
Using Jena I can create a triple/statement <a,b,c> and embed it into another
triple, yielding <<a,b,c>,d,e>. If I add the compound triple to a model
(using Statement.add), then that triple is asserted but the nested one isn't
(because I never added it). That would seem to contradict your last
statement (unless you are saying that Jena can do this, but RDF can't).

 >Personally, I've got no problems with an application choosing to use reification
 >as a way of separating statement from their truth status, but I'm not sure this
 >should be built in to APIs like Jena.
I agree. That's why I'm recommending providing a hook that makes it easy for
others to make that separation, using the Jena API as a substrate.

 >Furthermore, there would be some nasty interactions between this statement
 >"truth status field" and Jena models - the truth status should presumably be a
 >property of the pair [statement, model] rather than just a property of the
 >statement. This then suggests a different design approach for you ...
The truth status is ALWAYS a property of a pair [statement, model]. A
statement can be true in one model, but not in another.

 >Why not just use Jena Models to provide your context? For example, use one Model
 >to contain all your statements of unknown truth status and a separate Model to
 >contain the current world view - i.e. the current set of "asserted" statements.
 >In the first model you could include all your trust and probability information
 >using reification and now you can use the reification shortcut without any loss
 >of searchability.
We can't do this for a number of reasons, but a compelling reason is that
'models' are not first class objects of discourse, i.e., they are not
resources. (They ought to be, but I don't want to wait around for the RDF
committee to make that decision).
When you assign a probability or a degree of trust, you also record who made
that assertion, and probably also record when the assertion occurred.
By providing a handle (the fourth argument) in the form of a resource, you
enable me to make arbitrary assertions about the nature of the 'context'.
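
To make the idea of a resource-valued handle concrete, here is a small
Python sketch. Every name in it (the ex: properties, the attachments map,
the describe helper) is an assumption made for illustration; none of it is
defined by RDF or Jena. Because the handle is an ordinary resource,
probability, author, and date are recorded as plain triples about it.

```python
# Illustration only: the ex: property names and the attachments map are
# hypothetical, not part of RDF or Jena.

statement = ("a", "b", "c")
context = "ex:ctx1"  # the handle: an ordinary resource

# The handle attached to the statement (the "fourth argument").
attachments = {statement: context}

# Ordinary triples that describe the context resource itself.
model = {
    statement,
    (context, "ex:probability", 0.8),
    (context, "ex:assertedBy", "ex:bob"),
    (context, "ex:assertedOn", "2003-01-06"),
}

def describe(model, resource):
    """Collect every property/value pair whose subject is the resource."""
    return {p: o for s, p, o in model if s == resource}

about_context = describe(model, attachments[statement])
```

The same describe call would work for any other property someone chooses to
attach to the context, which is the flexibility a single truth bit lacks.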

One final comment on quads vs. the 'getIt'/'setIt' that I am advocating. They
are not equivalent. With quads, we can assert a triple twice, with two
different fourth arguments:
     <a, b, c, d1>
     <a, b, c, d2>
Here, there is no explicit means for detecting that 'd1' and 'd2' are
annotating the same 'triple', since the notion of triple has now evaporated.
I'm advocating that we keep 'triples' as they are, while adding a means for
efficiently attaching meta-information to a triple. Short-cut reification
provides a less efficient means for attaching meta-information to a triple.
That is what we will use for now, but we take a performance hit doing so, as
exemplified by the 'deleteResource' example in my previous message.
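
The difference between the two designs can be sketched in Python. This is an
assumption-laden toy, not a proposal for the actual Jena interface: the
AnnotatedModel class and its get_it/set_it methods are hypothetical
stand-ins for the 'getIt'/'setIt' hook, whose real signature is not given
here.

```python
from collections import defaultdict

# Quad design: the same <a,b,c> appears as two unrelated 4-tuples.
quads = {("a", "b", "c", "d1"), ("a", "b", "c", "d2")}

def contexts_by_triple(quads):
    """Recover the shared triple by grouping on the first three positions."""
    grouped = defaultdict(set)
    for s, p, o, ctx in quads:
        grouped[(s, p, o)].add(ctx)
    return dict(grouped)

class AnnotatedModel:
    """Triples stay triples; each may carry one attached resource."""

    def __init__(self):
        self._attachments = {}  # triple -> attached resource, or None

    def add(self, triple):
        # Adding the same triple twice leaves a single entry.
        self._attachments.setdefault(triple, None)

    def set_it(self, triple, resource):
        if triple not in self._attachments:
            raise KeyError(triple)
        self._attachments[triple] = resource

    def get_it(self, triple):
        return self._attachments[triple]

    def __len__(self):
        return len(self._attachments)

m = AnnotatedModel()
m.add(("a", "b", "c"))
m.add(("a", "b", "c"))          # still one triple
m.set_it(("a", "b", "c"), "d1")
```

In the quad version the link between 'd1' and 'd2' exists only after the
extra grouping pass; in the attachment version the triple keeps its identity
and the handle is looked up directly from it.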

Cheers, Bob
Received on Monday, 6 January 2003 14:07:37 UTC