Re: Use cases for Reification in RDF Triple stores from Dave Reynolds on 2003-01-07 (www-rdf-interest@w3.org from January 2003)

From: Dave Reynolds <der@hplb.hpl.hp.com>
Date: Tue, 07 Jan 2003 17:36:37 +0000
To: Bob MacGregor <macgregor@ISI.EDU>
CC: www-rdf-interest@w3.org, jena-dev <jena-dev@yahoogroups.com>
Message-ID: <3E1B1025.435981F6@hplb.hpl.hp.com>

Bob MacGregor wrote:
>
> What Jena calls a "shortcut" we consider to be the right way to reify 
> (except that technically, it's not reification at all, it's just allowing 
> statements to be arguments).

Agreed, I understand that, and am well aware of how commonly used it is in KR
systems. I guess my point is that Jena is not trying to be a general-purpose KR
system; it is trying to be a faithful RDF implementation, and this is one place
where it deviates from RDF in the interests of usability. Thus we need to be
careful in this discussion to separate Jena-specific issues from general RDF
issues of relevance to rdf-interest.

RDF does seem fundamentally limited by having no real means to represent nested
expressions. Reification gives it a way to at least encode such nesting (with a
"small constant factor" storage cost :-). Though the working group's decision
that reification represents a "stating" not a "statement" may undermine this
usage.

> >I believe that would allow you to implement a non-quadratic deleteResource.
> No it wouldn't, because 'deleteResource' (and 'renameResource') are intended
> to be generic operations that work on all RDF graphs/models. 

OK. I had misunderstood - apologies. The picture I had was that you were
building a KR layer on top of RDF using Jena and that deleteResource was part of
that additional layer. You are correct that if you want to implement it for an
arbitrary Jena Model then the current API limits the achievable performance.

> >Not sure about this. In RDF, statements are only asserted. The semantics of
> >an RDF graph is just the conjunction of the individual statements. There is no
> >notion of a not-asserted statement.
> Using Jena I can create a triple/statement <a,b,c> and embed it into another
> triple, yielding <<a,b,c>,d,e>. If I add the compound triple to a model 
> (using Statement.add), then that triple is asserted but the nested one isn't 
> (because I never added it). That would seem to contradict your last statement 
> (unless you are saying that Jena can do this, but RDF can't).

That is what I'm saying. There is no direct way to represent precisely
"<<a,b,c>,d,e>" in RDF. The nearest equivalent is to assert (using triples for
clarity if not brevity!):
   _x rdf:type rdf:Statement.
   _x rdf:subject a.
   _x rdf:predicate b.
   _x rdf:object c.
   _x d e.

All of these are positive assertions. The difficult question is then whether _x
is a representation of the abstract triple "<a,b,c>" which is now being referred
to but not asserted. Jena1 pretends you can do this and goes further by not
actually physically creating the first four triples internally (the "shortcut").

The working group has decided that _x is actually a reference to a stating of
"<a,b,c>" but is not the abstract statement itself. It is saying that there is a
"real or notional" RDF document somewhere with the statement "<a,b,c>" asserted
in it and that _x represents that occurrence in that particular document. In
particular, if you now meet another bunch of assertions:
   _y rdf:type rdf:Statement.
   _y rdf:subject a.
   _y rdf:predicate b.
   _y rdf:object c.
you are no longer allowed to conclude that <_y,d,e>.

To be compliant, Jena2 will need to implement this "clarified" behaviour.
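
To make the "stating" reading concrete, here is a small Python sketch, again
using plain tuples rather than any Jena API. The helper names
described_triple and properties_of are hypothetical. Two nodes describe the
same triple, yet the property attached to one is not attached to the other.

```python
# Illustrative sketch only: a graph is a set of (subject, predicate, object) tuples.
REIFICATION_VOCAB = {"rdf:type", "rdf:subject", "rdf:predicate", "rdf:object"}

def described_triple(graph, node):
    """Return the (s, p, o) that a reification node describes."""
    parts = {p: o for (s, p, o) in graph if s == node}
    return (parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])

def properties_of(graph, node):
    """Return the non-reification (predicate, object) pairs attached to node."""
    return {(p, o) for (s, p, o) in graph
            if s == node and p not in REIFICATION_VOCAB}

graph = {
    ("_:x", "rdf:type", "rdf:Statement"),
    ("_:x", "rdf:subject", "a"),
    ("_:x", "rdf:predicate", "b"),
    ("_:x", "rdf:object", "c"),
    ("_:x", "d", "e"),
    ("_:y", "rdf:type", "rdf:Statement"),
    ("_:y", "rdf:subject", "a"),
    ("_:y", "rdf:predicate", "b"),
    ("_:y", "rdf:object", "c"),
}

# Both nodes describe the same triple ...
print(described_triple(graph, "_:x") == described_triple(graph, "_:y"))  # True
# ... but only _:x carries the property; nothing licenses copying it to _:y.
print(properties_of(graph, "_:x"))  # {('d', 'e')}
print(properties_of(graph, "_:y"))  # set()
```

A store that merged _:x and _:y because they describe the same triple would be
implementing the "statement" reading rather than the "stating" one.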

> >Why not just use Jena Models to provide your context?
[snip]
> We can't do this for a number of reasons, but a compelling reason is that
> 'models' are not first class objects of discourse, i.e., they are not
> resources.

I agree that they are not, and that they should be, but perhaps that is a
different issue. 

I was just proposing an implementation technique, one which might allow you to
sidestep the limitation you identified with the existing Jena API: use one
Model as a scratchpad in which to record all the statements (*all* reified via
the shortcut) along with their provenance, and a second Model to index the
subset of those statements which you currently believe.
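
A minimal Python sketch of this two-Model idea follows. It does not use the
Jena API: the class TwoModelStore and its methods are hypothetical, two plain
containers stand in for the two Models, and provenance is reduced to a single
source string per statement, which is an assumption made for brevity.

```python
# Illustrative sketch only: two containers stand in for the two Jena Models.
class TwoModelStore:
    def __init__(self):
        self.scratchpad = {}   # statement label -> (triple, source); recorded, not asserted
        self.believed = set()  # the subset of triples currently believed
        self._counter = 0

    def record(self, triple, source):
        """Record a statement with its provenance without believing it."""
        label = f"_:s{self._counter}"
        self._counter += 1
        self.scratchpad[label] = (triple, source)
        return label

    def rebuild_believed(self, trusted_sources):
        """Recompute the believed set from the sources currently trusted."""
        self.believed = {
            triple
            for (triple, source) in self.scratchpad.values()
            if source in trusted_sources
        }

store = TwoModelStore()
store.record(("a", "b", "c"), "docA")
store.record(("a", "b", "z"), "docB")
store.rebuild_believed({"docA"})
print(store.believed)  # {('a', 'b', 'c')}
```

Queries over current beliefs then touch only the second container, while the
scratchpad keeps every statement and where it came from.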

> One final comment on quads vs. the  'getIt'/'setIt' that I am advocating.
> They are not equivalent.  

Understood.

> I'm advocating that we keep 'triples' as they are, while adding a means 
> for efficiently attaching meta-information to a triple (short-cut 
> reification provides a less efficient means for attaching meta-information 
> to a triple -- that is what we will use for now, but
> we take a performance hit doing so (as exemplified by the 'deleteResource'
> example in my previous message)).

I'm not sure there is that much difference, conceptually, between "attaching
meta-information to a triple" and efficient reification. Surely your
deleteResource example illustrates limitations of the current Jena
implementation rather than fundamental problems. In particular, if
listStatements/listReifiedStatements gave more efficient selective access to the
reified-only statements and if Statement.isReified() were more efficiently
implemented then your deleteResource would be efficient.
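
The following Python sketch shows the kind of indexing that would make this
possible. It is not how Jena is implemented: IndexedStore and its methods are
hypothetical, and the meaning given to delete_resource here (drop every
asserted or reified triple that mentions the resource) is an assumption about
the operation under discussion. Triples about the reification node itself are
not cascaded, to keep the example short.

```python
# Illustrative sketch only: a per-resource index over asserted and reified triples.
from collections import defaultdict
from itertools import count

class IndexedStore:
    def __init__(self):
        self.asserted = set()                # asserted triples
        self.reified = {}                    # triple -> reification node label
        self.by_resource = defaultdict(set)  # resource -> triples mentioning it
        self._ids = count()

    def _index(self, triple):
        for term in triple:
            self.by_resource[term].add(triple)

    def add(self, triple):
        self.asserted.add(triple)
        self._index(triple)

    def reify(self, triple):
        """Record a reified-only triple; return its (possibly existing) node."""
        if triple not in self.reified:
            self.reified[triple] = f"_:r{next(self._ids)}"
            self._index(triple)
        return self.reified[triple]

    def is_reified(self, triple):
        return triple in self.reified        # one dictionary lookup

    def delete_resource(self, resource):
        """Remove every asserted or reified triple that mentions resource."""
        for triple in self.by_resource.pop(resource, set()):
            self.asserted.discard(triple)
            self.reified.pop(triple, None)
            for term in triple:
                if term != resource:
                    self.by_resource[term].discard(triple)

store = IndexedStore()
store.add(("a", "b", "c"))
store.reify(("a", "p", "q"))
store.add(("x", "y", "z"))
store.delete_resource("a")
print(store.asserted)                      # {('x', 'y', 'z')}
print(store.is_reified(("a", "p", "q")))   # False
```

With the index in place, the work done by delete_resource is proportional to
the number of triples that mention the resource, not to the size of the store.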

I do see your use case as a good test of an efficient reification API, but on
the face of it getIt/setIt could just be thought of as a syntactically
convenient way to access properties of a reification bNode.
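
To illustrate that reading, here is a hedged Python sketch in which
getIt/setIt-style accessors are thin wrappers over properties of a
reification node. The signatures of get_it and set_it are assumptions (the
actual proposal is not spelled out in this message), triples are plain tuples,
and the sketch takes the "shortcut" view of at most one reification node per
triple.

```python
# Illustrative sketch only: accessor sugar over reification-node properties.
from itertools import count

_ids = count()

def find_reification(graph, triple):
    """Return a node whose reification triples describe `triple`, or None."""
    s, p, o = triple
    for (node, pred, obj) in graph:
        if (pred == "rdf:subject" and obj == s
                and (node, "rdf:predicate", p) in graph
                and (node, "rdf:object", o) in graph):
            return node
    return None

def set_it(graph, triple, prop, value):
    """Attach prop=value to the triple, creating its reification node if needed."""
    node = find_reification(graph, triple)
    if node is None:
        node = f"_:r{next(_ids)}"
        s, p, o = triple
        graph.update({
            (node, "rdf:type", "rdf:Statement"),
            (node, "rdf:subject", s),
            (node, "rdf:predicate", p),
            (node, "rdf:object", o),
        })
    # Replace any earlier value for this property on the node.
    graph.difference_update({t for t in graph if t[0] == node and t[1] == prop})
    graph.add((node, prop, value))

def get_it(graph, triple, prop):
    """Return the value of prop attached to the triple, or None."""
    node = find_reification(graph, triple)
    if node is None:
        return None
    for (s, p, o) in graph:
        if s == node and p == prop:
            return o
    return None

graph = set()
set_it(graph, ("a", "b", "c"), "source", "docA")
print(get_it(graph, ("a", "b", "c"), "source"))  # docA
print(("a", "b", "c") in graph)                  # False: still not asserted
```

The linear scan in find_reification is exactly the cost that an efficient
reification API would need to remove.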

Cheers,
Dave
Received on Tuesday, 7 January 2003 12:40:29 UTC