Re: Concerns about reification

Although many of the points that are going to be made in the ensuing
thread on this will have been made before (maybe even X times!), perhaps
we have a different perspective on some of them this time around. 
Anyway, let me take a crack (hoping I'm not going to restate some heresy
we've already recanted...)

Sergey Melnik wrote:
> 
> Brian asked me (for the Xth time, sigh) to express my concerns about
> reification in writing. My position remains that we need an
> interoperable and efficient way of doing reification.

I agree with the last sentence.

> 
> If we go for "stating" (answer "No" to Q1 in [1]), no special semantics
> is associated with the vocabulary rdf:Statement, rdf:subject,
> rdf:predicate and rdf:object. This means applications *cannot
> interoperate* using this vocabulary since its meaning is unspecified.
> Effectively, going for "stating" amounts to deprecating 4-triple
> reification as used today.
> 
> If we go for "statement" (answer "Yes" to Q1 in [1]), we get a (rather
> painful, admittedly, but endorsed) way of referring to statements found
> on Web pages and in RDF databases, recording provenance etc. This is IMO
> much more concrete and useful that just providing no definition at all,
> although the usability of 4-triple reification still remains seriosly
> hampered by its verbosity.

The problem is that I don't believe this is exactly right.  There are a
number of things involved here.  And verbosity isn't the main issue.

1.  The things found on Web pages and in RDF databases are *instances*
of statements, otherwise known (?) as "statings".  When I express
provenance (at least a lot of the time), I want to talk about a
particular instance in a particular place, not the single abstract thing
determined by a given (subject, predicate, object) triple.   I must, for
example, be able to say "fred wrote <s,p,o> at location X, at time T1",
and "mary wrote <s,p,o> at location Y, at time T2" and have those be
distinct things, and not thereby intermingle X, Y, T1, and T2. For
example, does:

   <stmt1> <rdf:type> <rdf:Statement> .
   <stmt1> <rdf:subject> <aSubject> .
   <stmt1> <rdf:predicate> <aPredicate> .
   <stmt1> <rdf:object> <anObject> .
   <stmt1> <ex:writtenby> <fred> .
   <stmt1> <ex:atTime> <T1> .

   <stmt2> <rdf:type> <rdf:Statement> .
   <stmt2> <rdf:subject> <aSubject> .
   <stmt2> <rdf:predicate> <aPredicate> .
   <stmt2> <rdf:object> <anObject> .
   <stmt2> <ex:writtenby> <mary> .
   <stmt2> <ex:atTime> <T2> .

   entail:
   
   <stmt2> <ex:writtenby> <fred> .
   <stmt2> <ex:atTime> <T1> .

I think the answer has to be "NO" if you want provenance in general to
work, because otherwise you can't even suggest that there are two
different instances involved.  [Depending on what you read into the fact
that there are separate <stmt1> and <stmt2>].

2.  In the example above, <stmt1> and <stmt2> were newly-minted URIs
defined as part of the process of reifying the two original occurrences
of what fred and mary said.  That is, in the M&S reification examples,
it's assumed that you can't directly identify the statement you want to
express provenance information about;  instead you have to create a new
URI, and then use the reification vocabulary to describe the subject,
predicate, etc. of that new entity.  The thing is, that:

a.  The best you can do with the reification vocabulary, as it is, is to
express provenance information about some reification (triple quad)
which you effectively claim describes (has the same subject, predicate,
and object as) some actual statement occurrence that the provenance is
*really* about.  However, there is no way in the current reification
mechanism to directly establish the relationship between the quad and
the actual statement occurrence the provenance is really about.  

b.  If the original statings each had a URI, no reification syntax would
be necessary to do provenance.  That is, if the actual triple written by
Fred had a URI, I could simply write:

   <fredtriple> <ex:writtenby> <fred> .
   <fredtriple> <ex:atTime> <T1> . 

Note that for interoperability of provenance I need mutual understanding
of the *provenance* vocabulary (e.g., <ex:atTime>), not just the
reification vocabulary.  

3.  It's not clear that you couldn't continue to use the current
vocabulary with the general meaning suggested by (a).  You'd just have
to accept that you have to take a certain amount on faith (that the
reification "description" is a suitable "stand in" for the actual triple
that you're describing with it).

> 
> In summary, if we go for "stating" we have *no* official mechanism for
> reification. In this case we'd have to suggest an alternative, we cannot
> just wash our hands. It is an illusion that we can leave the vocabulary
> undefined and at the same time recommend developers to use it in a
> consistent way. If we go for "statement" we do have a solution, albeit a
> poor one. If we run out of time in finding an alternative *efficient*
> way of doing reification, we could of course fall back on "statement"
> expressed using 4 triples. In this case, we would not have achieved much
> since inefficiency proved to be a show stopper for using 4-triple
> reification in the past 3 years.
> 

I don't think it hurts to have a standard vocabulary (like the
rdf:subject) for describing the "substructure" of statement
occurrences.  It's just that having that vocabulary doesn't get us all
that far in dealing with provenance.  I agree that we can't just wash
our hands;  too much has been made of "RDF reification".  

--Frank  [ducking under the desk]

-- 
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-8752

Received on Friday, 15 February 2002 14:53:39 UTC