Re: Reification - whats best practice? from Bob MacGregor on 2004-08-28 (www-rdf-interest@w3.org from August 2004)

From: Bob MacGregor <macgregor@isi.edu>
Date: Sat, 28 Aug 2004 10:57:55 -0700
To: Frank Manola <fmanola@acm.org>
CC: www-rdf-interest@w3.org
Message-ID: <4130C7A3.9070809@isi.edu>
Hi Frank,

I don't want to have too many comments on comments, but at least one of
your points needs clarification.  Not only do I not blame Pat Hayes for
the state of reification in RDF, I applaud him.  When I first started
using RDF, I had definite plans for using statement reification, and the
lack of semantics bothered me, because it meant that certain
expectations I had regarding what an RDF triple store ought to do for me
were not met.  After I found out from direct experience how bad
statement reification is for representing provenance information, I
switched sides, and became an opponent of RDF reification.  At that
point, the lack of semantics became a plus, because it meant that
implementers of triple stores had somewhat fewer obstacles in their way
when building to spec.

You mention the possibility that if we admit "named containers"
(contexts) to RDF then we open the possibility of people representing
provenance information in more than one way.  There are two major
objections to that position.  One is that it's impossible to prevent
that possibility -- we are dealing with logic, and any logic that has
sufficient expressivity for practical application will admit more than
one way of solving a given technical problem.  RDF's lack of semantics
makes it in some sense very expressive.  I am already expressing
provenance information in RDF; and the way I do it is certainly not
approved by the RDF spec.  In fact, it's not approved by me -- I could
have done it the "right" way by embedding reified statements in a named
container, but that would have had horrendous space implications, and
been impossible from a querying point of view.  Instead, I invented a
"hack" that gives the right behavior, but would never be recommended as
an official way to do contexts.  So, by necessity, I'm doing exactly
what you are counseling we shouldn't be doing.
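
To make the space argument concrete, here is a minimal sketch that
counts the triples needed to record a source for each statement using
the standard reification vocabulary, and compares that with adding a
fourth "context" slot.  It is only an illustration, not the hack
mentioned above: the ex: names, the ex:source property and both helper
functions are assumptions made up for the example.

```python
# Illustrative only: compare the size of reified provenance with quads.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(triple, stmt_id, source):
    """Return the triples that describe `triple` as an rdf:Statement
    and attach a source to it (ex:source is a hypothetical property)."""
    s, p, o = triple
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
        (stmt_id, "ex:source", source),
    ]


def as_quad(triple, source):
    """Return the same statement with the source in a fourth slot."""
    s, p, o = triple
    return (s, p, o, source)


facts = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:age", "42"),
]

# Reification describes a statement without asserting it, so a store
# that also wants the plain triple must keep that one as well.
reified = [t for i, f in enumerate(facts)
           for t in reify(f, f"_:stmt{i}", "ex:feedA")]
quads = [as_quad(f, "ex:feedA") for f in facts]

print(len(reified), "triples via reification;", len(quads), "quads")
```

Under these assumptions every statement costs five triples once it is
reified, and a query for "what did ex:feedA say?" has to join across
those five triples instead of matching on one slot.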

The second objection to your position is the chicken-and-egg problem.
Ideally, we could experiment with various kinds of provenance
representations, figure out which is the best, and then make that a
standard.  However, the gap between what's needed and what RDF offers is
too large to admit experiments within the RDF framework.  Think about
the relational database community.  They have gone more than 25 years
without a decent solution to representing provenance information.  Why?
Because their representation doesn't admit a good solution.
Unfortunately, RDF is in nearly as bad a state right now -- not quite as
bad, because the hack that I use would not be practical within the
relational model -- but bad enough.  So we are in a position where the
only people who can experiment with provenance are those willing to
violate the RDF spec.

But it's actually worse than that.  To get decent performance while
using provenance, you have to make modifications at the implementation
level.  Relatively few users are going to build or extend an existing
triple store just to get provenance information.  So, the "experiment"
is going to proceed relatively slowly.  On the other hand, suppose Jena
implemented quads.  Suddenly, the opportunities for experimentation
would increase 100-fold (to pick a random number).  Then there would be
a flurry of experimentation, probably leading to much earlier
recognition of what a standard semantics for provenance (a step beyond
contexts) should look like.
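
As a rough picture of what store-level quads would make possible, here
is a toy in-memory store in which every statement carries a context
identifier.  It is a sketch under stated assumptions, not Jena's API or
any existing store's design: the QuadStore class, its add and match
methods, and the ex: names are all hypothetical.

```python
from collections import defaultdict


class QuadStore:
    """Toy store: each statement is kept under a context (named container)."""

    def __init__(self):
        # context id -> set of (subject, predicate, object) triples
        self._by_context = defaultdict(set)

    def add(self, s, p, o, context):
        self._by_context[context].add((s, p, o))

    def match(self, s=None, p=None, o=None, context=None):
        """Yield (s, p, o, context) quads; None acts as a wildcard."""
        contexts = [context] if context is not None else list(self._by_context)
        for c in contexts:
            for triple in self._by_context.get(c, ()):
                if all(want is None or want == got
                       for want, got in zip((s, p, o), triple)):
                    yield (*triple, c)


store = QuadStore()
store.add("ex:alice", "ex:knows", "ex:bob", "ex:feedA")
store.add("ex:alice", "ex:knows", "ex:carol", "ex:feedB")

# Which contexts say anything about whom alice knows?
who_says = sorted(q[3] for q in store.match(s="ex:alice", p="ex:knows"))

# Everything asserted in one context.
from_a = list(store.match(context="ex:feedA"))
```

With the context as a first-class slot, a provenance question becomes an
ordinary pattern match, which is the kind of experiment that is hard to
run when the store only knows about triples.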

Right now, the people adopting your position are consigning the majority
to a Gedankenexperiment, where the expected rate of progress will be
about what it is in the relational database community -- nil.

Cheers, Bob


Frank Manola wrote:

> Hi Bob--
>
> Some comments below.
>
> Bob MacGregor wrote:
>
>>
>> Hi Frank,
>>
>> You make many good points; I don't like to get deeply nested, so I'll
>> respond just on top.
>>
>> You say RDF already has containers.  True -- it's easy to create a
>> container of things that denote "entities", but it's MUCH less
>> practical to create a container of statements.  Yes, it's doable, but
>> this is the Turing argument all over again -- we already have assembly
>> language, but we would like to code in Java.
>
>
> I understand what you say about the ease of creating containers of 
> statements, but I don't think this is quite the Turing argument.  In 
> the first place, you're proposing to add *another* container 
> construct. There certainly may be good reasons why another such 
> construct is needed, but it seems reasonable to me to look at this 
> pretty carefully (e.g., look at how it interacts with the existing 
> containers which, as I pointed out, can still be used for this 
> purpose).  After all, I don't think we want to duplicate the situation 
> we were in with the current reification vocabulary.  In the second 
> place, it seems to me that the difficulty you describe in creating 
> containers of statements is largely due to the difficulty of 
> identifying statements, rather than the need for a new container per 
> se.  Granted that a special kind of container just for statements 
> might (indirectly) help deal with some of those problems, but if, for 
> example, I ultimately need to identify individual statements, creating 
> a separate container for each statement seems like unnecessary 
> indirection.
>
>>
>> You are insisting on semantics.  RDF has almost no semantics -- graphs
>> are just graphs; there is no attempt to assign truth per se.  I'm
>> pushing for named containers, another data structure, with no built-in
>> semantics per se (except that the contexts I use allow for contexts
>> within contexts, which induces a few entailments).  Note: Pat Hayes has
>> carefully ensured that RDF statement reification has essentially no
>> semantics.
>
>
> I understand what you're after.  My point is simply that if you have a 
> purely structural approach using containers and nothing else, one 
> person might use the container for indicating provenance, and another
> might use the container for an entirely different purpose, and there 
> would be no explicit indication which was which (worse, different 
> people might use the same structure in *slightly different ways* to 
> indicate provenance).    My main concern isn't just to represent 
> stuff, but to do so in a more interoperable way.  Hence my belief that 
> we need somewhat more than simply a new container structure (even if 
> it's only some additional conventions).  Note:  You needn't saddle Pat 
> with all the "blame" (if that's what it is) for the state of 
> reification.  In the first place, the entire RDF Core WG is 
> responsible for the specs as they stand, not just the individual 
> document editors.  In the second place, the WG found the reification 
> vocabulary in a confusing state, and picked what seemed to be the most 
> reasonable interpretation for the provenance use case. The fact that 
> reification has almost no semantics mostly follows from the ability 
> (or lack of same) of RDF to reflect those semantics (the point you 
> made at the beginning).
>
>>
>> Basically, you are advocating a cerebral exercise, followed by adoption.
>> The problem is that it's hard to appreciate the utility of something like
>> contexts unless you have the option to use them (not just imagine what
>> it would be like).  Reified statements are a good negative example -- on
>> paper, they look promising, but in practice they s*ck.  Only relatively
>> few of us have the luxury of building applications using a real
>> context mechanism (have you?).
>>
>
> If by "advocating a cerebral exercise" you mean advocating thinking a 
> little more about fundamental additions to RDF (and their interactions 
> with the existing facilities), then that's certainly what I'm doing.  
> In particular, I fully appreciate the utility of "contexts" for doing 
> dozens of useful things, just as many other people have since 
> McCarthy's notes, and Guha's dissertation.  The problem is that those 
> dozens of useful things tend to involve different meanings and, as I 
> said above, I'm concerned about interoperability.
>
> Cheers,
>
> Frank


-- 

Bob MacGregor
Chief Scientist

	
Siderean Software Inc
5155 Rosecrans Ave, #1078
Hawthorne, Ca 90250 
<http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=5155+Rosecrans+Ave&csz=Hawthorne%2C+Ca+90250&country=us> 

bmacgregor@siderean.com <mailto:bmacgregor@siderean.com> 	
tel: 	+1-310-491-3424
fax: 	+1-310-491-3338
Received on Saturday, 28 August 2004 17:58:31 UTC