Re: Provenance for section 3 in technologies.tex

John S. Erickson wrote:

>Alberto made a huge point:
>
>  
>
>>in our interpretation of provenance/contexts in RDFStore we assumed
>>that a statement represents a fact that is asserted as true in a
>>certain context. This circumstance (e.g. space/temporal, situation or
>>scope) where the statement has been stated represents “contextual”
>>information about the statement [1][2]. For example, when triples are
>>being added to a graph it is often useful to be able to track back
>>where they came from (e.g. Internet source Web site or domain), how
>>they were added, by whom, why, when (e.g. date), when they will expire
>>(e.g. Time-To-Live) and so on. Such context (or provenance information)
>>can be thought of as an additional and orthogonal dimension to the
>>other 3 components. This concept is not part of the current RDF data
>>model [3] and is referred to as “statement reification”. From the
>>application developer point of view there is a clear need for such
>>primitive constructs to layer different levels of semantics on top of
>>RDF which can not be represented in the RDF triples space....
>>    
>>
>
>JSE: The notion of preserving the context of a statement WITHOUT TRANSFORMING
>THAT STATEMENT is critical for RDF application developers and I believe is
>being overlooked. I believe RDF's current approach, which reifies the
>statement, is artificially invasive and complex.
>
>In a real world, statements will be conceptually contained, aggregated and
>nested; it seems crazy that in order to deal with them in such a way, we must
>artificially blow them apart.
>
>  
>
Aside from the issue of complexity, there are other reasons not to
transform original statements.  Digital signatures, for example, lose
their meaning if the data they refer to is transformed.  There are (at
least) two ways to meet the need for leaving the data untransformed.

First, you can query down to the original data through an RDF proxy of
some sort that transforms the view of the original data into the form
needed by the query processor.
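
To make the proxy idea concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration only: the RecordProxy
class, the dict-of-records source format and the http://example.org/
namespace are hypothetical, not part of any Simile or RDFStore API.  The
point is only that triples are computed at query time and the original
records are never rewritten.

```python
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


class RecordProxy:
    """Read-only triple view over records that are never rewritten."""

    def __init__(self, records: dict, base: str = "http://example.org/") -> None:
        self._records = records  # original data, kept untouched
        self._base = base        # hypothetical namespace for generated URIs

    def triples(self, s: Optional[str] = None, p: Optional[str] = None,
                o: Optional[str] = None) -> Iterator[Triple]:
        """Yield matching triples, built on the fly; None is a wildcard."""
        for key, fields in self._records.items():
            subject = self._base + key
            for name, value in fields.items():
                triple = (subject, self._base + name, str(value))
                if all(want is None or want == got
                       for want, got in zip((s, p, o), triple)):
                    yield triple


# Hypothetical source data; the proxy only reads it.
original = {"doc1": {"title": "Technologies", "creator": "kls"}}
proxy = RecordProxy(original)
matches = list(proxy.triples(p="http://example.org/creator"))
```

The cost of this design is that every query pays for the transformation
again, so indexing and performance depend on each underlying source.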

Second, you can transport the original content as an attachment along
with the transformed data.  The transformation can then be rechecked at
any time without affecting the original data or requiring the use of a
proxy, which provides consistent indexing and performance
characteristics across all data sources.
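
A rough sketch of the attachment approach, again in Python and again
under stated assumptions: the Package type, the to_triples
transformation and the one-statement-per-line input format are all made
up for the example, and a SHA-256 digest stands in for a real digital
signature.  It shows the original bytes travelling with the derived
triples so that the transformation can be rerun and compared later.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Package:
    original: bytes              # the attachment, byte-for-byte as received
    digest: str                  # SHA-256 of the attachment (signature stand-in)
    triples: FrozenSet[Triple]   # the transformed view used for indexing


def to_triples(original: bytes) -> FrozenSet[Triple]:
    """Hypothetical transformation: one 'subject predicate object' per line."""
    found = set()
    for line in original.decode("utf-8").splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            found.add((parts[0], parts[1], parts[2]))
    return frozenset(found)


def ingest(original: bytes) -> Package:
    """Transform once at ingest time and keep the original alongside."""
    digest = hashlib.sha256(original).hexdigest()
    return Package(original, digest, to_triples(original))


def recheck(pkg: Package) -> bool:
    """Verify the attachment is intact and the triples still follow from it."""
    intact = hashlib.sha256(pkg.original).hexdigest() == pkg.digest
    return intact and to_triples(pkg.original) == pkg.triples


pkg = ingest(b"ex:doc1 ex:creator kls\n")
```

Because the digest is computed over the untouched attachment, it stays
valid no matter how the triples are stored or re-derived.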

The first approach is what I was trying to get at when I proposed adding
dynamic data sources to the Simile technologies document.  I think that
the second maps more closely to the ingest/publish process that we've
discussed previously.  Within the context of the second approach, I'm not
sure it matters to RDF programmers whether we use quads or
four-statement reification, since they won't see either of those
expressions of their data themselves.
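
For comparison, here is roughly what the two encodings look like for a
single statement, sketched in Python with plain tuples.  The rdf:
terms are the standard reification vocabulary; the ex:context property,
the blank node label and the example URIs are hypothetical.  Both
encodings answer the same question ("where did this statement come
from?"), which is why the choice can stay hidden behind the storage
layer.

```python
from typing import List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTEXT = "http://example.org/context"  # hypothetical provenance property

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


def as_quad(t: Triple, context: str) -> Quad:
    """Quad form: the context is a fourth component of the statement."""
    return (t[0], t[1], t[2], context)


def as_reified(t: Triple, context: str, node: str) -> List[Triple]:
    """Reified form: four triples describe the statement, a fifth adds context."""
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", t[0]),
        (node, RDF + "predicate", t[1]),
        (node, RDF + "object", t[2]),
        (node, CONTEXT, context),
    ]


def context_of_quad(q: Quad) -> str:
    return q[3]


def context_of_reified(triples: List[Triple], node: str) -> Optional[str]:
    """Look up the context attached to a reified statement node."""
    for s, p, o in triples:
        if s == node and p == CONTEXT:
            return o
    return None


stmt = ("http://example.org/doc1", "http://example.org/creator", "kls")
source = "http://example.org/ingest/batch1"  # hypothetical context URI
quad = as_quad(stmt, source)
reified = as_reified(stmt, source, "_:s1")
```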

Cheers,
-kls

-- 
========================================================
   Kevin Smathers                kevin.smathers@hp.com    
   Hewlett-Packard               kevin@ank.com            
   Palo Alto Research Lab                                 
   1501 Page Mill Rd.            650-857-4477 work        
   M/S 1135                      650-852-8186 fax         
   Palo Alto, CA 94304           510-247-1031 home        
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");

Received on Monday, 30 June 2003 14:46:22 UTC