How/Why to implement Reification efficiently... from Gabe Beged-Dov on 2000-11-29 (www-rdf-interest@w3.org from November 2000)

From: Gabe Beged-Dov <begeddov@jfinity.com>
Date: Wed, 29 Nov 2000 14:18:01 -0800
To: Stefan Kokkelink <skokkeli@mathematik.uni-osnabrueck.de>
CC: "www-rdf-interest@w3.org" <www-rdf-interest@w3.org>
Message-ID: <3A258099.DDFA3A23@jfinity.com>
Stefan Kokkelink wrote:
> 
> Gabe Beged-Dov wrote:

<snip />

> > As you say, I am proposing that we assume that a conformant parser
> > must generate the bags and reified statements. Once we take that step
> > we can then discuss how to provide straightforward and efficient API
> > and implementations based on a standard interpretation.
> 
> I disagree here. In general there is no need to know
> about the XML structure since the XML serialization
> is meant for exchanging RDF models (at least that is
> my point of view ;-).  If you look at the examples of
> M&S you won't find an RDF graph containing a reification
> or bagification unless bagID or propertyID are explicitly
> given. In my opinion a parser SHOULD provide a configuration
> setting that enforces a bagification for every rdf:Description
> element (if someone really is interested in the XML structure
> of the serialization...)

I am trying to achieve multiple goals with this interpretation of the
M&S. The goals are:

- To have a single consistent interpretation of what an 
  RDF processor generates
- To bring in-band the various types of information that 
  implementations (especially storage) are handling out-of-band
- To not lose any information that is contained in the
  source representation
- To be able to trace statings back to their occurrences
  (and also quotings, although that's less clear). 
- To push a web-document-centric view of RDF
- To allow an entire RDF document set to be manipulated
  directly as a single graph

I distinguish between the ability to surface the raw triples that
occurred in the source document and the ability to track all of the
information contained in the source document. This is similar to the
XML infoset and even more so to HyTime Groves (the full information)
and Grove Plans (a filtered view of that information).

Here's a thought experiment. You have a streaming pipeline like this:

source_doc -> normalizer -> infoset_gen -> MyStatement_gen -> triple_gen

The normalizer takes in the various syntactic variations and outputs
an equivalent version in the basic syntax of the M&S. 
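As a rough illustration of the normalizer stage, here is a minimal Python sketch that handles just one syntactic variation: the abbreviated form in which properties appear as attributes of rdf:Description. It rewrites each such attribute as a property element, which is what the basic syntax requires. It assumes that `about`, `ID`, `bagID`, `aboutEach` and `aboutEachPrefix` (qualified or not) are syntax rather than properties, and it ignores every other abbreviation (typed nodes, nested resources, rdf:type as an attribute, etc.). The function name `normalize` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_SYNTAX_NAMES = ("about", "ID", "bagID", "aboutEach", "aboutEachPrefix")
# Attributes treated as RDF syntax rather than properties (an assumed set),
# accepted both rdf:-qualified and unqualified.
SYNTAX_ATTRS = {f"{{{RDF_NS}}}{n}" for n in _SYNTAX_NAMES} | set(_SYNTAX_NAMES)


def normalize(rdf_xml: str) -> ET.Element:
    """Rewrite property attributes on rdf:Description as property elements."""
    root = ET.fromstring(rdf_xml)
    # Snapshot the descriptions first so the tree is not mutated mid-iteration.
    for desc in list(root.iter(f"{{{RDF_NS}}}Description")):
        position = 0
        for name in list(desc.attrib):
            if name in SYNTAX_ATTRS:
                continue
            prop = ET.Element(name)          # same {namespace}local name
            prop.text = desc.attrib.pop(name)  # attribute value becomes a literal
            desc.insert(position, prop)      # keep attribute order, before other children
            position += 1
    return root
```

Serializing the result with `ET.tostring(root, encoding="unicode")` gives the basic-syntax form that the next stage would consume.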

The infoset_gen takes this basic syntax and adds full reification
labelling and any other necessary metadata. It emits this version of
the source document as RDF/XML basic syntax. 
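One way to picture infoset_gen is as a SAX filter. The sketch below uses Python's standard `xml.sax.saxutils.XMLFilterBase` and `XMLGenerator` (an analogue of the SAX2 filter idea, not Megginson's Java RDFFilter) to give every rdf:Description an rdf:bagID and every property element an rdf:ID, i.e. full reification labelling, and then re-emit the document as RDF/XML. It assumes the input is already in basic syntax, that the RDF namespace is bound to the `rdf:` prefix, and that generated labels like `bag1` and `stmt1` do not collide with IDs already in the document. The names `LabellingFilter` and `label` are hypothetical.

```python
import io
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class LabellingFilter(XMLFilterBase):
    """Adds rdf:bagID to descriptions and rdf:ID to property elements lacking them."""

    def __init__(self, parent):
        super().__init__(parent)
        self._stack = []  # qualified names of the currently open elements
        self._bags = 0
        self._stmts = 0

    def startElement(self, name, attrs):
        attrs = dict(attrs.items())
        parent = self._stack[-1] if self._stack else None
        if name == "rdf:Description":
            if "rdf:bagID" not in attrs:
                self._bags += 1
                attrs["rdf:bagID"] = f"bag{self._bags}"
        elif parent == "rdf:Description":
            # In basic syntax, a child of a description is a property element.
            if "rdf:ID" not in attrs:
                self._stmts += 1
                attrs["rdf:ID"] = f"stmt{self._stmts}"
        self._stack.append(name)
        super().startElement(name, AttributesImpl(attrs))

    def endElement(self, name):
        self._stack.pop()
        super().endElement(name)


def label(rdf_xml: bytes) -> str:
    """Run the filter over a document and return the labelled RDF/XML."""
    out = io.StringIO()
    filt = LabellingFilter(make_parser())  # namespace processing is off by default
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(rdf_xml))
    return out.getvalue()
```

Because each stage is just a content handler, further filters (or a bagID-only mode) can be chained in without buffering the whole document.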

The MyStatement_gen generates an efficient, high-level API
representation of this annotated version of the source document. This
is discussed further below. 
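A matching sketch of MyStatement_gen walks an already-labelled basic-syntax document and yields one sextuple per property element. It assumes every rdf:Description carries rdf:about and rdf:bagID, every property element carries rdf:ID, objects are either literals or rdf:resource references, nested descriptions do not occur, and every statement read from the document is stated. IDs are kept as the raw labels rather than resolved against a base URI. `MyStatement` and `statements` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, NamedTuple

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


class MyStatement(NamedTuple):
    bag_id: str
    statement_id: str
    subject: str
    predicate: str
    obj: str
    is_stated: bool


def statements(labelled_xml: str) -> Iterator[MyStatement]:
    """Yield one sextuple per property element of each description."""
    root = ET.fromstring(labelled_xml)
    for desc in root.iter(RDF + "Description"):
        subject = desc.get(RDF + "about")
        bag_id = desc.get(RDF + "bagID")
        for prop in desc:
            # Predicate URI = namespace URI + local name, per M&S.
            ns, _, local = prop.tag[1:].partition("}")
            obj = prop.get(RDF + "resource", prop.text or "")
            yield MyStatement(bag_id, prop.get(RDF + "ID"), subject,
                              ns + local, obj, True)
```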

Finally, the triple_gen is a filter that gives you an expansion into
triples of some subset of the statement stream that was emitted by the
MyStatement_gen module. 

If you assume my interpretation of what information needs to be
generated from a source document, the following MyStatement structure
would convey that information:

{ BagID, StatementID, subject, predicate, object, isStated }

If RDF processors generated this sextuple rather than the current
triple, they would generate no more "statements" than in the current
usage. In fact, they would generate far fewer in the face of explicit
or requested reification. They would have all the information
necessary to generate the "legacy" triple interface if an application
wanted the raw view of the information. You could also control the
triple filter to only return ground triples, etc. 
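To make the size argument concrete, here is a hypothetical triple_gen sketch with two views over the same sextuple stream: `raw_triples` is the "legacy" view (stated triples only, labels dropped), and `expanded_triples` spells out what the labels stand for under M&S, namely four reification triples per statement plus rdf:Bag membership. Statement and bag IDs are used directly as resource names, which is a simplification; the function names are assumptions, not an existing API.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


class MyStatement(NamedTuple):
    bag_id: str
    statement_id: str
    subject: str
    predicate: str
    obj: str
    is_stated: bool


def raw_triples(stmts: Iterable[MyStatement]) -> Iterator[Triple]:
    """Legacy view: only the stated triples, with all labelling dropped."""
    for s in stmts:
        if s.is_stated:
            yield (s.subject, s.predicate, s.obj)


def expanded_triples(stmts: Iterable[MyStatement]) -> Iterator[Triple]:
    """Full view: each statement plus its reification and bag membership."""
    members: Dict[str, int] = {}  # bag_id -> members emitted so far
    for s in stmts:
        if s.is_stated:  # quoted-only statements are reified but not asserted
            yield (s.subject, s.predicate, s.obj)
        yield (s.statement_id, RDF + "type", RDF + "Statement")
        yield (s.statement_id, RDF + "subject", s.subject)
        yield (s.statement_id, RDF + "predicate", s.predicate)
        yield (s.statement_id, RDF + "object", s.obj)
        if s.bag_id not in members:
            members[s.bag_id] = 0
            yield (s.bag_id, RDF + "type", RDF + "Bag")
        members[s.bag_id] += 1
        yield (s.bag_id, f"{RDF}_{members[s.bag_id]}", s.statement_id)
```

For a bag of two statements, one stated and one only quoted, `raw_triples` yields 1 triple while `expanded_triples` yields 12, all carried by 2 sextuples.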

I am working on an implementation based on these ideas that leverages
the SAX2 filter architecture. It is based on David Megginson's
RDFFilter. I hope to have something to share in the near future. If
anyone would like to collaborate on this, I would welcome the help. It
is currently visible as an empty project on SourceForge called RaPFiSH
(RDF; A Parser Framework implemented using SAX2 Handlers :). Send me
e-mail if you are interested.

Just to clarify, I know that there are a lot of RDF frameworks out
there already, but they mostly focus on issues downstream from the
parser and take the "triple is king" view of RDF processing. I am
focusing on the upstream portion of the pipeline and on this different
approach of passing sextuples in the API between the parser and the
application. I plan to be able to plug into existing frameworks using
triple-based APIs like those that Sergey and others have developed. 
 
> All the best,
> Stefan

Gabe

-- 
--------------------------- 
http://www.jfinity.com/gabe
Received on Wednesday, 29 November 2000 16:17:50 UTC