Dave Reynolds wrote a very nice summarization of the current state of
and "contexts" in RDF (repeated below). I'd like to add a few observations.
We badly need the "set of statements" notion in much of our work using RDF. Putting
statements into an RDF list or bag is NOT the solution (space-wise, it's
a big loser). Using reified statements is the current fallback position, and
it's also a loser on space (but normally not as bad as using lists/bags of statements).
A "model" represents a set of statements, but queries that span models
are currently not supported, so that does not represent a way out (yet).
The Jena "reifiedOnly" bit is essentially useless. You can't do matches
against statements that aren't added to a model, so we ALWAYS add
statements to our models, and hence the "reifiedOnly" bit is always set
to false, both for reified and non-reified statements. If that bit could be
redefined to mean "is true in this model", that would be a step in the
right direction. Alternatively, if we had a bit meaning "this statement
is reified", that would also be useful (for different reasons).
Contexts are not new or exotic. Loom, Cyc, EpiKit, and PowerLoom have been
using them for years. In all of these systems, contexts are arranged in
a hierarchy, and individual statements can be true in some contexts, false or
unknown in others. Loom and PowerLoom implement a very efficient context
mechanism (derived from Edinburgh's OPLAN system) that adds almost
invisible overhead to the rest of the processing. That overhead is
probably not quite so small when managing contexts on secondary storage,
but it's not a killer. Instead of adding a bit saying whether a statement is true
in a model (as I naively suggested above), what you really need is a data structure
that tells you which models/contexts a statement is true in (a model and a context
are the same thing from a logic perspective).
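To make that data structure concrete, here is a minimal sketch in plain Python. It is not Jena code and not how Loom or PowerLoom implement contexts; the class and method names are hypothetical, and the rule that a child context sees everything asserted in its ancestors is my assumption about how the hierarchy would be used.

```python
from collections import defaultdict


class ContextStore:
    """Hypothetical sketch: maps each statement to the contexts asserting it."""

    def __init__(self):
        self._parent = {}                     # context -> parent context (or None)
        self._asserted_in = defaultdict(set)  # statement -> contexts asserting it

    def add_context(self, name, parent=None):
        if parent is not None and parent not in self._parent:
            raise KeyError(f"unknown parent context: {parent}")
        self._parent[name] = parent

    def assert_statement(self, statement, context):
        if context not in self._parent:
            raise KeyError(f"unknown context: {context}")
        self._asserted_in[statement].add(context)

    def contexts_of(self, statement):
        """Contexts in which the statement was directly asserted."""
        return frozenset(self._asserted_in.get(statement, ()))

    def is_true_in(self, statement, context):
        """True if asserted in the context or any ancestor (assumed inheritance)."""
        asserting = self._asserted_in.get(statement, ())
        while context is not None:
            if context in asserting:
                return True
            context = self._parent[context]
        return False


store = ContextStore()
store.add_context("base")
store.add_context("draft", parent="base")

title = ("ex:doc1", "dc:title", "Report")
status = ("ex:doc1", "ex:status", "unreviewed")
store.assert_statement(title, "base")
store.assert_statement(status, "draft")

print(store.is_true_in(title, "draft"))   # asserted in the parent context
print(store.is_true_in(status, "base"))   # asserted only in the child context
```

The lookup walks up the parent chain, so the cost of a membership test grows with the depth of the hierarchy rather than with the number of statements.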
Finally, models/contexts should be first class entities, i.e., they should be resources.
That way, you can make statements about all statements in a context. That's
where the space saving comes from -- if you have 500 statements that all have the
same last-modified timestamp and the same author, you can put them into a context that has a
single timestamp statement and a single author statement. With reification, you
need 500 timestamp statements and 500 author statements.
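The arithmetic can be checked with a small sketch in plain Python. The node names, the "ex:" property names, and the author and date values are made up for illustration, and the reification triples themselves (which add further overhead) are left out so that only the annotation statements are counted.

```python
def annotate_per_statement(statement_nodes, author, modified):
    """Reification style: one author and one timestamp triple per statement node."""
    triples = []
    for node in statement_nodes:
        triples.append((node, "ex:author", author))
        triples.append((node, "ex:lastModified", modified))
    return triples


def annotate_per_context(context, author, modified):
    """Context style: one author and one timestamp triple for the whole context."""
    return [
        (context, "ex:author", author),
        (context, "ex:lastModified", modified),
    ]


# 500 hypothetical reified-statement nodes sharing the same author and timestamp.
nodes = [f"_:stmt{i}" for i in range(500)]

per_statement = annotate_per_statement(nodes, "ex:alice", "2002-11-21")
per_context = annotate_per_context("ex:batch1", "ex:alice", "2002-11-21")

print(len(per_statement), len(per_context))  # prints "1000 2"
```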
We have partially worked out a scheme where we could use namespaces
to encode context information. This would be an enormous kludge, and would run
contrary to the spirit of what namespaces are for, but it would solve the space
problem just alluded to. We would prefer that someone fearlessly step forward and offer
up a system that supports contexts in a way that does not violate the current RDF
religion. Sometimes you need to ignore the theoreticians and just build stuff.
Implementationally (and semantically), contexts aren't really all that difficult.
At 09:32 AM 11/21/2002 +0000, Dave Reynolds wrote:
The only concept in RDF itself which allows you to identify where a statement
came from is reification.
There is no notion of a "set of statements" as a first class thing in RDF. Jena
Models are convenient ways to work with sets of statements but are not
themselves RDF entities - you can't have one Jena Model "inside" another and if
you could there would be no easy way to write out such a structure in RDF/XML.
You can use reification to represent the provenance of statements in a Jena
model. You do have to do this manually - iterate over the set and create the
explicit reification yourself - there is no "addSetToModelWithProvenance" method.
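As a rough, library-independent illustration of that manual loop (this is plain Python, not the Jena API), the sketch below expands each statement into the standard RDF reification vocabulary and attaches one provenance triple. The function name, the blank-node naming scheme, and the "ex:source" property are assumptions made for the example.

```python
def reify_with_provenance(statements, source):
    """Return the extra triples needed to record `source` for every statement."""
    extra = []
    for i, (s, p, o) in enumerate(statements):
        node = f"_:r{i}"  # hypothetical blank node standing for the statement
        extra += [
            (node, "rdf:type", "rdf:Statement"),
            (node, "rdf:subject", s),
            (node, "rdf:predicate", p),
            (node, "rdf:object", o),
            (node, "ex:source", source),  # the provenance annotation itself
        ]
    return extra


statements = [
    ("ex:doc1", "dc:title", "Report"),
    ("ex:doc1", "dc:creator", "ex:alice"),
]
extra = reify_with_provenance(statements, "ex:feedA")
print(len(statements), len(extra))  # prints "2 10"
```

Each original statement costs five additional triples here: four for the reification and one for the annotation.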
The difficulty with this is that it generates a lot of triples. Jena does
attempt to help here by providing a shortcut way of representing reification
(Statements can be treated as Resources and have properties attached to them).
This is convenient but not quite correct given the working group interpretations
and doesn't interact well with the RDF/XML reader/writers. This will be sorted