4-triple reification considered harmful

The current spec suggests the "4 triple" form of reification. It may
seem that there is a need for that on the modeling level; I believe
there is none. Reification can easily be made "model inherent". For
that, one can assume that there exists a Skolem function (sorry, no
reference for this buzzword) that maps R x R x (R u L) to R. That's it.
At the programming level, reification is similarly lightweight. An
instance of Statement is a Resource and can be used in other
statements.
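
To illustrate, here is a minimal Python sketch of the programming-level
view. The Resource, Literal and Statement classes are made up for this
message (they are not from any existing API); the only point is that a
Statement *is* a Resource and can therefore be used in other statements
directly.

class Resource:
    def __init__(self, uri):
        self.uri = uri

class Literal:
    def __init__(self, value):
        self.value = value

class Statement(Resource):
    # Plays the role of the Skolem function R x R x (R u L) -> R:
    # a statement is itself a resource, identified by its three parts.
    def __init__(self, subj, pred, obj):
        self.subj, self.pred, self.obj = subj, pred, obj
        obj_key = obj.uri if isinstance(obj, Resource) else obj.value
        super().__init__("urn:stmt:%s|%s|%s" % (subj.uri, pred.uri, obj_key))

# A statement used inside another statement -- no extra
# rdf:subject/rdf:predicate/rdf:object triples needed.
creator = Resource("http://purl.org/dc/elements/1.1/creator")
says = Resource("http://example.org/says")
s1 = Statement(Resource("http://example.org/doc"), creator, Literal("Ora"))
s2 = Statement(Resource("http://example.org/Ralph"), says, s1)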

So maybe there is a need for this "verbose" form in the syntax?

I believe there is none either. One can develop a compact,
reification-friendly syntax in which syntactic constructs such as
temporary IDs can be used to give the parser hints about which
statements are subjects of other statements.

Why the "verbose" reification is in the spec, is mistery for me. Hardly
used by anyone, it is still harmful, because it confuses a lot of
people.

Still, there is a case that needs more work, and it is precisely what
has recently been discussed on the list: statements about sets of
statements. At the logical level, this can be solved with a Skolem
function that maps Pow(R x R x (R u L)) to R. Programming remains much
the same, since an instance of a Model is a Resource. The
reification-friendly syntax can gracefully handle models, too (e.g.
using nesting in XML).
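
Again only a sketch with made-up classes (a Model holding a set of
statements, with Resource as in the previous snippet); the point is
merely that nothing new is needed at this level.

class Resource:
    def __init__(self, uri):
        self.uri = uri

class Model(Resource):
    # Plays the role of the Skolem function Pow(R x R x (R u L)) -> R:
    # a set of statements is itself a resource.
    def __init__(self, statements):
        self.statements = frozenset(statements)
        # Placeholder identity; the digests discussed further below are
        # the obvious candidate for the real thing.
        super().__init__("urn:model:%x" % (hash(self.statements) & 0xffffffff))

# With the Statement class from the previous sketch, a statement about a
# whole model is then simply:
#
#   rating = Resource("http://example.org/rating")
#   claim = Statement(some_model, rating, Literal("adult"))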

But: sometimes it is impractical to reify a large model in order to
make statements about it. Consider the dmoz dump. You don't want your
quote to be 0.5 GB just to say that the model contains adult material
(note that if your statement is about the URL instead, it may become
false if dmoz filters out adult sites one day).

Or consider a B2B application in which the partners talk about large,
constantly changing datasets like product catalogs. Everybody has local
access to all relevant data. Now one of the partners wants to assert
that the pricing info in the catalog has been verified and that the data
is ready to be published on the procurement site. Can the responsible
person risk not talking about the content? The legal department would
kill him. Can he routinely quote gigabytes of data in his signed
electronic documents? You can guess the answer.

This is where cryptographic digests of the content (which includes
prices, terms and conditions, etc.) become indispensable. Note that the
problem is so closely related to reification that it may make sense to
try looking at both from the same perspective. From this viewpoint,
cryptographically strong hashes are nothing but good approximations of
the above-mentioned Skolem functions. In fact, the first Skolem function
(sk1, which maps statements to resources) can be designed so that it is
subsumed by the second one (sk2, which maps sets of statements to
resources).
Just define sk1 as:

for all s in (R x R x (R u L)): sk1(s) = sk2({s})
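
To make the relationship concrete, here is a rough Python sketch. The
serialization below is deliberately naive (a real canonical form would
have to deal with blank nodes, literal typing, encoding and so on), and
SHA-1 is just one possible digest; this is an illustration of the
sk1/sk2 relationship, not a proposal.

import hashlib

def _canonical(triple):
    # Naive stand-in for a canonical serialization of one statement.
    subj, pred, obj = triple
    return "%s %s %s" % (subj, pred, obj)

def sk2(statements):
    # Skolem function for sets of statements: Pow(R x R x (R u L)) -> R.
    # Sorting makes the digest independent of enumeration order.
    lines = sorted(_canonical(t) for t in statements)
    digest = hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()
    return "urn:digest:sha1:" + digest

def sk1(statement):
    # Skolem function for single statements: sk1(s) = sk2({s}).
    return sk2({statement})

# A short resource name for a (potentially huge) set of statements,
# without quoting the statements themselves:
catalog = {
    ("ex:item42", "ex:price", '"19.99"'),
    ("ex:item42", "ex:currency", '"EUR"'),
}
print(sk2(catalog))
print(sk1(("ex:doc", "dc:creator", '"Ora"')))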

Sergey
