Reification
(p1)
The most general meaning of reification is simply to 'make into an object',
'thingify', 'model' or 'have in the domain of discourse', all of which can be
regarded as synonymous. In this section, we'll see why RDF uses reification and
see what reification allows us to say. As reification is a part of RDF that has
caused some bewilderment in the past, we'll walk through a simple example and
contrast it with two other techniques: quotation and reification as it is used
in first order logic, both of which can be confounded with RDF
reification. Nonetheless, we'll see shortly that reification in RDF is a
straightforward and easily understandable matter. In its essence reification
lets us do two things with statements:
(p2)
Make a statement the subject of another statement (statements about
statements).
Make a statement the object of another statement (statements with
statements).
Statements are not resources
(p3)
"But anything can be a resource!" Very true. To be more precise, when we use
RDF, the statements we make are about resources, and we do this via the URIs
which denote them. A statement on its own is just a sentence in RDF (or a
sequence of tokens); it is not available to RDF as a
resource. Nonetheless, sometimes we will want to say something about another RDF
statement or use one as the object of a statement.
(p4)
Suppose we wanted to build a search engine that harvested RDF statements from
the web. What we would like to have is an index of where our harvested
statements came from along with the time the statement was harvested, all in RDF
as well, and in the same data store. To do this we'd want to be able to
attribute statements to others and that implies statements would need to be part
of other statements. We decide to make a property called 'gathered' to mean that
a statement was taken from a web-site; here the harvested statement would need
to be the subject of another statement. We could also have another property
called 'published' to mean that a site published a certain statement; here the
harvested statement will be the object of another statement. For example if we
harvested this statement on w3.org:
<rdfprimer> <editor> "eric".
we'd like to be able to do something on the lines of:
<w3.org> <published> [<rdfprimer> <editor> "eric"].
[<rdfprimer> <editor> "eric"] <gathered> <w3.org>.
(p5)
The square brackets are a visual clue that tells us that the statement needs to
be encapsulated and referred to in some way; remember we can only put three
things in an RDF statement, not five!
(p6)
How would we go about doing this? Well, we know that the subject of an RDF
statement is a resource denoted by a URI or an anonymous resource, and we know
that these can also be the object of an RDF statement. So the thing to do would
be to associate the statement with a URI or an anonymous resource and use that in
place of the statement directly. Here we'll use an anonymous resource, called
'stmt'. That would give us:
<w3.org> <published> _:stmt.
_:stmt <gathered> <w3.org>.
How reification works
(p7)
Reification allows us to denote a statement with a URI or anonymous
resource. This thing we use to denote the statement (in our example the
anonymous resource) is usually called the reified statement. Now how do
we associate the statement with stmt? To let us do this, RDF provides four
properties, each of which will be used as a property of the reified statement
(stmt). These are:
(p8)
- subject: the subject property identifies the resource the statement
is about (the subject slot). Its value in our example is <rdfprimer>.
- predicate: the predicate property identifies the property in the
statement. Its value in our example is <editor>.
- object: the object property identifies the statement's property
value. Its value in our example is "eric".
- type: the type property describes the type of the new resource. You
can consider 'new resource' a figure of speech if you prefer; not everyone is
comfortable with the idea that resources are created or come into existence by
making up URIs or anonymous resources. All reified statements have a type
property whose value is the RDF defined resource Statement.
(p9)
Armed with these properties, we can reify our statement as follows:
_:stmt <type> <Statement>.
_:stmt <subject> <rdfprimer>.
_:stmt <predicate> <editor>.
_:stmt <object> "eric".
(p10)
The result, these four new triples, is called the reification of the
statement, to distinguish it from the reified statement,
stmt.
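The step from a statement to its reification can be sketched in a few lines of Python. This is only an illustration, not an RDF library: the `Triple` type and the `reify` function are hypothetical names invented here, and terms are kept as plain strings in the same informal notation used in the examples above.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A statement in the informal <s> <p> <o> notation used above."""
    subject: str
    predicate: str
    obj: str


def reify(statement: Triple, node: str) -> List[Triple]:
    """Return the four triples that describe `statement` through `node`.

    `node` plays the role of the reified statement (here an anonymous
    resource such as "_:stmt").
    """
    return [
        Triple(node, "<type>", "<Statement>"),
        Triple(node, "<subject>", statement.subject),
        Triple(node, "<predicate>", statement.predicate),
        Triple(node, "<object>", statement.obj),
    ]


harvested = Triple("<rdfprimer>", "<editor>", '"eric"')
reification = reify(harvested, "_:stmt")

for t in reification:
    print(f"{t.subject} {t.predicate} {t.obj}.")
```

Note that `reify` returns only the reification; whether the original statement is also kept in the store is a separate decision.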
(p11)
Our data store now looks like this:
<rdfprimer> <editor> "eric".
<w3.org> <published> _:stmt.
_:stmt <gathered> <w3.org>.
_:stmt <type> <Statement>.
_:stmt <subject> <rdfprimer>.
_:stmt <predicate> <editor>.
_:stmt <object> "eric".
(p12)
There are a few things worth noting at this point.
(p13)
The first is that both the statement and its reification are present in our data
store. This is perfectly fine; a statement and its reification are different
things. In the section on quotation we'll talk about this some more.
(p14)
The second is that we have generated quite a few triples! Indeed, it has been
pointed out that RDF reification is a process with verbose results: for example
in our data store a million harvested statements would result in seven million
statements, four million of which are used for reifications! As you can see, the
overhead is four reification statements for every one statement being
reified. One way of looking at this is to say it's a small price to pay to stay
inside RDF and avoid using a meta-language to talk about
statements, or inventing new kinds of statements with four or more parts and
breaking the simplicity and homogeneity of RDF statements. Statement reification
is one reason why RDF has been called 'self-describing'. Another way of looking
at this is that software doesn't have to hold every triple generated for a
reification in memory, it just has to act that way, and there are numerous
optimizations that can be made to keep the overhead low: in this sense we trade
conceptual simplicity for possible complexity in software.
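The arithmetic behind the seven million figure can be checked with a small sketch. It assumes, as in the data store above, that each harvested statement is kept as-is, gets two bookkeeping statements ('published' and 'gathered'), and four reification statements; the function name `store_size` is made up for this illustration.

```python
def store_size(harvested: int,
               bookkeeping_per_statement: int = 2,
               reification_per_statement: int = 4) -> dict:
    """Count the statements in the store for `harvested` input statements.

    Defaults mirror the example store: <published> and <gathered> as
    bookkeeping, and <type>, <subject>, <predicate>, <object> as reification.
    """
    asserted = harvested
    bookkeeping = harvested * bookkeeping_per_statement
    reification = harvested * reification_per_statement
    return {
        "asserted": asserted,
        "bookkeeping": bookkeeping,
        "reification": reification,
        "total": asserted + bookkeeping + reification,
    }


counts = store_size(1_000_000)
print(counts)
```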
(p15)
The third is that each statement of a reification is a bona-fide statement in
its own right. In the data store above the four statements used to make a
reification are not in any way special, or different from, the three
others which are not. This goes hand in hand with the notion of describing RDF
statements with other RDF statements.
Reification and quotation
(p16)
Sometimes we want to be able to state that someone or something
said or asserted another statement, without actually asserting anything about
the resource in the other statement ourselves. This can be called
mentioning the resource.
(p17)
We've seen something like this already in our data store:
<w3.org> <published> _:stmt.
(p18)
One way of thinking about the above statement is that we are only saying that
w3.org published <rdfprimer> <editor> "eric", we're
not saying anything directly about the resource denoted by
<rdfprimer>. We only say something about
<rdfprimer> when we put:
<rdfprimer> <editor> "eric"
directly into our RDF store, since anyone could come along and find that
statement without ever dealing with the reified statement, stmt, or necessarily
knowing where the statement came from. This can be called using the
resource.
(p19)
Suppose we decide that we don't want to assert the statements we harvest
directly anymore: things are good, the data store is very popular, but we've been
naive and somebody might sue us because the contents of the data store
effectively parrot what someone else said. And who knows, can we really trust
w3.org? Using the previous example, we decide to remove all
third-party statements we harvested. Our data store now looks like this:
<w3.org> <published> _:stmt.
_:stmt <gathered> <w3.org>.
_:stmt <type> <Statement>.
_:stmt <subject> <rdfprimer>.
_:stmt <predicate> <editor>.
_:stmt <object> "eric".
(p20)
Now we can be sure that we are only mentioning the resource
<rdfprimer>, and there's no fear of the data store asserting the
same statements as w3.org.
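The move from using to mentioning can be sketched with a set of tuples standing in for the data store. This is an illustrative model only: the helper `described_statement` is a hypothetical name, and a real RDF store would expose its own API for finding and removing triples.

```python
# The data store before the third-party statement is removed.
store = {
    ("<rdfprimer>", "<editor>", '"eric"'),
    ("<w3.org>", "<published>", "_:stmt"),
    ("_:stmt", "<gathered>", "<w3.org>"),
    ("_:stmt", "<type>", "<Statement>"),
    ("_:stmt", "<subject>", "<rdfprimer>"),
    ("_:stmt", "<predicate>", "<editor>"),
    ("_:stmt", "<object>", '"eric"'),
}


def described_statement(store, node):
    """Rebuild the triple that `node` reifies from its reification properties."""
    props = {p: o for s, p, o in store if s == node}
    return (props["<subject>"], props["<predicate>"], props["<object>"])


harvested = described_statement(store, "_:stmt")

# Stop asserting (using) the harvested statement; keep its reification.
store.discard(harvested)

is_asserted = harvested in store
still_mentioned = described_statement(store, "_:stmt") == harvested
print(is_asserted, still_mentioned)
```

After the removal the store no longer contains the harvested triple itself, yet the statement w3.org published can still be reconstructed from the reification.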
(p21)
The distinction between use and mention is an important and
valuable one and the terms are often used in a technical sense in logic and
knowledge representation, similar to the characterizations given here. Having
the facility to both use and mention information can avoid no end of
trouble. One popular technique used for mentioning something is to use quotation
marks and the process of mentioning itself is sometimes called
quotation.
(p22)
While using statements as the objects of other statements is not precisely
quotation, RDF's ability to reify any statement can be used as something very
close to it, and this leads us to the notion of
attribution of information on the web. The need to be able to attribute
a statement or set of statements by mention rather than use in RDF on the open
Internet will be a common one; effectively it provides the web equivalent of the
maxim 'don't believe everything you read'.
Reification in Predicate Logic
(p23)
It's worth spending a moment distinguishing reification in RDF from a form of
reification sometimes practiced when using predicate logic for knowledge
representation. Being types of reification, the two are broadly
similar. Nonetheless they work differently and on different things, and are not
to be confused with each other. To see what this means in logical reification,
suppose we had:
editor(eric, rdfprimer)
(p24)
Here editor is a logical predicate, and the proposition is that eric is the
editor of rdfprimer, where eric and rdfprimer are objects, or things in our
'domain of discourse' (a proposition is just a logical sentence). Another way of
looking at this is to see that editor relates eric and rdfprimer, so
sometimes predicates are called relations. If we wanted to say that eric is
effective as an editor, we'd instinctively try to do this:
effective(editor, eric)
(p25)
But doing so is illegal in predicate logic, since relations are not objects. The
purpose of logical reification, as we'll call it, is not to be able to make a
proposition about another proposition; it's to be able to use predicates as the
objects of propositions, as above. Logical reification enables this by
representing relations as objects. So we might instead represent our proposition
as relation(editor, eric, rdfprimer). This prevents us from
writing editor(eric, rdfprimer) but does allow us to write
relation(effective, editor, eric). Using certain rules of
inference would, under the right circumstances, allow us to infer that a
relation called editor between eric and rdfprimer holds, and we'd write
this as holds(editor, eric, rdfprimer).
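A rough sketch of logical reification, using Python tuples as facts, may make the idea more concrete. Everything here is an assumption made for illustration: the function `infer_holds` applies a deliberately simplistic, hypothetical rule (every relation fact yields a holds fact), whereas the text only says that suitable inference rules would allow this under the right circumstances.

```python
# Reified facts: the relation name is an ordinary argument, so it can
# also appear in the argument positions of other facts.
facts = {
    ("relation", "editor", "eric", "rdfprimer"),
    ("relation", "effective", "editor", "eric"),
}


def infer_holds(facts):
    """Hypothetical rule: relation(r, a, b) lets us conclude holds(r, a, b)."""
    return {("holds", r, a, b) for tag, r, a, b in facts if tag == "relation"}


holds = infer_holds(facts)

# Names used both as a relation and as an argument of another fact.
relation_names = {r for _, r, _, _ in facts}
arguments = {x for _, _, a, b in facts for x in (a, b)}
used_as_both = relation_names & arguments

print(sorted(holds))
print(used_as_both)
```

Because editor is now just a name in the domain of discourse, it shows up both as the relation of one fact and as an argument of another, which is what the unreified form did not permit.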
(p26)
More technically, logical reification allows one to quantify over relations
(predicates) and stay inside first order logic. RDF reification allows one to
quantify over expressions (statements) and stay inside RDF.