PrimerReificationSection

Bill de hÓra
$Revision: 1.6 $
Overview:
Draft section on reification for the RDF primer.
Comments sought; in particular on the example statements used, the comparison with logical reification and quotation, and the overall tone and structure of the section.
(@@@ for clarity, we'll use words instead of URI references inside angle brackets in the examples; to be fleshed out later)
(@@@ todo graph graphic to go with each statement grouping)

Reification
(p1) The most general meaning of reification is simply to 'make into an object', 'thingify', 'model' or 'have in the domain of discourse', all of which can be regarded as synonymous. In this section, we'll see why RDF uses reification and what reification allows us to say. As reification is a part of RDF that has caused some bewilderment in the past, we'll walk through a simple example and contrast it with two other techniques: quotation, and reification as it is used in first order logic, both of which can be confounded with RDF reification. Nonetheless, we'll see shortly that reification in RDF is a straightforward and easily understandable matter. In essence, reification lets us do two things with statements:
(p2)
Make a statement the subject of another statement (statements about statements).
Make a statement the object of another statement (statements with statements).

Statements are not resources
(p3) "But anything can be a resource!" Very true. To be more precise, when we use RDF, the statements we make are about resources, and we do this via the URIs which denote them. A statement on its own is just a sentence in RDF (or a sequence of tokens); it is not available to RDF as a resource. Nonetheless, sometimes we will want to say something about another RDF statement or use one as the object of a statement.
(p4) Suppose we wanted to build a search engine that harvested RDF statements from the web. What we would like to have is an index of where our harvested statements came from, along with the time each statement was harvested, all in RDF as well, and in the same data store. To do this we'd want to be able to attribute statements to others, and that implies statements would need to be part of other statements. We decide to make a property called 'gathered' to mean that a statement was taken from a web site; here the harvested statement would need to be the subject of another statement. We could also have another property called 'published' to mean that a site published a certain statement; here the harvested statement would be the object of another statement. For example, if we harvested this statement on w3.org:
   <rdfprimer>   <editor>   "eric".
we'd like to be able to do something on the lines of:
   <w3.org>                       <published>  [<rdfprimer> <editor> "eric"].
   [<rdfprimer> <editor> "eric"]  <gathered>   <w3.org>.
(p5) The square brackets are a visual clue that tells us that the statement needs to be encapsulated and referred to in some way; remember we can only put three things in an RDF statement, not five!
(p6) How would we go about doing this? Well, we know that the subject of an RDF statement is a resource denoted by a URI or an anonymous resource, and we know that these can also be the object of an RDF statement. So the thing to do would be to associate the statement with a URI or an anonymous resource and use that in place of the statement directly. Here we'll use an anonymous resource, called 'stmt'. That would give us:
   <w3.org>     <published>   _:stmt.
   _:stmt       <gathered>    <w3.org>.  
How reification works
(p7) Reification allows us to denote a statement with a URI or anonymous resource. This thing we use to denote the statement (in our example the anonymous resource) is usually called the reified statement. Now how do we associate the statement with stmt? To let us do this, RDF provides four properties, each of which will be used as a property of the reified statement (stmt). These are:
(p8)
subject: the subject property identifies the resource the statement is about (the subject slot). Its value in our example is <rdfprimer>.

predicate: the predicate property identifies the property in the statement. Its value in our example is <editor>.

object: the object property identifies the statement's property value. Its value in our example is "eric".

type: the type property describes the type of the new resource. You can consider 'new resource' a figure of speech if you prefer; not everyone is comfortable with the idea that resources are created or come into existence by making up URIs or anonymous resources. All reified statements have a type property whose value is the RDF defined resource Statement.
(p9) Armed with these properties, we can reify our statement as follows:
   _:stmt    <type>       <Statement>.
   _:stmt    <subject>    <rdfprimer>.
   _:stmt    <predicate>  <editor>.
   _:stmt    <object>     "eric".
(p10) The result, these four new triples, is called the reification of the statement, to distinguish it from the reified statement, stmt.
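The step from a statement to its reification is mechanical, so it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modelled as plain tuples of strings, the names Triple and reify are hypothetical and not part of RDF or of any RDF library, and words in angle brackets stand in for URI references as elsewhere in this section.

```python
from typing import List, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def reify(statement: Triple, node: str) -> List[Triple]:
    """Return the four triples that describe `statement` through `node`.

    `node` is the URI or anonymous resource that will stand for the
    statement (the reified statement).
    """
    subject, predicate, obj = statement
    return [
        (node, "<type>", "<Statement>"),
        (node, "<subject>", subject),
        (node, "<predicate>", predicate),
        (node, "<object>", obj),
    ]


harvested = ("<rdfprimer>", "<editor>", '"eric"')
reification = reify(harvested, "_:stmt")

# Print the reification in the notation used in this section.
for triple in reification:
    print("   " + " ".join(triple) + ".")
```

Note that reify returns new triples and leaves the original statement alone; whether the original is also kept in the data store is a separate decision.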
(p11) Our data store now looks like this:
   <rdfprimer>   <editor>       "eric".
   <w3.org>      <published>    _:stmt.
   _:stmt        <gathered>     <w3.org>.  
   _:stmt        <type>         <Statement>.
   _:stmt        <subject>      <rdfprimer>.
   _:stmt        <predicate>    <editor>.
   _:stmt        <object>       "eric".
(p12) There are a few things worth noting at this point.
(p13) The first is that both the statement and its reification are present in our data store. This is perfectly fine; a statement and its reification are different things. In the section on quotation we'll talk about this some more.
(p14) The second is that we have generated quite a few triples! Indeed, it has been pointed out that RDF reification is a process with verbose results: for example, in our data store a million harvested statements would result in seven million statements, four million of which are used for reifications! As you can see, the bloat incurred is four reification statements for every statement reified. One way of looking at this is to say it's a small price to pay to stay inside RDF and avoid using a meta-language to talk about statements, or inventing new kinds of statements with four or more parts and breaking the simplicity and homogeneity of RDF statements. Statement reification is one reason why RDF has been called 'self-describing'. Another way of looking at this is that software doesn't have to hold every triple generated for a reification in memory, it just has to act that way, and there are numerous optimizations that can be made to keep the overhead low: in this sense we trade conceptual simplicity for possible complexity in software.
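The seven million figure follows from counting the triples our example store keeps per harvested statement: the statement itself, one 'published' triple, one 'gathered' triple, and the four triples of the reification. A small Python sketch of that arithmetic, where the function name triple_counts is hypothetical and the per-statement layout is assumed to be exactly the one in the example data store:

```python
from typing import Dict


def triple_counts(harvested: int) -> Dict[str, int]:
    """Count triples for `harvested` statements stored as in the example."""
    asserted = harvested           # the harvested statement itself
    provenance = 2 * harvested     # one <published> and one <gathered> triple
    reification = 4 * harvested    # type, subject, predicate, object
    return {
        "asserted": asserted,
        "provenance": provenance,
        "reification": reification,
        "total": asserted + provenance + reification,
    }


counts = triple_counts(1_000_000)
print(counts["total"], "triples in total,",
      counts["reification"], "of them used for reifications")
```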
(p15) The third is that each statement of a reification is a bona-fide statement in its own right. In the data store above the four statements used to make a reification are not in any way special, or different from, the three others which are not. This goes hand in hand with the notion of describing RDF statements with other RDF statements.

Reification and quotation
(p16) Sometimes we want to be able to state that someone or something said or asserted another statement, without actually asserting anything about the resource in the other statement ourselves. This can be called mentioning the resource.
(p17) We've seen something like this already in our data store:
   <w3.org>   <published>    _:stmt.
(p18) One way of thinking about the above statement is that we are only saying that w3.org published <rdfprimer> <editor> "eric"; we're not saying anything directly about the resource denoted by <rdfprimer>. We only say something about <rdfprimer> when we put:
   <rdfprimer>   <editor>   "eric".
directly into our RDF store, since anyone could come along and find that statement without ever dealing with the reified statement, stmt, or necessarily knowing where the statement came from. This can be called using the resource.
(p19) Suppose we decide that we don't want to assert the statements we harvest directly anymore: things are good, the data store is very popular, but we've been naive and somebody might sue us because the contents of the data store effectively parrot what someone else said. And who knows, can we really trust w3.org? Using the previous example, we decide to remove all third-party statements we harvested. Our data store now looks like this:
   <w3.org>      <published>    _:stmt.
   _:stmt        <gathered>     <w3.org>.  
   _:stmt        <type>         <Statement>.
   _:stmt        <subject>      <rdfprimer>.
   _:stmt        <predicate>    <editor>.
   _:stmt        <object>       "eric".
(p20) Now we can be sure that we are only mentioning the resource <rdfprimer>, and there's no fear of the datastore asserting the same statements as w3.org.
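The difference between using and mentioning a statement can be checked mechanically against a store like the one above. The following Python sketch assumes triples are plain tuples of strings; the helper name described_statement is hypothetical and is not part of RDF. It rebuilds the statement that stmt describes and then tests whether the store also asserts that statement directly.

```python
from typing import Optional, Set, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]

# The data store after the harvested statement itself has been removed.
store: Set[Triple] = {
    ("<w3.org>", "<published>", "_:stmt"),
    ("_:stmt", "<gathered>", "<w3.org>"),
    ("_:stmt", "<type>", "<Statement>"),
    ("_:stmt", "<subject>", "<rdfprimer>"),
    ("_:stmt", "<predicate>", "<editor>"),
    ("_:stmt", "<object>", '"eric"'),
}


def described_statement(triples: Set[Triple], node: str) -> Optional[Triple]:
    """Rebuild the statement that `node` reifies, or None if incomplete."""
    # Collect every property of `node`; this sketch assumes one value each.
    parts = {p: o for (s, p, o) in triples if s == node}
    if parts.get("<type>") != "<Statement>":
        return None
    try:
        return (parts["<subject>"], parts["<predicate>"], parts["<object>"])
    except KeyError:
        return None


mentioned = described_statement(store, "_:stmt")
is_used = mentioned in store  # True only if the store asserts it directly

print("mentioned:", mentioned)
print("also asserted directly:", is_used)
```

Adding the harvested statement back into the store would make is_used true without changing what is mentioned, which is exactly the situation in the earlier data store.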
(p21) The distinction between use and mention is an important and valuable one, and the terms are often used in a technical sense in logic and knowledge representation, similar to the characterizations given here. Having the facility to both use and mention information can avoid no end of trouble. One popular technique for mentioning something is to use quotation marks, and the process of mentioning itself is sometimes called quotation.
(p22) Using statements as the objects of other statements is not precisely quotation, but RDF's ability to reify any statement can be used as something very close to it, and this leads us to the notion of attribution of information on the web. The need to attribute a statement or set of statements by mention rather than use will be a common one for RDF on the open Internet; effectively it provides the web equivalent of the maxim 'don't believe everything you read'.

Reification in Predicate Logic
(p23) It's worth spending a moment distinguishing reification in RDF from a form of reification sometimes practiced when using predicate logic for knowledge representation. Being types of reification, the two are broadly similar. Nonetheless they work differently and on different things, and are not to be confused with each other. To see what this means in logical reification, suppose we had:
   editor(eric, rdfprimer)
(p24) Here editor is a logical predicate, and the proposition is that eric is the editor of rdfprimer, where eric and rdfprimer are objects, or things in our 'domain of discourse' (a proposition is just a logical sentence). Another way of looking at this is to see that editor relates eric and rdfprimer, so sometimes predicates are called relations. If we wanted to say that eric is effective as an editor, we'd instinctively try to do this:
   effective(editor, eric)
(p25) But doing so is illegal in predicate logic, since relations are not objects. The purpose of logical reification, as we'll call it, is not to be able to make a proposition about another proposition; it's to be able to use predicates as the objects of propositions, as above. Logical reification enables this by representing relations as objects. So we might instead represent our propositions as: relation(editor, eric, rdfprimer). This prevents us from writing editor(eric, rdfprimer) but does allow us to write relation(effective, editor, eric). Certain rules of inference would, under the right circumstances, allow us to infer that a relation called editor between eric and rdfprimer holds, and we'd write this as holds(editor, eric, rdfprimer).
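To make the idea of representing relations as objects a little more concrete, here is a rough Python sketch. It is only an analogy under stated assumptions: propositions are modelled as plain tuples whose first element is the single predicate relation, the variable names are hypothetical, and no rules of inference (such as the one that would yield holds) are modelled.

```python
from typing import Set, Tuple

# Assumption: each fact is ("relation", name, arg1, arg2), so the relation
# name is an ordinary term in the same position as any other object.
Fact = Tuple[str, str, str, str]

facts: Set[Fact] = {
    ("relation", "editor", "eric", "rdfprimer"),
    ("relation", "effective", "editor", "eric"),
}

# Every name that is used as a relation somewhere in the facts.
relation_names = {fact[1] for fact in facts}

# Relation names that also appear as an argument of another relation,
# which the unreified form editor(eric, rdfprimer) would not permit.
used_as_arguments = {
    arg for fact in facts for arg in fact[2:] if arg in relation_names
}

print(sorted(relation_names))
print(sorted(used_as_arguments))
```

Here editor shows up both as a relation name and as an argument of effective, which is the kind of proposition the unreified notation rules out.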
(p26) More technically, logical reification allows one to quantify over relations (predicates) and stay inside first order logic. RDF reification allows one to quantify over expressions (statements) and stay inside RDF.

$Id: PrimerReificationSection.html,v 1.6 2001/10/22 23:53:45 dehora Exp $
$Date: 2001/10/22 23:53:45 $
$Author: dehora $