Re: Provenance as a first-class citizen from Frank Manola on 2006-03-21 (semantic-web@w3.org from March 2006)

From: Frank Manola <fmanola@acm.org>
Date: Tue, 21 Mar 2006 10:32:15 -0500
To: Ian Emmons <iemmons@bbn.com>
CC: semantic-web@w3.org
Message-ID: <44201C7F.7090903@acm.org>

Ian Emmons wrote:
> 
snip
> 
> Here's a reification question for the larger group:  This thread has 
> repeatedly asserted that the fourth URI in a quad (or 4-tuple) "solves" 
> the reification problem.  However, as far as I know, there is no general 
> agreement on what that fourth URI should represent.  Does it identify 
> the source of the statement?  Does it identify the statement itself?  Or 
> are there other options?  It seems to me that if quads were made part of 
> the RDF standard (whether optional or not), then the standard should 
> specify what the fourth URI is.
> 

Hear, hear!  It seems to me that quads are mainly an implementation 
mechanism, and there's a need to agree on the details of "for what?" and 
"how?" they will be used.

I haven't thought about this much in a while, but my own (vague) view is 
that a quotation mechanism would be a straightforward approach to at 
least some of the problems being discussed (and I know Sandro has done 
some work on this).  One of the issues with RDF reification as it exists 
is that it describes what the parts of a statement talk about, rather 
than describing the parts of the statement itself.  For example,

ex:stmt1 rdf:type rdf:Statement .
ex:stmt1 rdf:subject ex:mary .
ex:stmt1 rdf:predicate ex:age .
ex:stmt1 rdf:object "23" .
ex:mary  rdf:type ex:Person .

(informally) says that ex:stmt1 gives some information about a person 
denoted by ex:mary, but it doesn't necessarily say that the URI in the 
"subject" position of the statement that was written is literally the 
expansion of ex:mary into a URI (see Section 3.3.1 of RDF Semantics); 
it may have been some other URI that denotes the same person.
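
To make the distinction concrete, here is a minimal Python sketch (not 
RDF tooling).  It assumes a hypothetical denotation table, DENOTES, in 
which two different URIs denote the same person; the URIs and the 
"person-1" labels are invented for illustration.  Under the reading 
above, the reification only commits to the referent, so a comparison 
by referent succeeds even when the URI actually written differs.

```python
# Hypothetical denotation table: which individual each URI denotes.
# Both URIs for Mary and the "person-N" labels are assumptions.
DENOTES = {
    "http://example.org/mary": "person-1",
    "http://example.org/staff/85740": "person-1",
    "http://example.org/joe": "person-2",
}


def same_referent(uri_a, uri_b):
    """True when both URIs denote the same individual under DENOTES."""
    return DENOTES[uri_a] == DENOTES[uri_b]


def same_uri(uri_a, uri_b):
    """True only when the two URIs are character-for-character identical."""
    return uri_a == uri_b


# URI assumed to appear in the triple that was actually written.
written = "http://example.org/staff/85740"
# Expansion of ex:mary, the value of rdf:subject in the reification.
reified = "http://example.org/mary"

referent_match = same_referent(written, reified)  # what reification describes
literal_match = same_uri(written, reified)        # what provenance needs
```

The two results differ: the reification is satisfied by the written 
triple, yet it does not record which URI the author used.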

In doing many kinds of provenance work, on the other hand, it seems to 
me you want to describe what was actually written, not just what that 
statement talks about.  That is, I'd like to be able to say something like:

'ex:mary ex:age "23" ' ex:writtenBy ex:joe .

meaning that Joe wrote that exact triple.  As in English, 'ex:mary 
ex:age "23" ' becomes the "name" of that statement.  Given this, you can 
add arbitrary amounts of other information to the description of the 
statement, such as the document in which the statement was made.  This 
models the situation in English. For example, if I want to say more than 
just

Mark Twain once said "If we had less statesmanship we could get along 
with fewer battleships"

and cite a source as well, I would need to add a separate piece of 
information to identify the reference, as in

Mark Twain once said "If we had less statesmanship we could get along 
with fewer battleships" [1].

For the statement about Mary, you might say something like:

'ex:mary ex:age "23" ' ex:writtenBy ex:joe .
'ex:mary ex:age "23" ' ex:locatedIn ex:document2135 .
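
One way to picture this, as a rough Python sketch rather than a 
proposal for RDF syntax: treat the quoted triple as an immutable value 
that serves as its own name, and hang descriptions off it.  The quote() 
helper, the "about" table, and the alternative URI ex:maryS are 
hypothetical names introduced only for this illustration.

```python
from collections import defaultdict


def quote(subject, predicate, obj):
    """Return the exact written form of a triple, used as its 'name'."""
    return (subject, predicate, obj)


# Descriptions keyed by quoted triple: property -> set of values.
about = defaultdict(lambda: defaultdict(set))

stmt = quote("ex:mary", "ex:age", '"23"')
about[stmt]["ex:writtenBy"].add("ex:joe")
about[stmt]["ex:locatedIn"].add("ex:document2135")

# A triple written with a different URI (ex:maryS is an assumed alias
# for the same person) is a different quoted statement, so none of the
# provenance recorded above is attached to it.
other = quote("ex:maryS", "ex:age", '"23"')
joe_wrote_other = other in about
```

Because the key is the written form, any number of further properties 
can be attached to the same statement without touching the triple itself.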

Obviously, some other things would have to be worked out too, such as 
the details of "unquoting", and the URIification of quoted statements, 
and you'd probably want to be able to "parse" quoted statements to get 
at the actual URIs of the subject, predicate, and object parts.  KIF 
illustrates these features (in a more complicated language than RDF).
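
As a sketch of what "parsing" a quoted statement might involve, the 
Python below splits the quoted text into its three terms and expands a 
prefixed name into a full URI.  The function names, the PREFIXES table, 
and the http://example.org/ namespace for ex: are assumptions made for 
the example, and the tokenizer handles only simple literals without 
escaped quotes.

```python
import re

# Hypothetical namespace binding for the ex: prefix.
PREFIXES = {"ex": "http://example.org/"}

# A term is either a double-quoted literal or a run of non-space characters.
TERM = re.compile(r'"[^"]*"|\S+')


def parse_quoted(text):
    """Split a quoted statement into (subject, predicate, object) terms."""
    terms = TERM.findall(text)
    if len(terms) != 3:
        raise ValueError("expected exactly three terms, got %d" % len(terms))
    return tuple(terms)


def expand(term):
    """Expand a prefixed name into a full URI; literals pass through."""
    if term.startswith('"'):
        return term
    prefix, _, local = term.partition(":")
    return PREFIXES[prefix] + local


subject, predicate, obj = parse_quoted('ex:mary ex:age "23" ')
subject_uri = expand(subject)
```

"Unquoting" would then amount to asserting the parsed triple itself, 
which is a separate step from describing it.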

Mind you, this doesn't mean that you wouldn't also want something like 
named graphs (a whole graph is a different thing, and it's still useful 
to be able to identify it).  It also doesn't mean you might not want to 
use quads in an implementation (of something).

--Frank
Received on Tuesday, 21 March 2006 15:28:33 UTC