provenance use case and requirements

Below, Jos supplied two examples of provenance using cwm's
log:includes. cwm directives (queries, rules) are expressed in terms
of triples similar to RDF triples. cwm has an additional node type, a
formula, which allows one to make assertions about a group of
statements. The current RDF specifications provide a mechanism for
making assertions about individual statements, reification [1], but it
doesn't seem to be used, at least in query.

There are some issues with using reification to make assertions about
statements (for example, the Superman problem [2], or writing
unordered collections of reified statements). In spite of these
problems, I bet the major reason it isn't used is that it is a pain to
write out. cwm and Euler don't have that problem because formulas are
much easier to write than collections of reified statements.
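
To make the "pain to write out" point concrete, here is a rough Python
sketch, not taken from cwm or Euler, that counts the triples needed to
attribute a two-statement document to its source. The helper names and
the ex:source property are made up for illustration; only the rdf:
reification terms are standard.

```python
# Hypothetical sketch: ex:source and the function names are illustrative only.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(statement, node):
    """Return the four triples describing one statement as an rdf:Statement."""
    s, p, o = statement
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    ]

def attribute_by_reification(statements, source):
    """Attribute each statement separately: five triples per statement."""
    triples = []
    for i, st in enumerate(statements):
        node = "_:r%d" % i
        triples.extend(reify(st, node))
        triples.append((node, "ex:source", source))
    return triples

def attribute_by_formula(statements, source):
    """Attribute the group as a whole: one assertion about a set of statements."""
    return [(frozenset(statements), "ex:source", source)]

doc = [(":foo", ":a", '"a"'), (":foo", ":b", '"b"')]
reified = attribute_by_reification(doc, "<a.n3>")
grouped = attribute_by_formula(doc, "<a.n3>")
print(len(reified), len(grouped))
```

The frozenset only stands in for a formula node; a real formula also
scopes variables and blank nodes, which this sketch ignores.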

The RDF app most familiar to me is Annotea. It uses source provenance
for data management and sanity checking. For instance, if someone
wants to delete the statements from some (virtual) document, they
delete all the statements with that "Attribution". Another query
looks for resources that the same individual has said both have type
Annotation and annotate a particular resource.
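
A minimal sketch of those two operations, assuming each statement is
stored as a (subject, predicate, object, source) tuple. This is not
Annotea's actual storage or API; the a: prefix, the resource names and
the function names are all hypothetical.

```python
# Hypothetical data: every statement remembers who said it.
store = [
    ("ann1", "rdf:type", "a:Annotation", "alice"),
    ("ann1", "a:annotates", "http://example.org/page", "alice"),
    ("ann2", "rdf:type", "a:Annotation", "bob"),
    ("ann2", "a:annotates", "http://example.org/page", "mallory"),
]

def delete_attributed_to(quads, source):
    """Drop every statement whose attribution is `source`."""
    return [q for q in quads if q[3] != source]

def annotations_of(quads, target):
    """Resources that one and the same source both typed as an
    Annotation and said annotate `target`."""
    typed = {(s, src) for (s, p, o, src) in quads
             if p == "rdf:type" and o == "a:Annotation"}
    annotating = {(s, src) for (s, p, o, src) in quads
                  if p == "a:annotates" and o == target}
    return sorted(s for (s, src) in typed & annotating)

print(annotations_of(store, "http://example.org/page"))
print(len(delete_attributed_to(store, "alice")))
```

ann2 is rejected because its type and its target come from different
sources, which is the sanity check described above.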

FOAF is another application that pays attention to who said what [3].
Again it needs to know the origin of each statement. Because these
queries are easy to express in cwm, it has some relevant test cases.
Expressed in fairy tale format, consider the following query case:

  Joe is using a DAWG-QL application to write his checks. He does this
  by merging documents from his credit card bank, his calendar, and some
  collaborative scheduling pages maintained by his coworkers. The credit
  card bank is the only one allowed to provide the amount and recipient
  of the checks. The other documents provide other ledger information,
  some of which is on the checks in the memo field.
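
One way to read this use case is as a per-property whitelist of
sources applied while merging. The sketch below assumes exactly that;
the ex: properties, source names and data are invented for
illustration and say nothing about how a DAWG query language would
express the constraint.

```python
# Hypothetical policy: which sources may assert which properties.
# Properties not listed here are accepted from any source.
TRUSTED = {"ex:amount": {"bank"}, "ex:recipient": {"bank"}}

def merge_with_provenance(documents):
    """Merge triples from all documents, keeping the source of each
    statement and dropping statements from sources that are not
    allowed to assert that property."""
    merged = []
    for source, triples in documents.items():
        for (s, p, o) in triples:
            allowed = TRUSTED.get(p)
            if allowed is None or source in allowed:
                merged.append((s, p, o, source))
    return merged

documents = {
    "bank": [("check1", "ex:amount", "120.00"),
             ("check1", "ex:recipient", "Utility Co")],
    "calendar": [("check1", "ex:memo", "July electricity")],
    "coworkers": [("check1", "ex:amount", "9999.00"),
                  ("check1", "ex:memo", "team lunch")],
}

checks = merge_with_provenance(documents)
amounts = [o for (s, p, o, src) in checks if p == "ex:amount"]
print(amounts)
```

The coworkers' page can still contribute memo text, but its amount
never reaches the check.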

I have attached an IRC log between AndyS, DaveB and myself discussing
whether and how provenance should be queried or constrained in BRQL.

On Mon, Jul 26, 2004 at 12:25:26AM +0200, Jos De_Roo wrote:
> 
> For an explanation of log:semantics, log:includes and log:notIncludes
> I would like to point to http://www.w3.org/2000/10/swap/doc/Reach
> 
> Now let's assume that
> 
> <a.n3> a q:Source.
> <b.n3> a q:Source.
> 
> and a.n3 is
> 
> :foo :a "a".
> :foo :b "b".
> 
> and b.n3 is
> 
> :bar :a "a".
> 
> Then the query
> 
> [] q:select { (?O ?SRC) };
>    q:where {?SRC a q:Source. ?SRC.log:semantics log:includes {?S ?P ?O}}.
> 
> results in
> 
> ("a" <file:/temp/a.n3>) .
> ("b" <file:/temp/a.n3>) .
> ("a" <file:/temp/b.n3>) .
> 
> as a matter of test case.
> 
> 
> Another test case is that the query
> 
> @prefix log: <http://www.w3.org/2000/10/swap/log#>.
> @prefix q: <http://www.w3.org/2004/ql#>.
> @prefix x: <http://example.com/exon/#>.
> [] q:select { (?E) };
>    q:where { <http://www.w3.org/2000/10/swap/test/EricNeumann/exdata.n3> log:semantics ?F.
>              ?F log:includes { ?T1 a x:Transcript; x:hasExon ?E. ?T2 a x:Transcript }.
>              ?F log:notIncludes { ?T2 x:hasExon ?E }}.
> 
> results in
> 
> (<http://www.w3.org/2000/10/swap/test/EricNeumann/exdata.n3#ATP1B4_e3>) .
> (<http://www.w3.org/2000/10/swap/test/EricNeumann/exdata.n3#ATP1B4_e2>) .
> 
> 
> -- 
> Jos De Roo, AGFA http://www.agfa.com/w3c/jdroo/
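
For readers without cwm at hand, Jos's first test case can be
approximated in a few lines of Python. This is a deliberately
simplified stand-in: real log:includes matches whole formulae with
variable bindings, whereas this sketch matches one triple pattern at a
time, with None playing the role of a variable. The function names
are my own.

```python
# The two source documents from the first test case, as plain triples.
sources = {
    "a.n3": [(":foo", ":a", "a"), (":foo", ":b", "b")],
    "b.n3": [(":bar", ":a", "a")],
}

def matches(pattern, triple):
    """A pattern term of None acts as a variable and matches anything."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def includes(formula, pattern):
    """Triples in `formula` that match `pattern` (rough log:includes)."""
    return [t for t in formula if matches(pattern, t)]

def not_includes(formula, pattern):
    """True when nothing in `formula` matches (rough log:notIncludes)."""
    return not includes(formula, pattern)

# (?O ?SRC) for every statement {?S ?P ?O} found in each source.
results = [(o, src)
           for src, formula in sources.items()
           for (_, _, o) in includes(formula, (None, None, None))]

# Sources that say nothing at all with the :b property.
lacking_b = [src for src, formula in sources.items()
             if not_includes(formula, (None, ":b", None))]
print(results, lacking_b)
```

The results list carries the same three (object, source) pairs as the
quoted output, minus the file:/temp/ prefix.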

[1] http://www.w3.org/TR/rdf-syntax-grammar/#section-Reification
[2] http://www.w3.org/2001/12/attributions/#superman
[3] http://www-106.ibm.com/developerworks/xml/library/x-foaf2.html#N10163
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Saturday, 14 August 2004 00:17:07 UTC