Re: provenance use case and requirements

On Fri, Aug 13, 2004 at 08:17:06PM -0400, Eric Prud'hommeaux wrote:
> Below, Jos supplied two examples of provenance using cwm's
> log:includes. cwm directives (queries, rules) are expressed in terms
> of triples similar to RDF triples. cwm has an additional node type, a
> formula, which allows one to make assertions about a group of
> statements. The current RDF specifications provide a mechanism for
> making assertions about individual statements, reification [1], but it
> doesn't seem to be used, at least in query.
> 
> There are some issues with using reification to make assertions about
> statements (for example, the Superman problem [2], or writing
> unordered collections of reified statements). In spite of these
> problems, I bet the major reason it isn't used is that it is a pain to
> write out. cwm and Euler don't have that problem because formulas are
> much easier to write than collections of reified statements.
> 
> The RDF app most familiar to me is Annotea. It uses source provenance
> for data management and sanity checking. For instance, if someone
> wants to delete the statements from some (virtual) document, they
> delete all the statements with that "Attribution". Another query
> looks for resources that the same individual has said have type
> Annotation and annotate a particular resource.
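To make the two Annotea operations above concrete, here is a minimal sketch over an in-memory list of quads (subject, predicate, object, source). The store layout, the a: prefix and the sample data are my assumptions for illustration, not Annotea's actual schema or API.

```python
# Hypothetical quad store: each entry is (subject, predicate, object, source).

def delete_by_source(quads, source):
    """Keep only the quads that were NOT attributed to the given source."""
    return [q for q in quads if q[3] != source]

def annotations_of(quads, target):
    """Subjects that one and the same source both typed as an Annotation
    and said annotate the target resource."""
    typed = {(s, src) for (s, p, o, src) in quads
             if p == "rdf:type" and o == "a:Annotation"}
    annotating = {(s, src) for (s, p, o, src) in quads
                  if p == "a:annotates" and o == target}
    return sorted({s for (s, src) in typed & annotating})

quads = [
    ("ann1", "rdf:type", "a:Annotation", "doc1"),
    ("ann1", "a:annotates", "page1", "doc1"),
    ("ann2", "rdf:type", "a:Annotation", "doc2"),
    ("ann2", "a:annotates", "page1", "doc3"),  # type and target come from different sources
]

same_source = annotations_of(quads, "page1")
remaining = delete_by_source(quads, "doc1")
```

Only ann1 qualifies for the second query, because ann2's type and annotates statements come from different sources.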
> 
> FOAF is another application that pays attention to who said what [3].
> Again it needs to know the origin of each statement. Because these
> queries are easy to express in cwm, it has some relevant test cases.

Oops, I intended to poll others and ask what apps they were supporting
that involved provenance or worked around it in clever ways. Also, how
widely should I ask this question? rdfig?

> Expressed in fairy tale format, consider the following query case:
> 
>   Joe is using a DAWG-QL application to write his checks. He does this
>   by merging documents from his credit card bank, his calendar, and some
>   collaborative scheduling pages maintained by his coworkers. The credit
>   card bank is the only one allowed to provide the amount and recipient
>   of the checks. The other documents provide other ledger information,
>   some of which is on the checks in the memo field.
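A rough sketch of the constraint in Joe's story, assuming the merged data is available as quads and that a per-predicate trust policy is acceptable. The predicate names, source names and the TRUSTED_FOR table are hypothetical; nothing here is proposed syntax.

```python
# Hypothetical policy: predicates listed here are only accepted from the named sources.
TRUSTED_FOR = {
    "amount": {"bank"},
    "recipient": {"bank"},
}

def admissible(quad):
    """A quad is admissible if its predicate is unrestricted,
    or if its source is trusted for that predicate."""
    _s, p, _o, src = quad
    allowed = TRUSTED_FOR.get(p)
    return allowed is None or src in allowed

merged = [
    ("check1", "amount", "100.00", "bank"),
    ("check1", "recipient", "Acme", "bank"),
    ("check1", "amount", "1.00", "coworker-page"),  # not from the bank, so rejected
    ("check1", "memo", "team lunch", "calendar"),   # memo is unrestricted
]

accepted = [q for q in merged if admissible(q)]
```

The point is only that the query needs some way to say "this pattern must match statements from that source"; how that is spelled in the QL is the open question.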
> 
> I have attached an IRC log between AndyS, DaveB and myself discussing
> whether and how provenance should be queried or constrained in BRQL.
> 
> On Mon, Jul 26, 2004 at 12:25:26AM +0200, Jos De_Roo wrote:
> > 
> > For an explanation of log:semantics, log:includes and log:notIncludes
> > I would like to point to http://www.w3.org/2000/10/swap/doc/Reach
> > 
> > Now let's assume that
> > 
> > <a.n3> a q:Source.
> > <b.n3> a q:Source.
> > 
> > and a.n3 is
> > 
> > :foo :a "a".
> > :foo :b "b".
> > 
> > and b.n3 is
> > 
> > :bar :a "a".
> > 
> > Then the query
> > 
> > [] q:select { (?O ?SRC) };
> >    q:where {?SRC a q:Source. ?SRC.log:semantics log:includes {?S ?P ?O}}.
> > 
> > results in
> > 
> > ("a" <file:/temp/a.n3>) .
> > ("b" <file:/temp/a.n3>) .
> > ("a" <file:/temp/b.n3>) .
> > 
> > as a matter of test case.
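For readers without cwm or Euler at hand, the first test case can be approximated as follows. This is a sketch that treats each source as a plain list of triples and uses the relative names a.n3 and b.n3 instead of the file: URIs in Jos's output; it does not implement log:semantics or log:includes in general.

```python
# Each q:Source maps to the triples its document states (data from the test case above).
sources = {
    "a.n3": [(":foo", ":a", "a"), (":foo", ":b", "b")],
    "b.n3": [(":bar", ":a", "a")],
}

def select_object_and_source(sources):
    """For every source and every triple {?S ?P ?O} it includes, yield (?O, ?SRC)."""
    return [(o, src)
            for src, triples in sources.items()
            for (_s, _p, o) in triples]

results = select_object_and_source(sources)
```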
> > 
> > 
> > Another test case is that the query
> > 
> > @prefix log: <http://www.w3.org/2000/10/swap/log#>.
> > @prefix q: <http://www.w3.org/2004/ql#>.
> > @prefix x: <http://example.com/exon/#>.
> > [] q:select { (?E) };
> >    q:where { <http://www.w3.org/2000/10/swap/test/EricNeumann/exdata.n3> log:semantics ?F.
> >              ?F log:includes { ?T1 a x:Transcript; x:hasExon ?E. ?T2 a x:Transcript }.
> >              ?F log:notIncludes { ?T2 x:hasExon ?E }}.
> > 
> > results in
> > 
> > (<http://www.w3.org/2000/10/swap/test/EricNeumann/exdata.n3#ATP1B4_e3>) .
> > (<http://www.w3.org/2000/10/swap/test/EricNeumann/exdata.n3#ATP1B4_e2>) .
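The second query reads as "exons that belong to some transcript but are missing from at least one transcript". A small sketch of that combination of log:includes and log:notIncludes over a single formula, using made-up data (ex:t1, ex:t2, ex:e1, ex:e2 are hypothetical and not the contents of exdata.n3):

```python
def exons_missing_from_some_transcript(formula):
    """formula is a set of (s, p, o) triples.
    Returns each ?E such that some transcript has it (includes)
    and some transcript ?T2 does not (notIncludes)."""
    transcripts = {s for (s, p, o) in formula
                   if p == "a" and o == "x:Transcript"}
    exons = {o for (s, p, o) in formula
             if p == "x:hasExon" and s in transcripts}
    return {e for e in exons for t2 in transcripts
            if (t2, "x:hasExon", e) not in formula}

formula = {
    ("ex:t1", "a", "x:Transcript"),
    ("ex:t2", "a", "x:Transcript"),
    ("ex:t1", "x:hasExon", "ex:e1"),
    ("ex:t1", "x:hasExon", "ex:e2"),
    ("ex:t2", "x:hasExon", "ex:e1"),
}

missing = exons_missing_from_some_transcript(formula)
```

Here ex:e1 is shared by both transcripts and drops out, while ex:e2 is reported because ex:t2 lacks it.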
> > 
> > 
> > -- 
> > Jos De Roo, AGFA http://www.agfa.com/w3c/jdroo/
> 
> [1] http://www.w3.org/TR/rdf-syntax-grammar/#section-Reification
> [2] http://www.w3.org/2001/12/attributions/#superman
> [3] http://www-106.ibm.com/developerworks/xml/library/x-foaf2.html#N10163
> -- 
> -eric
> 
> office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
>                         Shonan Fujisawa Campus, Keio University,
>                         5322 Endo, Fujisawa, Kanagawa 252-8520
>                         JAPAN
>         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
> cell:   +1.857.222.5741 (does not work in Asia)
> 
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other than
> email address distribution.

Content-Description: irc://irc.w3.org/%23dawg 2004-08-13 -- provenance in BRQL
> 2004-08-13T16:21:22Z <AndyS> SOURCE is a good one because it isn't clear what "it" is yet and the F2F time is good for that
> 2004-08-13T16:21:45Z <ericP> roger
> 2004-08-13T16:21:53Z <AndyS> email for  a bit (explore) ; meet to get a sense of people's views ; then can "do"
> 2004-08-13T16:22:04Z <AndyS> So some text to light blue touch paper (!)
> 2004-08-13T16:22:11Z <ericP> FOAF and annotea seem like the most documented users of source attribution
> 2004-08-13T16:22:23Z <AndyS> I was idling thinking of trickinesses - 
> 2004-08-13T16:23:17Z <ericP> ...?
> 2004-08-13T16:23:35Z <AndyS> what about { :x :y :z . :x :y :z SRC(?s) . }  How many matches to first (assume several :x :y :z)
> 2004-08-13T16:23:53Z <AndyS> Its the mixing of triple/3 and triple/4 that is interesting
> 2004-08-13T16:24:52Z <ericP> doc1 says :foo is a CashiersCheck
> 2004-08-13T16:25:02Z <ericP> anyone says CashiersCheck is a Check
> 2004-08-13T16:25:21Z <ericP> (though your ex. had the same statement twice, intended?)
> 2004-08-13T16:25:23Z <AndyS> (aside : Sidean aren't W3 members - maybe they would be interested)
> 2004-08-13T16:26:00Z <ericP> (good point, i'll query Bob)
> 2004-08-13T16:26:02Z <AndyS> Better ex:  { :x :y ?z. :x :y ?z SRC(?s) . }
> 2004-08-13T16:26:45Z  * ericP thinks about it from a relational perspective...
> 2004-08-13T16:27:25Z <AndyS> Several matches to second part : so reverse the order and seem to get more lines in result set 
>                      (duplicates) where there is no ?s in SELECT
> 2004-08-13T16:28:22Z <ericP> assuming ?s unbound:
> 2004-08-13T16:28:22Z <AndyS> Haven't worked it in detail but we are outside RDF and need to pick something that is just query - not 
>                      general provenance which others are interested in a solution for
> 2004-08-13T16:29:03Z <ericP> relvar has a set of ?z bindings. for each, cross with the subset of those bindings where ?z is known
> 2004-08-13T16:29:11Z <AndyS> Thinks ::  SELECT ?z { :x :y ?z. }   vs    SELECT ?z { :x :y ?z SRC(?s). }
> 2004-08-13T16:29:24Z <ericP> what does { :x :y ?z. :x :y ?z } do?
> 2004-08-13T16:29:36Z <AndyS> Why should they be the same?  Why different?
> 2004-08-13T16:29:55Z <AndyS> Your ex: its all the ?z, once.
> 2004-08-13T16:30:48Z <ericP> yeah, seems to be all ?z crossed with all ?z restricted to the set where the ?z=?z
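To illustrate the duplicate-row question in the exchange above: a sketch that assumes the plain triple pattern is matched against the set of distinct triples, the SRC pattern against the stored quads, and the two are joined on ?z. Those semantics and the data are my assumptions, chosen only to show where the extra rows come from.

```python
quads = [
    (":x", ":y", "1", "s1"),
    (":x", ":y", "1", "s2"),  # same triple, stated by a second source
    (":x", ":y", "2", "s1"),
]

def match_triple(quads):
    """{ :x :y ?z } over the set of distinct triples: one binding per ?z."""
    return sorted({z for (s, p, z, _src) in quads if (s, p) == (":x", ":y")})

def match_quad(quads):
    """{ :x :y ?z SRC(?s) }: one binding per (?z, ?s) pair."""
    return [(z, src) for (s, p, z, src) in quads if (s, p) == (":x", ":y")]

def join_project_z(quads):
    """{ :x :y ?z. :x :y ?z SRC(?s) } with only ?z selected."""
    return [z for z in match_triple(quads)
            for (z2, _src) in match_quad(quads) if z == z2]

triple_only = match_triple(quads)
joined = join_project_z(quads)
```

With ?s dropped from the SELECT, the joined query returns "1" twice, once per source, whereas the triple-only query returns it once.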
> 2004-08-13T16:30:55Z <AndyS> Syntax aside : I like differentiating the 4th element but just quads is OK.
> 2004-08-13T16:31:10Z <ericP> i prefer differentiation
> 2004-08-13T16:31:21Z <AndyS> dy/dx
> 2004-08-13T16:32:00Z <ericP> algae used to be quads, but i discovered that sometimes i wanted to talk about other properties of the 
>                      triple, for instance, datatype
> 2004-08-13T16:32:24Z <AndyS> ?datatype  : why not   "1"^^?x
> 2004-08-13T16:32:39Z <ericP> we use datatype in the atom serialization, but i bet we'll think of *something* else later on
> 2004-08-13T16:32:46Z <AndyS> its a property of the slot, not the triple isn't it?
> 2004-08-13T16:32:47Z <ericP> also, i think it helps cognition
> 2004-08-13T16:33:14Z <ericP> yeah, maybe it wasn't dt.
> 2004-08-13T16:33:20Z  * ericP checks...
> 2004-08-13T16:33:27Z <AndyS> May well - is this area very open or are there a few well known approaches?
> 2004-08-13T16:33:58Z <ericP> the source stuff? opne, i think.
> 2004-08-13T16:34:02Z <AndyS> If opne; then email start else text in doc of "normal" approach
> 2004-08-13T16:34:05Z <ericP> don't know about XUL
> 2004-08-13T16:34:49Z <AndyS> 3Store has quads as does Redland and RDFStore (? the last one)
> 2004-08-13T16:35:17Z <AndyS> I think DaveB assigns no particular meaning to 4th slot
> 2004-08-13T16:35:27Z <DaveB> yeah
> 2004-08-13T16:35:37Z <AndyS> Not idle then!
> 2004-08-13T16:35:43Z <DaveB> 3store has 1 meaning, source or something
> 2004-08-13T16:35:50Z <DaveB> busy merging redland win32 patches
> 2004-08-13T16:35:53Z <ericP> daveb, used to be triples in containing sets, no?
> 2004-08-13T16:35:59Z <ericP> (now quads)
> 2004-08-13T16:36:11Z <AndyS> Redland isn't set based is it?
> 2004-08-13T16:36:15Z <DaveB> it is now
> 2004-08-13T16:36:19Z <DaveB> it is sets now
> 2004-08-13T16:36:30Z <DaveB> but with contexts on, you can have dup triples
> 2004-08-13T16:36:34Z <AndyS> sets of triples or sets of quads?
> 2004-08-13T16:36:45Z <DaveB> sets of triples or bags of quads
> 2004-08-13T16:36:47Z <AndyS> Ah - need to turns quads on?
> 2004-08-13T16:36:55Z <DaveB> well, not quite quads
> 2004-08-13T16:37:51Z <DaveB> I wrote about it most recently in http://www.w3.org/2001/sw/Europe/reports/large_scale_demo/
> 2004-08-13T16:39:20Z <DaveB> pub, see ya
> 2004-08-13T16:39:28Z -!- DaveB [dajobe@137.222.34.57] has quit [Quit: Client exiting]
> 2004-08-13T16:39:32Z <ericP> would you implement CONSTRUCT * WHERE ( ?p rdf:type foaf:Person . ?p foaf:knows ?known )....
> 2004-08-13T16:39:36Z <ericP> rats, missed him
> 2004-08-13T16:40:38Z <ericP> i don't know whether to look for all the statements in the same set, differentiated by the last col, 
>                      or whether i'd have to iterate over the known contexts
> 2004-08-13T16:40:40Z <AndyS> Whats the issue with the Q
> 2004-08-13T16:40:46Z <ericP> former seems more efficient
> 2004-08-13T16:41:08Z <AndyS> There is a protocol matter 
> 2004-08-13T16:41:16Z <AndyS> Well - encoding matter really.
> 2004-08-13T16:41:52Z <AndyS> No syntax for quads : we could restrict SRC usage to result sets and in query
> 2004-08-13T16:41:58Z <ericP> re the Q, i was going to further constrain on triple to be stated by a known party (after i knew how 
>                      the general query was executed)
> 2004-08-13T16:43:15Z <ericP> "CONSTRUCT *" wasn't meant to complicate, just a boring head to the query
> 2004-08-13T16:43:27Z <ericP> "SELECT *" would have been better
> 2004-08-13T16:43:46Z <AndyS> :-)
> 2004-08-13T16:44:15Z <ericP> what do you think of { ?p rdf:type ?q (SRC(?s) ) }
> 2004-08-13T16:44:21Z <ericP> ie, move it to the constraints?
> 2004-08-13T16:45:00Z <ericP> then you can have syntaxes like { ?p rdf:type ?q (SRC() = <http://trusted.example/foo>) }
> 2004-08-13T16:45:07Z <ericP> for when you want to constrain
> 2004-08-13T16:45:32Z <ericP> maybe former could be { ?p rdf:type ?q (?s = SRC() ) }
> 2004-08-13T16:45:53Z <AndyS> Don't see point of outer () - its no different to wanting { ?x ?y (?<34) }
> 2004-08-13T16:46:18Z <AndyS> Could have SRC(?s) as a binding operation like any other slot
> 2004-08-13T16:46:24Z <ericP> i was trying to parallel that syntax for the SRC constraints
> 2004-08-13T16:47:10Z <AndyS> SRC applies to triples - can we have SRC applied to graphs? graph patterns?
> 2004-08-13T16:47:16Z <AndyS> Syntax - err ----
> 2004-08-13T16:47:50Z  * ericP digs up an algae test for expressivity comparison...
> 2004-08-13T16:47:51Z <AndyS> SRC(?s,{pattern})  so SRC(?s, { :x :y ?z. } )
> 2004-08-13T16:47:54Z <ericP> 
>                      http://dev.w3.org/cvsweb/perl/modules/W3C/Rdf/test/Ephemoral0-alg.sh?rev=HEAD&content-type=text/x-cvsweb-markup
> 2004-08-13T16:48:01Z <ericP> look for ATTRIB
> 2004-08-13T16:48:07Z <ericP> (how i spell SRC)
> 2004-08-13T16:48:59Z <ericP> SRC(?s,{pattern}) is appealing...
> 2004-08-13T16:50:05Z <AndyS> What other systems to be considered?
> 2004-08-13T16:50:17Z <ericP> algae syntax: ask ?db ( ?ps ?ps ?o {?ps != t:zzz}{%ATTRIB == t:attrib1}. ...)
> 2004-08-13T16:50:44Z <ericP> done that way to make it so SRC constraints are handled the same way as any other constraints
> 2004-08-13T16:51:00Z <ericP> i wonder what XUL uses
> 2004-08-13T16:51:51Z <ericP> crap, SRQL spec disappeared from <http://www.openrdf.org/publications/SeRQL%20user%20manual.pdf>
> 2004-08-13T16:52:01Z <AndyS> In forming a consensus, who/what should be factored in?
> 2004-08-13T16:52:34Z <AndyS> http://www.openrdf.org/doc/users/userguide.html
> 2004-08-13T16:53:19Z <ericP> a general, elegant, beautiful solution without the slightest deference to the more modest needs of the 
>                      users?
> 2004-08-13T16:53:29Z <ericP> (factored in)
> 2004-08-13T16:53:53Z <ericP> what do people do now? what will they do in 1 year?
> 2004-08-13T16:54:07Z <AndyS> I see no contexts or quads
> 2004-08-13T16:54:23Z <ericP> beyond that is probably more work to speculate on than we would save in re-deployment
> 2004-08-13T16:54:37Z <AndyS> The objective is a consensus in the WG - that may be different, may be the same
> 2004-08-13T16:55:05Z <ericP> (btw, elegance proposal was in jest)
> 2004-08-13T16:55:31Z <ericP> FOAF unifiers (i forget the real name) use a bit of this
> 2004-08-13T16:55:39Z <ericP> edd wrote about it...
> 2004-08-13T16:56:13Z <ericP> http://www-106.ibm.com/developerworks/xml/library/x-foaf2.html
> 2004-08-13T16:56:21Z <AndyS> Only got a few mins more ...
> 2004-08-13T16:56:29Z <ericP> Annotea uses it
> 2004-08-13T16:56:42Z <ericP> Ontaria too
> 2004-08-13T16:56:50Z <AndyS> that seems to get well beyond "data access"
> 2004-08-13T16:56:59Z <ericP> what other "open world" data query systems are out there?
> 2004-08-13T16:57:37Z <ericP> well beyond, yeah, wondering if we can support the part of the job that isolates the set of statements 
>                      from a particular document
> 2004-08-13T16:57:44Z <AndyS> This is tricky - is the WG the right set of people?  Seems to be prov and privacy and trust and ... so 
>                      much wider set of interested parties
> 2004-08-13T16:57:59Z <AndyS> Its in danger of being full workshop material!
> 2004-08-13T16:58:17Z <AndyS> So who are the WG interested parties?
> 2004-08-13T16:58:21Z <AndyS> (not me!)
> 2004-08-13T16:59:42Z <ericP> i'm sure we can make this as complicated as we want. someone will always be able to dream up an app 
>                      that justifies new functionality.
> 2004-08-13T17:00:02Z <ericP> do you think isolating the set of statements is the sweet point?
> 2004-08-13T17:00:32Z <ericP> if so, can we construct a forum where, when we ask that question, we hear "yes, that's perfect" ?
> 2004-08-13T17:00:45Z <ericP> if so, let's ask that forum
> 2004-08-13T17:00:47Z <AndyS> Not sure - but I only have "think" experience.  Do you think the WG will cluster around it?
> 2004-08-13T17:01:05Z <AndyS> (this is certainly meeting my idea of a thing to build towards for F2F)
> 2004-08-13T17:02:00Z <ericP> oops, let the smoke out of my crystal ball -- have to go on intuition
> 2004-08-13T17:02:16Z <ericP> may as well sound them out, i guess.
> 2004-08-13T17:05:10Z <AndyS> Anything else to cover now? (its 18:00 here - and I have to pack!)
> 2004-08-13T17:05:12Z -!- ericP2 [matthieu@128.30.52.30] has joined #dawg
> 2004-08-13T17:05:17Z <ericP2> hi andy, sorry
> 2004-08-13T17:05:40Z <ericP2> i made a ref to letting the smoke out and my laptop died
> 2004-08-13T17:05:55Z <AndyS> Smoke out the laptop - serious
> 2004-08-13T17:06:06Z <AndyS>  Matthieu Fuzellier ?
> 2004-08-13T17:06:24Z <ericP2> yes, i'm using his irc client
> 2004-08-13T17:06:29Z <ericP2> and his chair
> 2004-08-13T17:06:42Z <ericP2> he's the new webmaster
> 2004-08-13T17:06:49Z <ericP2> also had a convenient irc window
> 2004-08-13T17:07:00Z <ericP2> so i'll be a little while getting things working again
> 2004-08-13T17:07:05Z <ericP2> shall we call it a day?
> 2004-08-13T17:07:14Z <ericP2> or do you want to wait for me to recover?
> 2004-08-13T17:13:50Z <AndyS> I have to help carry a rat cage - back in 5
> 2004-08-13T17:13:50Z <ericP2> ok
> 2004-08-13T17:13:50Z -!- ericP2 is now known as matthieu
> 2004-08-13T17:13:50Z -!- matthieu [matthieu@128.30.52.30] has left #dawg [Leaving]
> 2004-08-13T17:13:51Z -!- ericP [ericP@128.30.52.30] has joined #dawg
> 2004-08-13T17:13:51Z [Users #dawg]
> 2004-08-13T17:13:51Z [ AndyS] [ ericP] 
> 2004-08-13T17:13:51Z -!- Irssi: #dawg: Total of 2 nicks [0 ops, 0 halfops, 0 voices, 2 normal]
> 2004-08-13T17:14:10Z -!- Channel #dawg created Fri Aug 13 04:20:29 2004
> 2004-08-13T17:15:00Z -!- Irssi: Join to #dawg was synced in 69 secs
> 2004-08-13T17:15:15Z <AndyS> I'm back
> 2004-08-13T17:15:26Z <AndyS> Got about 15
> 2004-08-13T17:18:28Z <AndyS> Your descriptions suggest that SRC isn't just a matter of recording the de facto status quo.
> 2004-08-13T17:18:33Z <ericP> ok. let me tell folks i'm supposed to heat with
> 2004-08-13T17:18:38Z <AndyS> Is that a fair comment?
> 2004-08-13T17:18:57Z <AndyS> (its cold in Boston!)
> 2004-08-13T17:19:11Z <ericP> i'd say, it's not status quo in the QLs, but it is in the apps.
> 2004-08-13T17:20:03Z <AndyS> So issue is how it appears ?
> 2004-08-13T17:20:31Z <ericP> yeah, the step of formalizing it for a QL isn't well understood, i think
> 2004-08-13T17:20:34Z <AndyS> And link to reification - as that is in (minorly) the RDF-core recs
> 2004-08-13T17:20:40Z <ericP> i feel confident, but that's 'cause i have my pet
> 2004-08-13T17:23:11Z <ericP> core says "here's how you reify" but specifically says that the product of reification entails 
>                      nothing (beyond the simple graph), ie, no way to de-reify
> 2004-08-13T17:24:08Z <ericP> also, folks have lots of issues with reification and the superman prob
> 2004-08-13T17:24:43Z <ericP> i think the onus should be on the person making the owl:sameAs statements, not on the person saying 
>                      "there is this statement..."
> 2004-08-13T17:25:08Z <AndyS> Exactly - there is a prov soln in rec - no one likes it but it is there.  Are we effectively ignoring 
>                      it?  By passing it?  Must be clear before Last call.
> 2004-08-13T17:25:43Z <ericP> http://www.w3.org/2001/12/attributions/#superman
> 2004-08-13T17:26:25Z <ericP> i don't think we have to bypass it, we can use the QL to imply the query over reified data
> 2004-08-13T17:26:42Z <ericP> but there are so many outstanding issues, i don't think we can
> 2004-08-13T17:26:55Z <ericP> so i *do* think we have to bypass it, i guess
> 2004-08-13T17:27:33Z <AndyS> so {:x :y :z :s} is shorhand for a stating?
> 2004-08-13T17:28:26Z <AndyS> This is another BobG interest area BTW
> 2004-08-13T17:28:45Z <ericP> "skating"? is that a reified graph?
> 2004-08-13T17:29:54Z <AndyS> No - is a quad actually the reification {:x :y :z :s} == { :s rdf:type Statement ; rdf:subject :x ; 
>                      rdf:predicate :y ; rdf:object :z }
> 2004-08-13T17:30:23Z <AndyS> Or  {:x :y :z :s} == { _:b rdf:type Statement ; rdf:subject :x ; rdf:predicate :y ; rdf:object :z } && 
>                      _:b seenIn :s
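The second reading above (a fresh blank node for the statement plus a link to the source) can be sketched as a quad-to-triples expansion. Note that ex:seenIn is a placeholder: as the log says, no standardized property of that kind is known, and the blank-node labelling scheme is likewise just for illustration.

```python
import itertools

# Counter used to mint fresh blank-node labels (illustrative only).
_bnode_ids = itertools.count()

def reify_quad(s, p, o, src):
    """Expand the quad {s p o src} into RDF reification triples about a
    fresh blank node, plus a hypothetical ex:seenIn arc to the source."""
    b = f"_:b{next(_bnode_ids)}"
    return [
        (b, "rdf:type", "rdf:Statement"),
        (b, "rdf:subject", s),
        (b, "rdf:predicate", p),
        (b, "rdf:object", o),
        (b, "ex:seenIn", src),  # placeholder property, not standardized
    ]

triples = reify_quad(":x", ":y", ":z", ":s")
```

Under the first reading the source :s itself would be the subject of the four rdf: triples and no extra arc would be needed, at the cost of conflating the statement with the document it came from.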
> 2004-08-13T17:31:07Z <AndyS> If bypass, then I think we aren't in QL land but in RDF2 land
> 2004-08-13T17:31:09Z <ericP> is there a standardized (or close) seenIn ?
> 2004-08-13T17:31:22Z <ericP> (log:include)
> 2004-08-13T17:31:27Z <AndyS> Not that I know of.
> 2004-08-13T17:31:43Z <ericP> hmm, tricky
> 2004-08-13T17:31:45Z <AndyS> log:include takes formula for RHS ?
> 2004-08-13T17:32:36Z <ericP> i always see the value being a statement
> 2004-08-13T17:32:53Z <ericP> but it is in {}s so it could be a bunch of statements
> 2004-08-13T17:34:07Z <AndyS> In particular it is not a named group 
> 2004-08-13T17:34:12Z <AndyS> c.f. TriX 
> 2004-08-13T17:35:04Z <AndyS> So - it seems that there is discussion in Wg to be had.  Whether to start with doc text or with email 
>                      or something else is up to you (:-)
> 2004-08-13T17:35:36Z <ericP> arguments for either approach
> 2004-08-13T17:35:57Z <AndyS> Its style and preference as much as anything 
> 2004-08-13T17:36:46Z <ericP> i'd like to ask some group "who uses provenance? do you have QL support?"
> 2004-08-13T17:36:49Z <ericP> what group would i ask?
> 2004-08-13T17:37:35Z <AndyS> WG at least - maybe others - but it is about WG consensus
> 2004-08-13T17:37:45Z <ericP> also, if we don't draw an arc between a statement and the document it came from (just describe 
>                      provenance in normative text), i think we can duck RDF2
> 2004-08-13T17:37:50Z <ericP> true
> 2004-08-13T17:39:09Z <AndyS> Must go - have fun in FL - and hope you get there!
> 2004-08-13T17:40:24Z <ericP> cheers




-- 
-eric


Received on Saturday, 14 August 2004 00:22:05 UTC