Re: PROV comments from Clark&Parsia

Hi James,

Please find some comments regarding point 1 below. My colleague Evren will
address point 2.

On Mon, Jan 14, 2013 at 5:36 PM, James Cheney <jcheney@inf.ed.ac.uk> wrote:

<snip>

Two substantive points:
>
> 1.  There could potentially be other occurrences of y in other statements
> in I.  These also need to be replaced with S(y), which is why we specify
> that S needs to be applied to the whole instance.  (There are more
> efficient ways of obtaining the same behavior, avoiding visiting the whole
> instance, but the behavior must be the same as the specification.)  For
> example, suppose the instance as a whole were:
>
> A0. wasDerivedFrom(d; e,e0,a,y,u).
> A1. wasGeneratedBy(gen1; e,a, t1, []).
> A2. wasGeneratedBy(y; e,a, t1, [attr1=value1]).
>
> Then if you just apply S to A1 and A2, you get
>
> A0. wasDerivedFrom(d; e,e0,a,y,u).
> A1'. wasGeneratedBy(gen1; e,a, t1, []).
> A2'. wasGeneratedBy(gen1; e,a, t1, [attr1=value1]).
>
> and the connection between the generation involved in derivation step A0
> and that described by A2 is lost.  (In this case, it so happens that other
> constraints may force us to re-discover that y = gen1, but this is just
> luck.  If we replaced the generations by uses this would not be the case.)
>
> Generally, once we discover that a variable is equal to some other term,
> we should substitute the variable everywhere, as is done in the "TGD/EGD
> chase" in relational databases, which partly inspires what we are doing.
>

You seem to assume that variable 'y' stands for the same term in every
independent statement. This will be extremely confusing for people with a
logic programming background. There is the fundamental assumption that
clauses (i.e., facts, statements) do not share variables.

In order to address this case, one could use functional terms:

A0. wasDerivedFrom(d; e,e0,a,*f(d)*,u).
A1. wasGeneratedBy(gen1; e,a, t1, []).
A2. wasGeneratedBy(*f(d)*; e,a, t1, [attr1=value1]).

But now, things get trickier as gen1 won't unify with f(d). One could use
some kind of binary 'canReplace' predicate and have uniqueness constraints
infer things like canReplace(gen1, f(d)). Based on this predicate, one
could define equivalence classes (sets of terms that represent the same
constant), come up with some replacing mechanism (to replace f(d) with gen1
everywhere), and impose the constraint that we can have at most one
constant in every equivalence class (i.e., canReplace(gen1, gen2) should
lead to invalidity as the arguments do not unify).

-- 
Best,
Héctor

Received on Tuesday, 15 January 2013 15:19:10 UTC