Re: PROV-ISSUE-89 (what-entity-attributes): How do we find the attributes of an entity? [Formal Model]

On Fri, Sep 16, 2011 at 23:48, Timothy Lebo <lebot@rpi.edu> wrote:

> Why are these characterizing entities NOT on the Entity itself?
>
> Entity is _already_ providing us the indirection that we need to distinguish between EVERYBODY'S description of the car IN ALL ETERNITY and _our_ description as we are observing it for our time period and context.

I fully agree! That is what the entity IS - it *is* the description -
and so it should have some properties!


> One of the SPECIAL characterizingProperties is prov:wasComplementOf, which is pointing at a URI of the "invariant" car in my driveway.

This is a *very* important point! Thanks for picking this up. It is
also a good argument for not using rdf:List, as I assume we want
wasComplementOf to be implicitly included for any entity.


> :owner rdfs:subPropertyOf prov:characterizingProperty . # Though, why can't we just look at ALL RDF properties as characterizing entities?

Well, see my example in
http://www.w3.org/2011/prov/wiki/WorkflowExample#Provenance_container_example

Here the provenance ontology has been extended with domain-specific
provenance metadata using the CURIE prefixes wf: and impl:.


:input a prov:Entity, impl:FileValue, wf:Value ;
    prov2:characterizingProperties ( impl:value ) ;
    impl:file "/tmp/myinput.txt" ;

    impl:value [
    # Snapshot of actual value as it was read by :wfEngine
        a cnt:ContentAsText ;
        cnt:characterEncoding "UTF-8" ;
        cnt:chars "Steve"
    ] .

Here I wanted to say that what characterises this input is the
impl:value (i.e. the content of the file). The impl:file property,
although very useful to include in the provenance assertions, is not
necessarily constant for as long as this :input is used within the
workflow. By the time the value is used, the file could have been
deleted, overwritten or moved. For the workflow engine this does not
matter; only the impl:value mattered for the process executions where
this entity was used.


However, I might make a complementOf of :input if I want to describe
the data at the point when it was read from the file, when the
filename was constant as well as the content. So if I understand you
correctly, with your proposal to 'just use RDF' I would only state
properties on those entities for which they are characterising? (As
entities already are indirections for "things in the world".)



:input a prov:Entity, wf:Value ;
    impl:value [ a cnt:ContentAsText ;
        cnt:characterEncoding "UTF-8" ;
        cnt:chars "Steve"
    ] .

:inputFile a prov:Entity, impl:FileValue, wf:Value ;
    prov:wasComplementOf :input ;
    impl:file "/tmp/myinput.txt" ;
    impl:value [ a cnt:ContentAsText ;
        # Is impl:value still needed, or included by indirection of
        # wasComplementOf?
        cnt:characterEncoding "UTF-8" ;
        cnt:chars "Steve"
    ] .

I wouldn't include many more details of how :inputFile was generated,
as the workflow engine did not record when or how the file was read,
and it is not used by any asserted PEs - but I am of course still free
to assert that such an entity existed. "At some point during :input's
life (over which the content is fixed) there was an impl:file that had
that content" - which sounds quite good.

Perhaps this is the line Satya was thinking about with attributes the
whole time, leading to the initial confusion on Luc's question. Satya?



> One level down, or simply directly on the :Entity?

By that I meant that we have not yet decided exactly where the
attributes would go; in my proposal, one level down *is* directly on
the prov:Entity.


> Simply on the prov:Entity allows any level of "down" because it's just domain-specific OWL axioms.

Could you elaborate with a little example? Do you mean thanks to
including the complementOf property or by referring to other
prov:Entities? (I like this prefix!)


> Agreed. But how does that prevent what you think we can't say ("being owned by luc and any other owner")?

With your proposal that would not be a problem, you are right, because
that other owner would not be declared on "my" prov:Entity, but on a
different car entity or perhaps a non-entity resource (thing in the
world).
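
As I read your proposal, the two owners would end up on two separate
entities that share a complement, roughly as sketched below. This is
only an illustration: the names :carOwnedByLuc, :carOwnedByOther, :luc,
:someoneElse and :car, and both namespace URIs, are placeholders I made
up for this example.

```turtle
# Placeholder namespaces, assumed for illustration only.
@prefix prov: <http://example.org/prov#> .
@prefix :     <http://example.org/car#> .

# "My" entity: the car as characterised by being owned by Luc.
:carOwnedByLuc a prov:Entity ;
    :owner :luc ;
    prov:wasComplementOf :car .

# The other owner is asserted on a different entity, not on mine.
:carOwnedByOther a prov:Entity ;
    :owner :someoneElse ;
    prov:wasComplementOf :car .
```

Neither entity then has to claim anything about the other's owner.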


> I don't think we need to assume named graph scoping. We have the indirection with prov:Entity already and these entities are being grouped by Accounts - so they can float around in the Big Graph and nothing will break.

This is a side note not yet addressed by the current ontology.

So you are saying we can use prov2:Account to link to who 'defined'
the entities (and also process executions et al?) - and not be
restricted to one provenance account == one graph/resource?

To me, the accountExpression in
https://dvcs.w3.org/hg/prov/raw-file/8be7e9ea81f0/model/ProvenanceModel.html#expression-Account
read as asking for named graphs, in particular when talking about
shared identifiers.


However, defining accounts flatly requires any different entities to
have different URIs (which can be consolidated in a different way,
for instance through a common wasComplementOf).

My understanding, by way of an example:

Both accounts try to express how a workflow ran, but account B did so
by monitoring from the outside, instead of logging directly what
happened as in account A. They therefore might not always agree on
things like file content.


:accountA a prov2:Account ;
   prov2:expresses :entity1, :entity1a, :entity2 .
:accountB a prov2:Account ;
   # Is it allowed to re-use :entity1 from :accountA, or would each
   # account always make its own entities?
   prov2:expresses :entity1, :entity1b, :entity3 .


:entity1 impl:file "/tmp/myinput.txt" .

# Implied by wasComplementOf?
#:entity1a impl:file "/tmp/myinput.txt" .
#:entity1b impl:file "/tmp/myinput.txt" .

:entity2 impl:value [ cnt:chars "Fish" ] .
:entity3 impl:value [ cnt:chars "Soup" ] .

:entity1a prov:wasComplementOf :entity1, :entity2 .
:entity1b prov:wasComplementOf :entity1, :entity3 .
# Are both impl:file and impl:value implied?



>> This one is unfortunately tricky in SPARQL as rdf:List are really
>> unpacked linked nodes and we don't know the position of the attribute.
> (Although I disagree with the premise)
> Why would order matter?

Side note:

To know how many rdf:rest links to follow, unless you know a cool
trick to do recursion or "is-in-list" support in SPARQL..? I know you
can express lists with the () syntax, but you still need to know in
which position to put the ?thing.
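
For illustration, here is roughly what the () shorthand unpacks to,
which is why the position matters. This is a sketch: impl:checksum is a
hypothetical second list member added only to show the nesting, and the
namespace URIs for prov2:, impl: and the default prefix are
placeholders.

```turtle
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
# Placeholder namespaces, assumed for illustration only.
@prefix prov2: <http://example.org/prov2#> .
@prefix impl:  <http://example.org/impl#> .
@prefix :      <http://example.org/run#> .

# Shorthand form:
#   :input prov2:characterizingProperties ( impl:value impl:checksum ) .
# impl:checksum is hypothetical, not part of my workflow example.
:input prov2:characterizingProperties [
    rdf:first impl:value ;          # position 1
    rdf:rest [
        rdf:first impl:checksum ;   # position 2, one rdf:rest further in
        rdf:rest rdf:nil            # end of list
    ]
] .
```

A fixed graph pattern matching impl:checksum would need exactly one
rdf:rest step, and a different number for any other position.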

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Saturday, 17 September 2011 00:20:21 UTC