Re: PROV-ISSUE-89 (what-entity-attributes): How do we find the attributes of an entity? [Formal Model]

On Sep 16, 2011, at 8:19 PM, Stian Soiland-Reyes wrote:

> On Fri, Sep 16, 2011 at 23:48, Timothy Lebo <lebot@rpi.edu> wrote:
> 
>> Why are these characterizing entities NOT on the Entity itself?
>> 
>> Entity is _already_ providing us the indirection that we need to distinguish between EVERYBODY'S description of the car IN ALL ETERNITY and _our_ description as we are observing it for our time period and context.
> 
> I fully agree! That is what the entity IS - it *is* the description -
> and so it should have some properties!
> 
> 
>> One of the SPECIAL characterizingProperties is prov:wasComplementOf, which is pointing at a URI of the "invariant" car in my driveway.
> 
> This is a *very* important point! Thanks for picking this up. Also a
> good argument for not using rdf:List as I assume we want to implicitly
> include that for any entity.


Thanks. But I'm afraid I made the point while misunderstanding what a characterizing property was.
Is characterizing just "more description" or "used to uniquely identify" the Entity?
If "describing" then YES, prov:wasComplementOf IS a special type of characterizing property.
	e.g., the fact that some Entity describes the color of the tie _I_ wore yesterday is a very important part of the Entity (since it's describing me).
If "uniquely identify", then NO, prov:wasComplementOf CANNOT be used as a "characterizing property". (Is this term defined somewhere? I'd rather not reuse it without a permanent home.)
	e.g., two Entities describing my tie yesterday and the day before CANNOT become the same simply because they both refer to me (though the complementOf combined with the day would suffice).



Which leads me to believe that we need to drop "characterizing" and start saying owl:hasKey.
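To make the "uniquely identify" reading concrete, here is a minimal Python sketch of the inference an OWL 2 owl:hasKey axiom licenses: named individuals of the keyed class that share the same values for every key property are treated as the same individual. The entity names, the ex:day property, and the choice of ( prov:wasComplementOf ex:day ) as the key are hypothetical, built from the tie example above; the sketch simplifies to single-valued properties and uses plain dicts instead of an RDF graph.

```python
from collections import defaultdict

# Hypothetical key for the tie example: what the entity complements, plus the day.
KEY = ("prov:wasComplementOf", "ex:day")

# Hypothetical entities, each described as property -> value.
entities = {
    ":tieYesterday": {"prov:wasComplementOf": ":timsTie", "ex:day": "2011-09-16", "ex:color": "blue"},
    ":tieDayBefore": {"prov:wasComplementOf": ":timsTie", "ex:day": "2011-09-15", "ex:color": "red"},
    ":tieYesterdayToo": {"prov:wasComplementOf": ":timsTie", "ex:day": "2011-09-16"},
}


def same_as_groups(entities, key):
    """Group entities that agree on every key property (what owl:hasKey would merge).

    Entities missing any key property are skipped: no conclusion is drawn
    unless all key values are known.
    """
    groups = defaultdict(list)
    for name, props in entities.items():
        if all(p in props for p in key):
            groups[tuple(props[p] for p in key)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(same_as_groups(entities, KEY))
# [[':tieYesterday', ':tieYesterdayToo']]
```

Keying on prov:wasComplementOf alone, i.e. same_as_groups(entities, ("prov:wasComplementOf",)), would merge all three tie entities, which is exactly the problem described above. Note also that OWL 2 only draws owl:hasKey conclusions for named individuals, so blank nodes (such as the cnt:ContentAsText values in the examples below) are not merged this way.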


> 
> 
>> :owner rdfs:subPropertyOf prov:characterizingProperty . # Though, why can't we just look at ALL RDF properties as characterizing entities?
> 
> Well, see my example in
> http://www.w3.org/2011/prov/wiki/WorkflowExample#Provenance_container_example
> 
> Here the provenance ontology has been extended with domain-specific
> provenance metadata using CURIEs wf: and impl:.
> 
> 
> :input a prov:Entity, impl:FileValue, wf:Value ;
>    prov2:characterizingProperties ( impl:value ) ;
>    impl:file "/tmp/myinput.txt" ;
> 
>    impl:value [
>    # Snapshot of actual value as it was read by :wfEngine
>        a cnt:ContentAsText ;
>        cnt:characterEncoding "UTF-8" ;
>        cnt:chars "Steve"
>    ] .
> 
> Here I wanted to say that what characterises this input is the
> impl:value (i.e. the content of the file). The reason is that impl:file
> (although a very useful property to include in the provenance
> assertions) is not necessarily constant for the duration of the use of
> this :input within the workflow - by the time the value is used the
> file could have been deleted, overwritten or moved, but for the
> workflow engine this does not matter, only the impl:value mattered for
> the process executions where this entity was used.
> 
> 
> However I might make a complementOf of :input if I want to describe
> the data at the point when it was read from the file, when the
> filename was constant as well as the content. So if I understand you
> correctly, by your proposal to 'just use RDF' I would only state
> properties for those entities where it is characterising? (as entities
> already are indirections for "things in the world")
> 
> 


I'm afraid I can't answer without first resolving what "characterizing" means, in particular how it differs from owl:hasKey.
Going from the example above to the example below: YES, the second looks more appropriate. Then you define wf:Value owl:hasKey ( impl:value ), while making sure the ontology also identifies that bnode with any other resource having "UTF-8" and "Steve" (with another owl:hasKey).


> 
> :input a prov:Entity, wf:Value ;
>    impl:value [ a cnt:ContentAsText ;
>        cnt:characterEncoding "UTF-8" ;
>        cnt:chars "Steve"
>    ] .
> 
> :inputFile a prov:Entity, impl:FileValue, wf:Value ;
>    prov:wasComplementOf :input ;
>    impl:file "/tmp/myinput.txt" ;
>    impl:value [ a cnt:ContentAsText ;
>        # Is impl:value still needed, or included by indirection of wasComplementOf?
>        cnt:characterEncoding "UTF-8" ;
>        cnt:chars "Steve"
>    ] .
> 
> I wouldn't include much more details of how :inputFile was generated
> as the workflow engine did not record any details about when or how
> the file was read, and it is not used by any asserted PEs - but I am
> of course still free to assert that such an entity existed. "At some
> point during :input's life (over which the content is fixed) there was
> an impl:file that had that content" - which sounds quite good.
> 
> Perhaps this is the line Satya was thinking about with attributes the
> whole time, leading to the initial confusion on Luc's question. Satya?
> 
> 
> 
>> One level down, or simply directly on the :Entity?
> 
> With that I considered that we have not decided yet exactly where the
> attributes would go, in my proposal one level down *is* directly on
> the prov:Entity.



I see. This makes a lot of sense.



> 
> 
>> Simply on the prov:Entity allows any level of "down" because it's just domain-specific OWL axioms.
> 
> Could you elaborate with a little example? Do you mean thanks to
> including the complementOf property or by referring to other
> prov:Entities? (I like this prefix!)



I did not mean by using prov:wasComplementOf, but that's an interesting idea.

Here, "level down" follows this graph pattern:

?e a prov:Entity .
?e ?p_1 ?r_1 .
                ?r_1 ?p_2 ?r_2 .
                                    ?r_2 ?p_3 ?r_3 .

?r_1 is "one level down" as you just mentioned above.
?r_2 is "two levels down" (this is my NON-wasComplementOf depth)
?r_3 is "three levels down" (again, more NON-wasComplementOf depth)
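The "levels down" pattern can be sketched as a breadth-first walk over triples that records the first depth at which each resource is reached from the entity. This is a minimal Python illustration, assuming triples are (subject, predicate, object) tuples and skipping prov:wasComplementOf edges to match the NON-wasComplementOf depth described above; the sample triples are adapted from the :inputFile example, with the hypothetical label "_:v" standing in for its blank node.

```python
from collections import deque


def levels_down(triples, entity, skip=("prov:wasComplementOf",)):
    """Return {resource: depth} for resources reachable from `entity`.

    Depth 1 corresponds to ?r_1 in the pattern above, 2 to ?r_2, and so on.
    Edges whose predicate is in `skip` are not followed.
    """
    out = {}
    for s, p, o in triples:
        if p not in skip:
            out.setdefault(s, []).append(o)
    depth = {}
    seen = {entity}
    queue = deque([(entity, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in out.get(node, []):
            if nxt not in seen:  # keep only the shortest depth
                seen.add(nxt)
                depth[nxt] = d + 1
                queue.append((nxt, d + 1))
    return depth


triples = [
    (":inputFile", "rdf:type", "prov:Entity"),
    (":inputFile", "prov:wasComplementOf", ":input"),
    (":inputFile", "impl:file", '"/tmp/myinput.txt"'),
    (":inputFile", "impl:value", "_:v"),
    ("_:v", "rdf:type", "cnt:ContentAsText"),
    ("_:v", "cnt:chars", '"Steve"'),
]

# ':input' does not appear: the prov:wasComplementOf edge is skipped.
for resource, d in levels_down(triples, ":inputFile").items():
    print(d, resource)
```

Passing skip=() instead also follows prov:wasComplementOf, which is the other kind of "depth" discussed here.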


You can use OWL to describe what any subclass of prov:Entity should look like.
OWL can answer "what should ?r_2 look like?", and likewise for ?r_3.


Further, you can use this to describe expectations for the Entity on the other side of prov:wasComplementOf.
To use part of your example:

> :inputFile a prov:Entity, impl:FileValue, wf:Value ;
>    prov:wasComplementOf :input ;
>    impl:file "/tmp/myinput.txt" ;
>    impl:value [ a cnt:ContentAsText ;



wf:Value # A wf:Value has exactly one impl:value, which is a cnt:ContentAsText, and exactly one prov:wasComplementOf, which is a wf:Value.
	a owl:Class;
	rdfs:subClassOf [
		a owl:Restriction;
		owl:onProperty impl:value;
		owl:cardinality "1"^^xsd:nonNegativeInteger;
	];
	rdfs:subClassOf [
		a owl:Restriction;
		owl:onProperty impl:value;
		owl:allValuesFrom cnt:ContentAsText;
	];
	rdfs:subClassOf [
		a owl:Restriction;
		owl:onProperty prov:wasComplementOf;
		owl:cardinality "1"^^xsd:nonNegativeInteger;
	];
	rdfs:subClassOf [
		a owl:Restriction;
		owl:onProperty prov:wasComplementOf;
		owl:allValuesFrom wf:Value;
	].
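OWL reasons under the open-world assumption, so an OWL reasoner will not complain that a wf:Value lacks an impl:value; it simply infers that one exists somewhere. If the goal is instead to check instance data against the restrictions above, a closed-world check is closer to what is wanted. The following is a minimal Python sketch of such a check, assuming distinct identifiers denote distinct resources (OWL itself makes no such assumption); the data is a hypothetical rendering of :input and :inputFile from the examples, and the function is an illustration, not part of PROV or of any OWL tool.

```python
def check_wf_value(triples, types):
    """Closed-world check of the wf:Value restrictions above.

    `types` maps each resource to its set of rdf:type values.
    Returns a list of human-readable problems (empty if none).
    Assumes distinct identifiers denote distinct resources.
    """
    problems = []
    for subject, classes in types.items():
        if "wf:Value" not in classes:
            continue
        # (property, class every value must belong to), mirroring the restrictions
        for prop, range_cls in (("impl:value", "cnt:ContentAsText"),
                                ("prov:wasComplementOf", "wf:Value")):
            values = [o for s, p, o in triples if s == subject and p == prop]
            if len(values) != 1:
                problems.append(f"{subject}: expected exactly 1 {prop}, found {len(values)}")
            for v in values:
                if range_cls not in types.get(v, set()):
                    problems.append(f"{subject}: {prop} value {v} is not a {range_cls}")
    return problems


types = {
    ":inputFile": {"prov:Entity", "impl:FileValue", "wf:Value"},
    ":input": {"prov:Entity", "wf:Value"},
    "_:v1": {"cnt:ContentAsText"},
    "_:v2": {"cnt:ContentAsText"},
}
triples = [
    (":inputFile", "prov:wasComplementOf", ":input"),
    (":inputFile", "impl:value", "_:v1"),
    (":input", "impl:value", "_:v2"),
]

print(check_wf_value(triples, types))
```

Under these restrictions, :input would itself need exactly one prov:wasComplementOf pointing at another wf:Value, which may or may not be what was intended.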




> 
> 
>> Agreed. But how does that prevent what you think we can't say ("being owned by luc and any other owner")?
> 
> With your proposal that would not be a problem, you are right, because
> that other owner would not be declared on "my" prov:Entity, but on a
> different car entity or perhaps a non-entity resource (thing in the
> world).


Yes. They would be different entities.
However, I'm not sure we are currently ALLOWED to point to a non-entity resource (thing in the world).
If we can, please let me know on http://www.w3.org/2011/prov/track/issues/97




> 
> 
>> I don't think we need to assume named graph scoping. We have the indirection with prov:Entity already and these entities are being grouped by Accounts - so they can float around in the Big Graph and nothing will break.
> 
> This is a side note not yet addressed by the current ontology.


Yes. TBD.


> 
> So you are saying we can use prov2:Accounts to link to who 'defined'
> the entities (and also process executions et al?) - and not be
> restricted to one provenance account == one graph/resource.


I'm assuming we'll have a single prov:Account that points to a handful of prov:Entities, prov:PEs, etc., as was done in OPM.

Unless the RDFWG can give us a standard that formally adopts named graphs....


> 
> To me the reading of accountExpression in
> https://dvcs.w3.org/hg/prov/raw-file/8be7e9ea81f0/model/ProvenanceModel.html#expression-Account
> was asking for named graphs, in particular when talking about shared
> identifiers.

Thanks for pointing it out. Before I can focus on wrapping things up, I need to nail down what I'm wrapping up :-)

> 
> 
> However defining accounts flatly requires any different entities to
> have different URIs


Which is how it has always been in the semantic web. I don't see why this would need to change.


> (which can be consolidated in a different way,
> for instance a common wasComplementOf).


Which I think is the correct approach.


> 
> My understanding by an example:
> 
> Both accounts try to express how a workflow ran, but account B did so
> by monitoring from the outside, instead of logging directly what
> happened as in account A. They therefore might not always agree on
> things like file content.
> 
> 
> :accountA a prov2:Account ;
>   prov2:expresses :entity1, :entity1a, :entity2 .
> :accountB a prov2:Account ;
>   # Is he allowed to re-use :entity1 from :accountA, or would each account always make his own entities?


I'm torn :-/
Overlapping URIs are good because they connect information, but sharing something that gets annotated in the future changes the original claims.



>   prov2:expresses :entity1, :entity1b, :entity3 .
> 
> 
> :entity1 impl:file "/tmp/myinput.txt" .
> 
> # Implied by wasComplementOf?
> #:entity1a impl:file "/tmp/myinput.txt" .
> #:entity1b impl:file "/tmp/myinput.txt" .
> 
> :entity2 impl:value [ cnt:chars "Fish" ] .
> :entity3 impl:value [ cnt:chars "Soup" ] .
> 
> :entity1a prov:wasComplementOf :entity1, :entity2 .
> :entity1b prov:wasComplementOf :entity1, :entity3 .
> # implied both impl:file and impl:value ?
> 
> 
> 
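The idea above that distinct URIs can be consolidated through a common wasComplementOf can be tried on this example: using only the wasComplementOf triples, group the entities by what they complement. This is a minimal Python illustration; it only finds the shared targets and deliberately does not decide whether impl:file or impl:value should be inferred, which is still the open question here.

```python
from collections import defaultdict


def complement_groups(triples):
    """Map each wasComplementOf target to the sorted entities pointing at it."""
    groups = defaultdict(set)
    for s, p, o in triples:
        if p == "prov:wasComplementOf":
            groups[o].add(s)
    return {target: sorted(sources) for target, sources in groups.items()}


# The wasComplementOf triples from the two-account example.
triples = [
    (":entity1a", "prov:wasComplementOf", ":entity1"),
    (":entity1a", "prov:wasComplementOf", ":entity2"),
    (":entity1b", "prov:wasComplementOf", ":entity1"),
    (":entity1b", "prov:wasComplementOf", ":entity3"),
]

for target, sources in complement_groups(triples).items():
    print(target, sources)
```

Here :entity1 is the shared point connecting account A's :entity1a with account B's :entity1b, even though the two accounts disagree on the file content (:entity2 vs :entity3).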
>>> This one is unfortunately tricky in SPARQL as rdf:List are really
>>> unpacked linked nodes and we don't know the position of the attribute.
>> (Although I disagree with the premise)
>> Why would order matter?
> 
> Side note:
> 
> To know how many rdf:next


rdf:rest?


> to follow, unless you know a cool trick to
> do recursion or "is-in-list" -support into sparql..?


SPARQL 1.1 property paths (rdf:rest*/rdf:first) return the elements of a list (though as an unordered bag :-)


> I know you can
> express lists with () syntax, but you still need to know in which
> position to put the ?thing.
> 
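For reference, the SPARQL 1.1 property path ?list rdf:rest*/rdf:first ?item tests membership without knowing the position, but it does not return the position. The walk that path performs can be sketched in Python as follows, keeping a counter of rdf:rest hops to recover the order. This is a minimal illustration assuming a well-formed, acyclic list; the two-element list and the _:l0/_:l1 node names are hypothetical (the characterizingProperties example above only lists impl:value).

```python
def list_items(triples, head):
    """Walk an RDF collection from `head`, yielding (index, item) pairs.

    Follows rdf:rest until rdf:nil, like the SPARQL 1.1 path
    rdf:rest*/rdf:first, but also keeps each item's position.
    """
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    index, node = 0, head
    while node != "rdf:nil":
        yield index, first[node]
        node = rest[node]
        index += 1


def in_list(triples, head, item):
    """Membership test that does not need to know the item's position."""
    return any(x == item for _, x in list_items(triples, head))


# ( impl:value impl:file ) written out as rdf:first/rdf:rest triples.
triples = [
    ("_:l0", "rdf:first", "impl:value"),
    ("_:l0", "rdf:rest", "_:l1"),
    ("_:l1", "rdf:first", "impl:file"),
    ("_:l1", "rdf:rest", "rdf:nil"),
]
print(list(list_items(triples, "_:l0")))  # [(0, 'impl:value'), (1, 'impl:file')]
print(in_list(triples, "_:l0", "impl:file"))  # True
```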
> -- 
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
> 

Received on Saturday, 17 September 2011 15:38:45 UTC