Re: Some thoughts about the revised provenance Model document from Paul Groth on 2011-10-20 (public-prov-wg@w3.org from October 2011)

From: Paul Groth <p.t.groth@vu.nl>
Date: Thu, 20 Oct 2011 11:37:28 +0200
To: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>
CC: "Myers, Jim" <MYERSJ4@rpi.edu>, 'W3C provenance WG' <public-prov-wg@w3.org>, "Luc Moreau (L.Moreau@ecs.soton.ac.uk)" <L.Moreau@ecs.soton.ac.uk>
Message-ID: <4E9FEBD8.2010607@vu.nl>
Hi All,

I think Graham's proposal may be nice. Is it correct to say that the
assumption is that if you declare something of type Entity, then all the
corresponding domain-specific properties characterize it?

But I guess the assumption is that these attributes are not transitive?

For example, take the following RDF graph:

cf:e2 a prov:Entity.
cf:e2 cf:hasLocation dbpedia:Berlin.
dbpedia:Berlin dbpedia-owl:leader dbpedia:Klaus_Wowereit.
dbpedia:Klaus_Wowereit dbpprop:nationality dbpedia:Germany.


Obviously, I can just keep building this massive graph using linked data.

If that's the case what characterizes cf:e2?

Is it just cf:hasLocation dbpedia:Berlin or is it everything else?
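
To make the two readings concrete, here is a minimal sketch in plain Python (no RDF library). The triples are the four from the graph above; the function names direct_attributes and reachable_statements are hypothetical, and excluding rdf:type from the attributes is an assumption made only for this illustration, not something the model document specifies.

```python
from collections import deque

# The four triples from the example graph, as (subject, predicate, object).
# "a" in the Turtle above is shorthand for rdf:type.
TRIPLES = [
    ("cf:e2", "rdf:type", "prov:Entity"),
    ("cf:e2", "cf:hasLocation", "dbpedia:Berlin"),
    ("dbpedia:Berlin", "dbpedia-owl:leader", "dbpedia:Klaus_Wowereit"),
    ("dbpedia:Klaus_Wowereit", "dbpprop:nationality", "dbpedia:Germany"),
]


def direct_attributes(triples, entity):
    """Reading 1: only statements whose subject is the entity itself."""
    return [(p, o) for s, p, o in triples if s == entity and p != "rdf:type"]


def reachable_statements(triples, entity):
    """Reading 2: every statement reachable by following objects outward."""
    seen = {entity}
    queue = deque([entity])
    found = []
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p != "rdf:type":
                found.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return found


direct = direct_attributes(TRIPLES, "cf:e2")
closure = reachable_statements(TRIPLES, "cf:e2")
print(direct)
print(len(closure))
```

Under the first reading only the cf:hasLocation statement characterizes cf:e2; under the second, the set keeps growing with every linked-data statement that becomes reachable.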

cheers,
Paul

Graham Klyne wrote:
> Jim,
>
> Yes, we seem to be converging.
>
> If there's a consensus that it's important to be able to use the
> provenance model (alone) to refer to things like "the image with
> subject X", then I'd agree that attributes as described are a
> reasonable way to do this, and I'm prepared to back down on this
> point.  My own take was that any of the common referencing mechanisms
> could be used, including using a URI, and that the provenance model
> should focus on actually representing the provenance information.
> Simply adding the notion of attributes is a relatively small
> overhead, which is probably useful in a majority of cases, and fits
> quite well with an RDF representation. For me, the compelling case
> you make is that the attributes make it easier to convert to/from
> other formats for representing provenance.
>
> But I think that worrying about "non-characterizing attributes", and
> requiring mechanisms to distinguish them, is an unnecessary
> complication at several levels.
>
> I agree with you that we don't need to worry about "incidental"
> fixed attributes.  I'd take the view that any attributes mentioned in
> an entity expression are defined to be characterizing for the
> purposes of ascribing provenance to that entity.  If a user creates a
> redundant expression containing incidental attributes, then what
> harm?
>
> I need to find time to revisit the model document to turn this into a
> formal issue.  When I do so, I'd plan to include a change proposal.
> Meanwhile, I don't think this should be having too much impact on
> developing other areas.
>
> #g --
>
> On 18/10/2011 15:45, Myers, Jim wrote:
>>
>>> -----Original Message-----
>>> From: Graham Klyne [mailto:graham.klyne@zoo.ox.ac.uk]
>>> Sent: Monday, October 17, 2011 1:48 PM
>>> To: Myers, Jim
>>> Cc: Paul Groth; 'W3C provenance WG'
>>> Subject: Re: Some thoughts about the revised provenance Model
>>> document
>>>
>>> On 04/10/2011 15:24, Myers, Jim wrote:
>>>>> To the extent that provenance assertions actually *are*
>>>>> static attributes of that entity, ...
>>>> - how do you make such assertions if static attributes aren't
>>>> in the model?
>>> Jim,
>>>
>>> Short answer: use a new predicate.
>>>
>>> For example, if w is weather in London, then w1 = ( w such that
>>> On20111017(w) ) might be the weather in London on 2011-10-17.
>>> Of course, the predicate can alternatively be construed as an
>>> attribute+value, which is close to what I alluded to when I said
>>> "To the extent that provenance assertions actually *are* static
>>> attributes of that entity".  (I think Quine discussed this kind
>>> of duality between predicates and properties of things in one of
>>> his essays.)
>>>
>>> ...
>>
>>> But I detect two different possible questions here:
>>>
>>> - how do you make such assertions using the model if static
>>> attributes aren't in the model?
>>>
>>> and
>>>
>>> - how do you make such assertions by any means if static
>>> attributes aren't in the model?
>>>
>>> And I realize my previous answer only addresses the latter case.
>> Yes - I'm concerned about the former - can we cover use cases with
>> just the model itself? I think both you and Satya make good
>> arguments that, if you're willing to go beyond the model, you can
>> do a better/more general job of modeling how an entity is fixed. I
>> don't want to preclude that, but I'm concerned that if there's not
>> an in-model option, we haven't really provided provenance
>> interoperability (some agreement would be required between parties
>> for one to generate provenance and the other to answer questions
>> with it that correspond to our use cases (examples below)).
>>
>>>> If I understand your concerns correctly, they are partly due
>>>> to language about entities being 'defined by' fixed attributes.
>>>> I don't think it would be problematic, and in fact would
>>>> probably agree with an alternate description that talks about
>>>> entities being characterizations of things that limit their
>>>> behavior over time (statements about object identities and
>>>> states/characteristics have to remain true...)  and that
>>>> characterizing attributes are a/the way in the provenance model
>>>> to allow interchange of information about those limits (i.e.
>>>> such attributes are not theoretically a defining aspect of
>>>> entities but, in order to allow interchange and practical use
>>>> of provenance information without knowledge of external object
>>>> definitions/functionality, fixed attributes are defined.)
>>>>
>>>> Does this get us to a conceptual consensus?
>>> That's certainly closer to what I had in mind.  In reading your
>>> earlier response, I realized I had not one but two niggles with
>>> the current formulation:
>>>
>>> (1) the need to distinguish between characterizing and
>>> non-characterizing attributes
>>>
>>> (2) The use of attributes at all.
>>>
>>> I read your above formulation as allowing us to talk about
>>> characterizing attributes while completely ignoring other
>>> attributes, which AFAICT are irrelevant to description of
>>> provenance, addressing my concern (1).
>>>
>> I think I agree re: (1) but to clarify: I've heard two senses of
>> non-characterizing - one where non-characterizing means
>> 'incidental' - my height is fixed but I'd still be 'me' if it were
>> different, and one where it means non-fixed. My sense is that we
>> don't need to distinguish the first sense - whether a fixed
>> property is characterizing or incidental is irrelevant to it being
>> useful for discovering an instance and we just need a neutral term.
>> Non-fixed attributes, such as my location, and the interest some
>> have expressed in being able to record them, i.e. say that Jim has a
>> location without giving a value (with Jim-in-NY being another
>> ivp/complementary entity where it is fixed), seems like it can be
>> pushed out of scope - while one might want to infer that Jim being
>> the complement of Jim-in-NY implies that Jim has a mutable location,
>> we don't lose any functionality by requiring people to frame a query
>> about where Jim was at a specific time/point in provenance as a
>> query about Jim being the complement of an entity with a fixed
>> location.
>>> ...
>>>
>>> My concern (2) is more subtle (and I could more easily let it
>>> be).
>>>
>>> The goal that I perceive is to be able to say that some entity,
>>> say e1, is a characterization of a dynamic entity, say e, that
>>> allows us to make some provenance assertions whose truth is not
>>> ephemeral.  What I don't see is why one needs to know exactly how
>>> the constraints on e that correspond to e1 are determined.  If
>>> one makes a (true) provenance assertion about e1, then it seems
>>> to me that the necessary constraints exist for the provenance
>>> assertion about e1 to be true.  My assumption is that the
>>> specific nature of the constraints is application or context
>>> dependent, and does not need to be part of the core provenance
>>> model.
>> I think we need this goal (to be able to make non-ephemeral
>> statements about e) but I think the use of something like e1 has an
>> additional goal - to record how e changed. If I just want to record
>> unambiguously that a document was edited, I'd like to just say
>> document e participatedIn editingPE, without creating an e1 or e2
>> for the before/after entities. (I'm not sure if that is currently
>> consistent with how we've defined participation - I think it was at
>> one point). To me, the value of making e1 and e2 explicit is to
>> have a place where I can unambiguously talk about the text in those
>> versions. e1 hasText "Hello" and e2 hasText "Hello World".
>>> For comparison: If we assert that a particular document d3 was
>>> derived from some datasets d1 and d2, we accept that as an
>>> assertion, without having to care about how that knowledge was
>>> obtained.  I'm applying a similar standard to the nature of
>>> constraints used to determine views (IVPs) for which provenance
>>> can be asserted.
>> I think this would be the case where I want to just say e
>> participatedIn editingPE - that assertion implies that there were
>> before and after versions that differ in aspects affected by
>> editingPE, but if I don't think the attributes of those versions
>> are of interest, I wouldn't even assert them.
>>
>> To address the point that some people may have better out-of-band
>> mechanisms to characterize entities, I think the prov model should
>> still allow one to id e1 and e2 without attributes - someone who is
>> constrained to only use the prov model itself can't make as much
>> use of those (I guess they could return the product of e being
>> edited, but they couldn't return the version with given text or
>> length, etc if those weren't attributes), but there's no reason to
>> require attributes to be asserted by parties that have their own
>> interoperability agreements about the nature of entities they're
>> describing.
>>> Having said that, I note you mentioned in another message that
>>> the notion of attributes was needed to satisfy the provenance
>>> challenge.  I'm not aware of the details:  maybe there's a
>>> specific use-case here that could change my perspective on
>>> this...
>> Simple things - the Challenge required one group to read provenance
>> from another group and answer questions like (these are not the
>> real challenge examples) "what was derived from the image
>> withSubject x" or "which products from stage2 with dc:creator Bob
>> were used to create new files". If you envision OPM as a model, not
>> a syntax, and groups with text, XML, and RDF implementations you
>> can start to see the problem. The model-to-syntax mapping covers
>> how to read artifacts and processes from the serialized forms, but
>> without attributes in the model, there was no standard/predefined
>> way to find which image was of subject X, or what processes were
>> 'stage2' or which files had dc:creator Bob. Having the idea in the
>> model that artifacts and processes could have key/value attributes
>> and then describing how key/value attributes are serialized to the
>> different formats enabled the groups to read each other's
>> provenance and answer the queries. (In reality, before we had
>> attributes in OPM, everyone looked at the serializations and figured
>> out how to map and just extended the model on their own to get the
>> work done, but in doing so we recognized that we had done something
>> outside the model to answer the queries.)
>> Cheers - and apologies for being in and out of the discussions,
>> Jim
>>> #g --
>>>
>>>>> then the existence of static attributes (in the style of
>>>>> "characterizing attributes") may be inferred.  In this
>>>>> respect, the static attributes are a consequence rather than
>>>>> a defining aspect of the existence of meaningful provenance
>>>>> information.
>>>>>
>>>>> #g --
>>>>>
>>>>>> Graham Klyne wrote:
>>>>>>> Jim,
>>>>>>>
>>>>>>> If I understand you correctly, the significance of
>>>>>>> attributes is for discovery of related resources.
>>>>>>>
>>>>>>> My understanding is that the primary purpose of
>>>>>>> provenance is to establish a basis for trust, a reason to
>>>>>>> believe (or not) some information that is presented about
>>>>>>> some subject. It's not clear to me what need there is to
>>>>>>> use attributes for resource discovery to achieve this end.
>>>>>>> (But I may well be missing something here.)
>>>>>>>
>>>>>>> So, on this basis, while there may be perfectly good
>>>>>>> reasons to have defined attributes and values for
>>>>>>> discovery purposes, I'm not seeing why they are needed to
>>>>>>> achieve the goals of *provenance* information.
>>>>>>>
>>>>>>> (But it's getting late here, and maybe I'm missing some
>>>>>>> key point in your message.)
>>>>>>>
>>>>>>> In summary: I think your concerns are reasonable, but
>>>>>>> what makes them in scope specifically for *provenance*
>>>>>>> information?
>>>>>>>
>>>>>>> #g --
>>>>>>>
>>>>>>> On 29/09/2011 18:44, Myers, Jim wrote:
>>>>>>>> Graham,
>>>>>>>>
>>>>>>>> How would we use provenance to find, for example, how
>>>>>>>> Luc got to Boston? It's clear if we have fixed attributes
>>>>>>>> for name and location such that we could query for an
>>>>>>>> entity with name Luc that has an ivpOf relationship with
>>>>>>>> an entity in Boston and then look at the provenance from
>>>>>>>> there. How would it work without fixed attributes in the
>>>>>>>> prov model? I'm guessing that you're thinking that we can
>>>>>>>> find those attributes outside the language somewhere
>>>>>>>> (e.g. non-prov RDF statements) but what are the minimal
>>>>>>>> requirements there and what language/models exist that
>>>>>>>> meet them? Can we only model provenance of things for
>>>>>>>> which ontologies have been developed? Presumably it has
>>>>>>>> to be possible to associate descriptive metadata with the
>>>>>>>> entities through some path (what relationship(s)?)? And
>>>>>>>> it has to be clear which metadata is fixed? You mention
>>>>>>>> being able to infer across ivpOf relationships - is there
>>>>>>>> one set of inference rules for all possible descriptive
>>>>>>>> metadata? Or do we need to be able to distinguish further
>>>>>>>> between types of metadata?
>>>>>>>> --> As you can probably guess from the questions above,
>>>>>>>> I'm concerned that kicking fixed attributes out will end
>>>>>>>> up being more complex and place a higher burden on users
>>>>>>>> than keeping them in, but I may be misunderstanding how
>>>>>>>> such an alternative would work. Part of that concern is
>>>>>>>> that I think I hear that modeling experts in this group
>>>>>>>> can handle defining classes for different types of
>>>>>>>> entities that would allow discovery by attribute, but I'm
>>>>>>>> concerned that being able to do this becomes a
>>>>>>>> requirement for using provenance (versus asserting
>>>>>>>> entities defined solely by attributes(entity, name=Luc)
>>>>>>>> or perhaps in a mixed mode (e.g. an entity representing
>>>>>>>> Luc that 'hasBaseType' foaf:person and one representing
>>>>>>>> him in Boston that also hasBaseType foaf:person and
>>>>>>>> location=Boston as a fixed attribute)). Again - perhaps
>>>>>>>> I'm misunderstanding how discovery based on descriptive
>>>>>>>> information could be done if we don't have fixed
>>>>>>>> characterizing attributes in the prov standard....
>>>>>>>>
>>>>>>>> Jim
>>>>>>>>
>>>>>>>>> 3. Do we need to model "Characterizing attributes"?
>>>>>>>>>
>>>>>>>>> The notions of "characterizing attributes" have
>>>>>>>>> developed to derive the relationship between different
>>>>>>>>> entities that are views of some common thing in the
>>>>>>>>> world. I am not convinced that we need to model these
>>>>>>>>> attributes, and I'm not sure the way they are modelled
>>>>>>>>> can necessarily apply in all situations that
>>>>>>>>> applications might wish to represent.
>>>>>>>>>
>>>>>>>>> At heart: when it comes to exchanging provenance
>>>>>>>>> information, why do we *need* to know exactly what
>>>>>>>>> makes one entity a constrained view of another? What
>>>>>>>>> breaks (at the level of exchanging provenance
>>>>>>>>> information) if we have no access to such information?
>>>>>>>>> How are applications that exchange provenance
>>>>>>>>> information about entities for which they don't
>>>>>>>>> actually know about these attributes to know about
>>>>>>>>> their correspondences with real-world things?
>>>>>>>>>
>>>>>>>>> I think the role of attributes here is mainly to
>>>>>>>>> *explain* some aspects of the provenance model, but
>>>>>>>>> they do not need to be part of the model.
>>>>>>>>>
>>>>>>>>> To my mind, a simpler approach would be to allow for
>>>>>>>>> assertion of an IVPof type of relationship between
>>>>>>>>> entities, from which some useful inferences about any
>>>>>>>>> attributes present might flow, but I don't see the
>>>>>>>>> need for the attributes to be in any sense defining of
>>>>>>>>> the entities.
>>>>>>>>>
>>>>>>>>> <aside> My suggested definition of IVPof might be
>>>>>>>>> something like this:
>>>>>>>>>
>>>>>>>>> A IVPof B iff forall p : (Entity -> Bool) . p(B) => p(A)
>>>>>>>>>
>>>>>>>>> where A, B are Entities, and the values of p are
>>>>>>>>> predicates on Entities.
>>>>>>>>> </aside>
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> #g
>>>>>>>>>
>>>>>>>>
>>>>
>>>>

-- 
Dr. Paul Groth (p.t.groth@vu.nl)
http://www.few.vu.nl/~pgroth
Assistant Professor
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam
Received on Thursday, 20 October 2011 09:40:41 UTC