Re: Some thoughts about the revised provenance Model document from Graham Klyne on 2011-10-21 (public-prov-wg@w3.org from October 2011)

From: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>
Date: Fri, 21 Oct 2011 10:34:25 +0100
To: public-prov-wg@w3.org
Message-ID: <4EA13CA1.3040109@zoo.ox.ac.uk>
On 20/10/2011 16:38, Myers, Jim wrote:
>>>> cf:e2 a prov:Entity.
>>>> cf:e2 cf:hasLocation dbpedia:Berlin.
>>>> dbpedia:Berlin dbpedia-owl:leader dbpedia:Klaus_Wowereit.
>>>> dbpedia:Klaus_Wowereit dbpprop:nationality dbpedia:Germany.
>>>>
>>>> Obviously, I can just keep building this massive graph using linked data.
>>>> If that's the case what characterizes cf:e2?
>>>> Is it just cf:hasLocation dbpedia:Berlin or is it everything else?
>>>
>>> IMO:
>>>
>>> Only
>>>      cf:e2 cf:hasLocation dbpedia:Berlin .
>>>
>>> would be characterizing cf:e2.
>>
>> +1
>
> I agree, and this set of statements implies that the entity is characterized by being in Berlin 'however' Berlin changes. The art exhibit was in Berlin for two months and whether those spanned an election or not does not affect the assertion. (Nor would Berlin annexing a neighboring village and changing its boundaries, etc.)

Yes, that's how I see it.
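A rough sketch of the rule agreed above: only statements made directly about a prov:Entity characterize it, and following links (Berlin's leader, that leader's nationality) adds nothing to the characterization. This is plain Python with triples as tuples of prefixed-name strings, not an RDF library. The function name is made up for illustration. Excluding the rdf:type triple is my own assumption.

```python
# Triples are (subject, predicate, object) tuples of prefixed-name strings.
# Illustrative sketch only; not an API from PROV-DM or PROV-O.
Triple = tuple[str, str, str]

GRAPH: list[Triple] = [
    ("cf:e2", "rdf:type", "prov:Entity"),
    ("cf:e2", "cf:hasLocation", "dbpedia:Berlin"),
    ("dbpedia:Berlin", "dbpedia-owl:leader", "dbpedia:Klaus_Wowereit"),
    ("dbpedia:Klaus_Wowereit", "dbpprop:nationality", "dbpedia:Germany"),
]


def characterizing_statements(graph: list[Triple], entity: str) -> list[Triple]:
    """Statements made directly about `entity`, provided it is a prov:Entity.

    Links are not followed: who leads dbpedia:Berlin, or that leader's
    nationality, is not treated as characterizing `entity`.
    The rdf:type triple itself is excluded (an assumption of this sketch).
    """
    if (entity, "rdf:type", "prov:Entity") not in graph:
        return []  # only prov:Entity resources are characterized
    return [t for t in graph if t[0] == entity and t[1] != "rdf:type"]


print(characterizing_statements(GRAPH, "cf:e2"))
# [('cf:e2', 'cf:hasLocation', 'dbpedia:Berlin')]
print(characterizing_statements(GRAPH, "dbpedia:Berlin"))
# []
```

dbpedia:Berlin comes back empty because it is not itself typed as a prov:Entity.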

> I think this brings up the larger question of what the additional RDF statements mean, e.g. are they part of the provenance assertions/does the provenance model give them meaning or are they just non-modeled annotation that, if you understand that third-party language, you can use to find the provenance assertions and entities of interest? If they're out of scope, we can perhaps not talk about them at all, but if we want to allow such statements to, for example, be included in an account and asserted, we may need to define their meaning - i.e. when do we assume Klaus was the leader (when the account was created, some point during the entity's lifetime (perhaps ambiguous if we have multiple entities around), over the duration of an entity/all entities?)

This is an area where I've been trying to look at the "detail".  Cf. 
http://lists.w3.org/Archives/Public/public-prov-wg/2011Oct/0156.html

The view I'm currently leaning towards is that statements that are unequivocally 
provenance are expressed explicitly using the core provenance concepts such as 
PEs and agents, etc.

But there may be other RDF statements that can be interpreted as provenance 
information, but to do so might require specific knowledge of the vocabulary 
used and/or domain of the application.  So:

   ex:aDocument a prov:Entity ;
     dcterms:creator "Meritorious Meerkat" .

might be interpreted as a provenance expression, thus:

   ex:DocumentCreation rdfs:subClassOf prov:ProcessExecution .
   ex:aDocument a prov:Entity ;
     prov:wasGeneratedBy
       [ a ex:DocumentCreation ;
         prov:wasControlledBy
           [ a prov:Agent ;
             foaf:name "Meritorious Meerkat"
           ]
       ] .

But I don't think it would be possible *in the general case* to decide if some 
RDF statements are or are not provenance.
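One way to see why: the rewrite above only happens because someone who knows what dcterms:creator means has written a rule for it. Here is a sketch with a hand-written rule table. `RULES`, `expand_creator` and the `_:bN` blank-node labels are all hypothetical, and the rdfs:subClassOf axiom is left out. Predicates with no rule pass through uninterpreted. Nothing in the data itself says which of them "are" provenance.

```python
from itertools import count
from typing import Iterator

Triple = tuple[str, str, str]


def expand_creator(subject: str, creator: str, fresh: Iterator[str]) -> list[Triple]:
    """Rewrite `subject dcterms:creator creator` as the PROV pattern above."""
    creation, agent = next(fresh), next(fresh)  # fresh blank-node labels
    return [
        (subject, "prov:wasGeneratedBy", creation),
        (creation, "rdf:type", "ex:DocumentCreation"),
        (creation, "prov:wasControlledBy", agent),
        (agent, "rdf:type", "prov:Agent"),
        (agent, "foaf:name", creator),
    ]


# Hypothetical, hand-written rule table: a predicate is read as provenance
# only if someone who knows its vocabulary has supplied a rule for it.
RULES = {"dcterms:creator": expand_creator}


def interpret_as_provenance(graph: list[Triple]) -> list[Triple]:
    fresh = (f"_:b{i}" for i in count())
    out: list[Triple] = []
    for s, p, o in graph:
        rule = RULES.get(p)
        if rule is None:
            out.append((s, p, o))  # no rule: kept, but not interpreted
        else:
            out.extend(rule(s, o, fresh))
    return out


doc = [
    ("ex:aDocument", "rdf:type", "prov:Entity"),
    ("ex:aDocument", "dcterms:creator", '"Meritorious Meerkat"'),
]
for triple in interpret_as_provenance(doc):
    print(triple)
```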

> I suggested previously that we might just use the account boundary as a way to distinguish when a statement such as
cf:e2 cf:hasLocation dbpedia:Berlin was meant to imply that e2 has a fixed/characterizing location of Berlin, versus having the general RDF meaning (as above - no lifetime specified and therefore unclear whether it is asserted as fixed). If we want to ascribe additional meaning to statements about, for example, who was the leader of Berlin, as part of provenance, does an account also help, e.g. putting such an RDF statement in an account is an assertion that it is true for the duration of the account? (And if that's not what you mean, you create an entity to say when it is true, or you leave that statement out of the account/send it as a helpful annotation outside the model.)

Hmmm...   is the following too radical?

To the extent that entity descriptions are (part of) provenance expressions, we 
don't recognize non-fixed statements in valid provenance expressions.  (Subject 
to some outermost context implied by the application - I think there's a version 
of the frame problem that can come into play here.  But the immediate concern is 
that by accepting a provenance expression, one accepts the fixedness of any 
statements about the entity/ies mentioned.)


>>> dbpedia:Berlin is not characterized - unless it was also a prov:Entity.
>>>
>>>
>>> Now I don't know the answer for anonymous nodes:
>>>
>>> cf:e2 a prov:Entity.
>>> cf:e2 cf:hasLocation [
>>>      dbpedia-owl:leader [
>>>         foaf:name "Klaus Wowereit" ;
>>>         dbpprop:nationality dbpedia:Germany
>>>       ]
>>> ] .
>>>
>>>
>>> My simple reading of this is that cf:e2 has a location of somewhere
>>> where the German called Klaus Wowereit is "the leader" - but neither
>>> Hr Bürgermeister Wowereit nor the implied Berlin is a "characterising
>>> attribute".
>>>
>>> If we distance ourselves slightly from the notions of "characterising
>>> attributes" we can just say that the properties stated directly on (or
>>> with?) an entity were true/fixed attributes throughout the lifetime of
>>> the entity. Any nested properties might or might not have been true
>>> throughout that lifetime.  (Thus cf:e2 could have existed in Berlin
>>> before Hr Wowereit became the mayor).
>>
>> That works for me.
>
> I'm not sure we can get an unambiguous interpretation with this model - if Wowereit was mayor of one town and moved to Berlin and became mayor there, what does the blank node resolve to for provenance purposes? E.g. for RDF, with no sense of time, the interpretation of the blank node can change, but when we connect it to an entity and intend for the location of that entity to be fixed, we need a way to pick one answer.

The blank nodes correspond to existential variables, and as such are not 
expected to have an unambiguous interpretation.

So cf:e2 has some location whose leader was named "Klaus Wowereit", etc.  All 
that really matters is that the truth of the entire statement is fixed for the 
entity concerned.  If the entity moves with Klaus, the location may change but 
the truth does not.  (The entity concerned might be Klaus himself?)
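To make the "existential variable" reading concrete: the nested description is true of cf:e2 if *some* assignment of nodes to the blank nodes makes every triple hold. This is a brute-force sketch of that check (fine for toy graphs only), with the blank nodes written as ?loc and ?leader. The two snapshots are made up to mirror Jim's scenario. `ex:SomeOtherTown` is an invented name. The statement stays true in both even though the location differs.

```python
from itertools import product

Triple = tuple[str, str, str]

# The blank-node description of cf:e2, with blank nodes written as ?variables.
PATTERN: list[Triple] = [
    ("cf:e2", "cf:hasLocation", "?loc"),
    ("?loc", "dbpedia-owl:leader", "?leader"),
    ("?leader", "foaf:name", '"Klaus Wowereit"'),
    ("?leader", "dbpprop:nationality", "dbpedia:Germany"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def satisfied(pattern: list[Triple], graph: list[Triple]) -> bool:
    """True if some binding of graph nodes to the variables puts every
    pattern triple in the graph (blank nodes read existentially)."""
    variables = sorted({t for triple in pattern for t in triple if is_var(t)})
    nodes = {t for triple in graph for t in triple}
    facts = set(graph)
    for values in product(nodes, repeat=len(variables)):
        binding = dict(zip(variables, values))
        ground = {tuple(binding.get(t, t) for t in triple) for triple in pattern}
        if ground <= facts:
            return True
    return False


berlin_snapshot = [
    ("cf:e2", "cf:hasLocation", "dbpedia:Berlin"),
    ("dbpedia:Berlin", "dbpedia-owl:leader", "dbpedia:Klaus_Wowereit"),
    ("dbpedia:Klaus_Wowereit", "foaf:name", '"Klaus Wowereit"'),
    ("dbpedia:Klaus_Wowereit", "dbpprop:nationality", "dbpedia:Germany"),
]
# Hypothetical: the entity has moved with Klaus to a different town.
other_snapshot = [
    ("cf:e2", "cf:hasLocation", "ex:SomeOtherTown"),
    ("ex:SomeOtherTown", "dbpedia-owl:leader", "dbpedia:Klaus_Wowereit"),
    ("dbpedia:Klaus_Wowereit", "foaf:name", '"Klaus Wowereit"'),
    ("dbpedia:Klaus_Wowereit", "dbpprop:nationality", "dbpedia:Germany"),
]
print(satisfied(PATTERN, berlin_snapshot), satisfied(PATTERN, other_snapshot))
# True True
```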

> I would have read this as e2 having the location where Wowereit was the leader, which could have been Berlin for part of the time and some other location for the rest. (Klaus has a favorite paperweight and he's kept it with him wherever he has been leader, and we want to record the provenance of that paperweight-as-a-symbol-of-office-in-his-leadership-location, so we created e2 to be that characterization.)

Yes, quite!  (I should have read ahead in more detail!)

> If I wanted to have the notion that e2's location was Berlin (as a blank node), I would need to add a time constraint - e.g. the location Wowereit led after he won the 200x election. That's an entity defined by its provenance and characterizing attributes that, in this example, would be a complement of Berlin corresponding to Berlin-as-led-by-Wowereit.

That's one way.  But as you imply, simply referring to an entity 
Berlin-as-led-by-Wowereit might be another.

> Not sure how often anyone will want to do this in practice, but I suspect that for the original example and blank node cases, the only way to be unambiguous is to define the time range over which the plain RDF statements are asserted to be valid (or treat them as annotations whose meaning is subject to interpretation/out-of-band knowledge). 'For the lifetime of the account' seems like a good default for that time range. And if you don't want that meaning, you create entities as in the next quoted section below.

Rather than "lifetime of the account", why not "Lifetime of the entity 
described."  Unless I've missed something, this discussion is entirely about 
statements made directly or indirectly about a prov:Entity, which by its nature
is completely static or somehow constrained in its existence.
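Here is a rough illustration of what "lifetime of the entity described" buys us. The inconsistency mentioned in the quoted text below (cf:klausTheMayor generated *after* cf:e2) becomes a simple interval-containment check. This is a sketch under loose assumptions: lifetimes are half-open intervals of abstract integer time points, the numbers are invented, and `Lifetime`/`consistent` are hypothetical names, not PROV-DM constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lifetime:
    """Half-open interval [start, end) of abstract time points."""
    start: int
    end: int

    def contains(self, other: "Lifetime") -> bool:
        return self.start <= other.start and other.end <= self.end


def consistent(entity: Lifetime, referenced: Lifetime) -> bool:
    """A statement relating `entity` to a fixed `referenced` entity holds
    throughout `entity`'s lifetime, so `referenced` must exist for all of it."""
    return referenced.contains(entity)


e2 = Lifetime(start=10, end=20)               # invented time points
klaus_the_mayor = Lifetime(start=5, end=30)   # mayor before and after e2
late_mayor = Lifetime(start=15, end=30)       # generated after e2 was generated
print(consistent(e2, klaus_the_mayor), consistent(e2, late_mayor))
# True False
```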

#g
--

>>> I suggest that if you also want to lock down such things, then model
>>> those properties as other prov:Entities (either anonymous or named):
>>>
>>> cf:e2 a prov:Entity ;
>>>     cf:hasLocation cf:berlinWithKlaus .
>>>
>>> cf:berlinWithKlaus a prov:Entity, prov:Location ;
>>>     prov:wasComplementOf  dbpedia:Berlin ;
>>>     dbpedia-owl:leader cf:klausTheMayor .
>>>
>>> cf:klausTheMayor a prov:Entity ;
>>>       prov:wasComplementOf dbpedia:Klaus_Wowereit ;
>>>       dbpprop:nationality dbpedia:Germany .
>>>
>>>
>>> Thus throughout the lifetime of cf:e2, the thing described by e2 was
>>> in Berlin, and throughout that time (at least as long as e2 existed)
>>> Klaus Wowereit was the leader, being German (The pre-1990
>>> Klaus-the-West-German was not the leader during the lifetime of
>>> cf:berlinWithKlaus).
>>>
>>>
>>> Note that such an interpretation would introduce temporal dependencies
>>> between cf:e2 and cf:klausTheMayor which are not currently covered by
>>> PROV-DM (there are no prov:derivedFrom or wasComplementOf links
>>> between cf:e2 and Berlin) - if the provenance otherwise showed that
>>> Klaus became mayor (when cf:klausTheMayor was generated) *after* cf:e2
>>> was generated, then the provenance account is inconsistent, but this
>>> can't be shown by the constraints of PROV-DM as far as I can tell.
>>>
>>>
>>>
>>> Note that PROV-DM does not specifically allow such nesting of
>>> attribute values; there, all attribute values are strings. If a
>>> property value were to be interpreted as a URI or identifier of another
>>> entity or other resource, then that seems outside the scope of PROV-DM
>>> - so we can take the same view in PROV-O.
>>>
>>>
>
Received on Friday, 21 October 2011 09:39:24 UTC