Re: PROV-ISSUE-1 (define-resource): Definition for concept 'Resource' [Provenance Terminology] from Simon Miles on 2011-05-24 (public-prov-wg@w3.org from May 2011)

From: Simon Miles <simon.miles@kcl.ac.uk>
Date: Tue, 24 May 2011 22:36:06 +0100
To: public-prov-wg@w3.org
Message-ID: <BANLkTikZyM=38YiPRP20MoNthAL2DfFupQ@mail.gmail.com>
Hello,

With regard to the points raised on resources, in brief I suggest:
 - For our purposes, a resource is anything which can be referred to
and has a provenance.
 - This is equivalent to "anything that might be identified by a URI"
anyway, so it seems sensible to use that existing definition.
 - When we talk about the provenance of a resource, we mean the
provenance of its state on asking the question.
 - When we talk about the provenance of a resource state
representation, we mean the provenance of its state plus how it came
to be in that representation.
 - We would expect implementers of the recommendation to provide
access to the provenance of a web resource state representation, but
by the suggestions above this would anyway be the provenance of the
resource state (just by ignoring the portion specifically relating to
representation), and that state's provenance is equivalent to the
resource's provenance.
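
As a rough illustration only, the suggestions above could be sketched as follows in Python. This is not an agreed model: every name here (ProvenanceStep, ResourceState, Resource, Representation, the example.org URI) is a hypothetical choice made for the sketch, and provenance is simplified to an ordered list of steps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ProvenanceStep:
    """One event in a history (hypothetical, simplified to a description)."""
    description: str


@dataclass
class ResourceState:
    """The state of a resource at some point in time, with its history."""
    provenance: List[ProvenanceStep]


@dataclass
class Resource:
    """Anything that can be referred to (here, by a URI) and has a provenance."""
    uri: str
    states: List[ResourceState] = field(default_factory=list)

    def provenance(self) -> List[ProvenanceStep]:
        # Provenance of a resource = provenance of its state on asking,
        # taken here to be the most recently recorded state.
        return list(self.states[-1].provenance)


@dataclass
class Representation:
    """A particular representation of a resource state."""
    state: ResourceState
    media_type: str
    representation_steps: List[ProvenanceStep]

    def provenance(self) -> List[ProvenanceStep]:
        # State provenance plus how it came to be in this representation.
        return list(self.state.provenance) + list(self.representation_steps)

    def state_provenance(self) -> List[ProvenanceStep]:
        # Ignore the portion specifically relating to representation.
        return list(self.state.provenance)


state = ResourceState([ProvenanceStep("studies produced the data")])
data = Resource("http://example.org/data", [state])
turtle = Representation(state, "text/turtle",
                        [ProvenanceStep("serialised as Turtle")])

# Dropping the representation-specific steps yields the resource's provenance.
assert turtle.state_provenance() == data.provenance()
```

The point of the sketch is only that the three notions nest: a client given the provenance of a representation can recover the provenance of the state, and hence of the resource, by discarding the representation-specific part.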

In less brief, the reasons for the suggestions above:

It seems intuitive to me that what a user, or a client on their
behalf, would ask for or expect is the provenance of a resource (in
the web architecture sense, (a) in Luc's list). As this might be
mutable, and so does not have one history over time, it makes sense to
me to specify that the provenance of a resource is the provenance of
its state on asking the question.

I agree with Jun that it would be good to include non-web resources,
but then agree with Paul that the web architecture definition captures
all we would want, just expressed in a way which is unusual for
non-web settings. If we accept the above suggestion that a "resource"
is what we'd ask for the provenance of, then surely all we mean by
resource is something which can be referred to and which has a
provenance? If so, then I think "might be identified with a URI" is
one way of describing this - else, what could be referred to but could
not be identified with a URI? And what could be identified but does not
have a provenance?

With regard to (a) resource, (b) state and (c) representation, I
think it makes sense to talk about the provenance of any of the three.
Taking Graham's example, if (a) is the zebra's health, (b) is the
zebra's health at some point in time, and (c) is a medical record
about the zebra's health, I can envisage a meaningful response to
asking the history of the zebra's health (a), how its health came to
be as it is now (b) which is effectively the same as (a), or why the
record contains what it does (c). For the purposes of provenance, it
seems that (c) is just (b) with a bit of extra information (details of
the particular representation) and so the provenance of (c) is just
the provenance of (b) plus some extra (ignorable) information on how
it came to be represented as it is.

Graham - I don't understand your argument for why a web resource
state's ((b)'s) provenance would not be meaningful. The provenance of
the government data at the time it was first published, for example,
would refer to the studies which produced it, while the provenance of
its Turtle representation would be the same plus information about
serialisation in Turtle.

In a mail to this list which I think got lost, I said that in the
government example I didn't understand the difference between f1 being
"published" and r1 being "made available as a web resource", so I'm
not clear enough on the difference between f1 and r1 to use them to
illustrate the suggestions above.

Thanks,
Simon

On 24 May 2011 21:13, Graham Klyne <GK@ninebynine.org> wrote:
> Hi Luc,
>
> Trimming the message this time!
>
> Luc Moreau wrote:
>  >(I wrote):
>>> I don't think there's a need or purpose to invoke that terminology here.
>>>
>>> Just consider, for the sake of discussion, a slight revision of the
>>> example:
>>>
>>> government (gov) converts data (d1) to XML (f1) at time (t1)
>>> government (gov) generates provenance information (prov) regarding XML
>>> (f1)
>>> government (gov) publishes XML data (f1) along with its provenance
>>> (prov) on a portal with a license (li1); the XML data is now available
>>> as a Web resource (r1)
>>>  :
>>>
>>> I think the example makes just as much sense with RDF replaced by XML,
>>> but the RDF terminology does not apply to XML data.  And, by the way,
>>> I think this revised example also represents a use-case that we MUST
>>> be able to support (except that instead of talking about Turtle and
>>> RDF/XML serializations, we might talk about text/XML vs EXI
>>> (http://www.w3.org/TR/2011/REC-exi-20110310/) serializations).
>>
>> I agree that it could be xml.  But the problem is still the same.
>> The web architecture distinguishes
>> - resource
>> - resource state
>> - resource state representation
>>
>> The rdf WG has introduced terminology for rdf corresponding to these
>> concepts.
>>
>> If we want to explain how provenance fits into the web architecture, we
>> need to be able
>> to refer to these notions.
>
> OK, I see two discussion points here:
>
> (a) the relevance of the RDF g-box, g-snap, g-text terminology, and
>
> (b) the need to express provenance about resources/resource state/resource state
> representation
>
> Regarding (a), I think the (resources/resource state/resource state
> representation) terminology is perfectly adequate for our current purposes, and
> that avoids getting drawn into RDF-specific issues of RDF graph evolution.
> Later, when we (maybe) discuss more specifically management of provenance
> expressed using RDF, I can imagine the g-box/... terminology might be helpful.
>
> Regarding (b), I've offered a viewpoint, but I remain open to persuasion.  But I
> don't think focusing on the g-box/g-snap/g-text is going to help us here,
> because the Web Architecture concepts are so much broader (i.e. not just RDF).
> More important, IMO, is to identify a specific scenario that isn't adequately or
> so easily handled by the provenance-of-resource case.
>
> #g
> --
>
>
>
>



-- 
Dr Simon Miles
Lecturer, Department of Informatics
Kings College London, WC2R 2LS, UK
+44 (0)20 7848 1166
Received on Tuesday, 24 May 2011 21:36:34 UTC