Re: PROV-ISSUE-46 (where-is-D-in-provenance): Where do I find document D in provenance [Accessing and Querying Provenance] from Paul Groth on 2011-07-28 (public-prov-wg@w3.org from July 2011)

From: Paul Groth <p.t.groth@vu.nl>
Date: Thu, 28 Jul 2011 13:52:20 +0200
To: Graham Klyne <GK@ninebynine.org>
CC: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <4E314D74.3040408@vu.nl>

Thanks for explaining. I understand the design approach now.

Another design choice could be to use a specific "resource key" for 
accessing provenance. That is, there would be a tight binding between the 
url and the provenance. Essentially, you can only connect provenance to 
a url that refers to a BOB (e.g. provenance can only be associated with 
a permalink). This means that you always know what the provenance is 
referring to.

This makes the semantics of the connection between provenance and a url 
clearer. But I guess this has some downsides as well...
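
To make the tight-binding idea concrete, here is a minimal sketch of what I 
have in mind. Everything in it is hypothetical (the ProvenanceRegistry class, 
its attach/lookup methods, and the example.org urls are made up for 
illustration, and none of it comes from the PAQ draft). It simply assumes 
that we already know which urls are permalinks and refuses to attach 
provenance to anything else.

```python
class ProvenanceRegistry:
    """Toy registry that binds provenance only to permalink urls (hypothetical)."""

    def __init__(self, permalinks):
        # Assumption: these urls each denote one fixed resource state (a "BOB").
        self._permalinks = set(permalinks)
        self._provenance = {}

    def attach(self, url, provenance_url):
        # Tight binding: reject any url that is not a known permalink.
        if url not in self._permalinks:
            raise ValueError(f"{url} is not a permalink; refusing to attach provenance")
        self._provenance[url] = provenance_url

    def lookup(self, url):
        # Returns the provenance url, or None if nothing is bound to this url.
        return self._provenance.get(url)


registry = ProvenanceRegistry({"http://example.org/home/2011-07-28"})
registry.attach("http://example.org/home/2011-07-28",
                "http://example.org/prov/2011-07-28")

# The mutable "today" url is not a permalink, so attaching to it is rejected.
try:
    registry.attach("http://example.org/home", "http://example.org/prov/latest")
    rejected = False
except ValueError:
    rejected = True
```

One downside shows up right away: a client that only holds the mutable url 
gets nothing back from lookup and first has to discover the permalink.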

Paul




Graham Klyne wrote:
> Paul Groth wrote:
>> Hi Graham,
>>
>> I think you identified the crux of the matter in this paragraph:
>>
>>> The specific point raised about "where do I find a BOB assertion" is,
>>> I think, a
>>> matter for the model.  If one has located and obtained the provenance
>>> associated
>>> with a web resource, it's not the job of the access mechanism to
>>> figure out what
>>> it actually describes.  Personally, I think the whole sideshow about
>>> BOBs and
>>> suchlike is just a big unnecessary distraction, but the point about
>>> being clear
>>> about what is described by provenance remains important.
>> As far as I understand, you mean the PAQ just tells you where to find
>> some provenance information associated with a particular URL, where the
>> association is rather loose, correct? Then you have to use the model to
>> tease apart what the provenance information "actually" is about.
>
> Yes.  But I should temper that by noting that this is not inevitable, but it
> does represent a design choice.  In the web context, with its focus on a RESTful
> approach to application design, I think it is the appropriate choice.
>
>> For example, suppose several versions of the same document appear at the
>> same url (e.g. the new york times homepage gets updated). Then I would
>> just say, well, the provenance associated with this url is over here.
>> Then I would look at the provenance information and, by knowing the
>> model, know what the current version is and its provenance... or if
>> it's really provenance about the ads on the page, I would inspect the
>> provenance information some more and be able to figure that out.
>>
>> Is that correct?
>
> Broadly, that is a consistent approach, and I think it's a reasonable design
> choice.  Alternatively, if there is a "permalink" for the NYT homepage for
> today, then use that as the resource key for accessing provenance.  But, in any
> case, I think it would be a good design choice if the provenance itself
> (assuming RDF here) uses the more specific URI, and maybe also includes a
> statement to the effect that the URI used refers to a constrained form of (an
> "IVP") the resource identified by the "today" URI.
>
> If there's no "permalink", then in the provenance RDF one might introduce a
> blank node (or constructed URI) for the date-specific homepage and add the
> additional information through appropriate additional RDF.  (This is part of
> the power we gain by using RDF rather than a provenance-specific metadata
> format.)
>
> If it's about the ads on the page, the same broadly applies, but it would
> probably be better (IMO) if each ad already has its own URI.  In any case, we
> need to deal with real resources, so I think these design choices should be
> left open.
>
> #g
>
>

-- 
Dr. Paul Groth (p.t.groth@vu.nl)
http://www.few.vu.nl/~pgroth
Assistant Professor
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam

Received on Thursday, 28 July 2011 11:55:08 UTC