Re: PROV-ISSUE-46 (where-is-D-in-provenance): Where do I find document D in provenance [Accessing and Querying Provenance] from Graham Klyne on 2011-07-28 (public-prov-wg@w3.org from July 2011)

From: Graham Klyne <GK@ninebynine.org>
Date: Thu, 28 Jul 2011 15:47:40 +0100
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: public-prov-wg@w3.org
Message-ID: <4E31768C.8080307@ninebynine.org>
Given that we're here to create standards and supporting documents, I think one 
of the key principles should be:  does it address a sufficiently compelling need 
with sufficient simplicity that developers will implement it?  And thus, the 
implementation really does matter - I don't think it's reasonable to divorce 
implementation concerns from the solution.

To my mind, we're in danger of solving a non-problem here. If someone gives me a 
USB stick with an HTML file on it, why should I take notice of metadata in the 
HTML file about its origin when it has just been handed to me by someone I 
presumably trust?  Of course, you can always find edge cases, but 
standardization isn't about solving edge cases (except if you do security 
standards), but primarily about addressing common cases, where the effort (i.e. 
cost) of standardization is amortized by scale of usage.

That said, if there's a real consensus that this is a real problem worth 
solving, then I'd suggest simply defining a second link relation type for the 
purpose.  That is lightweight enough that some developers might just implement 
it even if they don't perceive much value.
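
As a rough illustration of how little a consumer would need to do, here is a 
minimal Python sketch that reads both relations out of an HTML document.  The 
relation names "provenance" and "provenance-subject" are placeholders assumed 
for this example only (nothing in this thread fixes them), and the example.org 
URIs are made up.  The first relation stands in for Luc's provenance-URI, the 
second for his BOB-URI, and each may occur more than once.

```python
from html.parser import HTMLParser

# Hypothetical relation names, assumed for illustration only.
PROVENANCE_REL = "provenance"        # where the provenance can be found
SUBJECT_REL = "provenance-subject"   # identifier the provenance talks about


class LinkRelationCollector(HTMLParser):
    """Collect href values of <link> elements, grouped by rel token."""

    def __init__(self):
        super().__init__()
        self.links = {}  # rel token -> list of href values, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        rel = attr_map.get("rel")
        if not href or not rel:
            return
        # rel may carry several space-separated tokens; match case-insensitively.
        for token in rel.lower().split():
            self.links.setdefault(token, []).append(href)


def find_provenance_links(html_text):
    """Return the provenance and subject URIs declared in html_text."""
    collector = LinkRelationCollector()
    collector.feed(html_text)
    collector.close()
    return {
        "provenance_uris": collector.links.get(PROVENANCE_REL, []),
        "subject_uris": collector.links.get(SUBJECT_REL, []),
    }


SAMPLE = """<html><head>
<link rel="provenance" href="http://example.org/provenance/d7">
<link rel="provenance-subject" href="http://example.org/bob/d7">
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(find_provenance_links(SAMPLE))
```

A document with neither relation simply yields two empty lists, so a consumer 
that ignores the second relation loses nothing.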

#g
--

Luc Moreau wrote:
> 
> Let's look at the problem conceptually first, and agree on the 
> principles,  and in a second phase, let's see how to implement this.
> 
> Yes, I consider the case where we control the generation of the HTML.
> 
> I think we MAY embed in the HTML
> - provenance-URI: the location for the provenance of this document
> - BOB-URI: the identifier of the BOB that represents this document
> 
> Note 1: this may be BOB-URIs (since this document may be described by 
> multiple BOBs)
> Note 2: this may be provenance-URIs (since there may be multiple sources 
> for the provenance)
> 
> If we are in agreement, we can look at ways of encoding this information.
> 
> Luc
> 
> 
> On 07/28/2011 02:15 PM, Graham Klyne wrote:
>> In the general case, if you don't control the generation of the HTML, 
>> it's the same problem as an image.  There's nothing more we can do.
>>
>> If you do control generation of the HTML, then <link> can give you the 
>> provenance resource.
>>
>> But I see that you may also need an identifier for the HTML to 
>> interpret that provenance (and the rest of this response addresses 
>> just that issue).
>>
>> ...
>>
>> I see two, maybe three possibilities:
>>
>> (a) rely on some unspecified mechanism here - i.e. don't specify a 
>> specific mechanism for this case.
>> (b) add something to the HTML to identify the resource it represents. 
>> (Off the top of my head, this could be <meta>, <link>, or RDFa - I'm 
>> sure there are other options.)
>> (c) adopt a packaging mechanism that can combine arbitrary data and 
>> metadata.
>> (d) ... maybe something else.
>>
>> I think the scenario alone isn't enough information to make a sensible 
>> choice here - which to be useful has to be one that developers will 
>> actually implement.
>>
>> If I were forced to make a choice now, I'd go with (a) or maybe (b) 
>> with a <link> element and a new link relation roughly for "self".
>>
>> The packaging approach would solve more problems generally, but I 
>> don't think we know enough to make a call on a specific mechanism that 
>> would effectively promote interoperability, and there's enough defined 
>> mechanism (cf. 
>> http://dvcs.w3.org/hg/prov/raw-file/tip/paq/provenance-access.html#gap-analysis) 
>> for developers to do something that would work right now.
>>
>> #g
>> -- 
>>
>> Luc Moreau wrote:
>>>
>>>
>>> Hi Graham,
>>>
>>> I guess that D7 is the case I was after.
>>> Note that D7 is not an image but an html file. Where do I find its 
>>> identifier?
>>>
>>> Luc
>>>
>>> On 07/28/2011 11:26 AM, Graham Klyne wrote:
>>>> I've added a scenario analysis appendix to the PAQ document at 
>>>> http://dvcs.w3.org/hg/prov/raw-file/be3b7e1f2518/paq/provenance-access.html 
>>>>
>>>>
>>>> The short answer to this issue is that I believe there are some 
>>>> matters that are beyond the scope of a W3C specification document.
>>>> The mechanisms described (or with placeholders for fuller 
>>>> description) could form the basis for applications that need to deal 
>>>> with, say, data provided on a USB drive, but a complete 
>>>> specification would IMO be inappropriate.
>>>>
>>>> e.g.
>>>> [[
>>>> S: this scenario effectively calls for this: given an arbitrary data 
>>>> resource, implement a general purpose application to discover, 
>>>> retrieve and analyze provenance about that resource. At the present 
>>>> time, this is a matter for experimental development, which could be 
>>>> based substantially on the mechanisms described for provenance 
>>>> discovery and access via third party services.
>>>> ]]
>>>
>>
>
Received on Thursday, 28 July 2011 14:48:51 UTC