Re: PROV-ISSUE-46 (where-is-D-in-provenance): Where do I find document D in provenance [Accessing and Querying Provenance]

I think I've seen enough push-back to agree this is a problem for which a 
proposal should be drafted.  If it turns out to be too complex, we can fall back 
to plan B (out of scope), but I'm hopeful a simple answer can be found.

I'll think about lightweight mechanisms to include the additional data.  I like 
the "anchor" URI possibility for the Link: header; it's a shame that is not an 
option for the <link> element.  I don't have firm ideas for this yet, and am 
seeking advice.
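
To make the "anchor" idea concrete, here is a rough sketch of how a client might read both URIs from a single Link: header. It is an illustration only: the relation name "provenance" and the example.org URIs are placeholders rather than anything agreed, and the parsing is deliberately simplified (it does not handle commas or semicolons inside quoted parameter values).

```python
import re


def parse_link_header(value):
    """Split a Link: header value into (target, params) pairs."""
    links = []
    # Each link-value looks like: <target>; name="value"; name=value
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)', value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(';'):
            if '=' in part:
                name, _, val = part.partition('=')
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def provenance_links(value, rel="provenance"):
    """Return (provenance_uri, anchor_uri_or_None) for each matching link.

    The rel name is an assumption; the anchor parameter, when present,
    is taken as the identifier of the thing the provenance describes.
    """
    found = []
    for target, params in parse_link_header(value):
        if rel in params.get("rel", "").split():
            found.append((target, params.get("anchor")))
    return found


# Hypothetical header: the link target is the provenance resource, and the
# anchor carries the identifier of the document as used in that provenance.
header = ('<http://example.org/provenance/doc>; rel="provenance"; '
          'anchor="http://example.org/doc/v1"')

for prov_uri, thing_uri in provenance_links(header):
    print("provenance at:", prov_uri)
    print("describes:    ", thing_uri)
```

With this shape, one header gives the client both pieces Simon says it needs: where the provenance is, and which identifier to look for inside it. A link without an anchor parameter yields None for the second value, and the client would have to fall back on some other mechanism.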

#g
--


Simon Miles wrote:
> Graham, Luc,
> 
> I would also vote that this is a real problem to be addressed (which
> is why it is part of the scenario).
> 
> It does seem vitally important that the client has the identifier of
> the bob/thing of which they want to find the provenance *exactly as
> used in the provenance data they access*. Otherwise, they haven't
> really accessed the provenance of anything at all, as it can't be
> interpreted - it is just a block of data describing the past and not
> something's provenance.
> 
> We could say that obtaining that identifier is out of scope, but I
> can't see an argument for why an access proposal would say how a
> provenance URI is obtained (e.g. embedded in the HTML), but not the
> identifier of the thing as used in the provenance. The client needs
> both.
> 
> In the case that the client creates the HTML itself, then it doesn't
> need the BOB-URI, but it quite possibly also doesn't need the
> provenance URI - it is just a link to the storage it used to document
> the page's generation. If we do say obtaining the identifier is out of
> scope, then obtaining the provenance URI should also be out of scope, and
> the proposal should be very brief :-)
> 
> There are alternatives to embedding the bob/thing URI in its own data
> content (e.g. HTML). We could require that, on resolving the
> provenance URI, you obtain not just provenance data but also the URI
> of the thing as used in that provenance. That could only work if every
> provenance URI was unique to one thing/bob.
> 
> Thanks,
> Simon
> 
> On 28 July 2011 15:54, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>> I thought this was an *explicit* use case in the scenario crafted in Boston.
>>
>> Luc
>>
>> On 07/28/2011 03:47 PM, Graham Klyne wrote:
>>> Given that we're here to create standards and supporting documents, I
>>> think one of the key principles should be:  does it address a
>>> sufficiently compelling need with sufficient simplicity that
>>> developers will implement it?  And thus, the implementation really
>>> does matter - I don't think it's reasonable to divorce implementation
>>> concerns from the solution.
>>>
>>> To my mind, we're in danger of solving a non-problem here. If someone
>>> gives me a USB stick with an HTML file on it, why should I take notice
>>> of metadata in the HTML file about its origin when it has just been
>>> handed to me by someone I presumably trust?  Of course, you can always
>>> find edge cases, but standardization isn't about solving edge cases
>>> (except if you do security standards), but primarily about addressing
>>> common cases, where the effort (i.e. cost) of standardization is
>>> amortized by scale of usage.
>>>
>>> That said, if there's a real consensus that this is a real problem
>>> worth solving, then I'd suggest simply defining a second link
>>> relation type for the purpose.  That is lightweight enough that some
>>> developers might just implement it even if they don't perceive much
>>> value.
>>>
>>> #g
>>> --
>>>
>>> Luc Moreau wrote:
>>>> Let's look at the problem conceptually first, and agree on the
>>>> principles,  and in a second phase, let's see how to implement this.
>>>>
>>>> Yes, I consider the case where we control the generation of the HTML.
>>>>
>>>> I think we MAY embed in the HTML
>>>> - provenance-URI: the location for the provenance of this document
>>>> - BOB-URI: the identifier of the BOB that represents this document
>>>>
>>>> Note 1: this may be BOB-URIs (since this document may be described by
>>>> multiple BOBs)
>>>> Note 2: this may be provenance-URIs (since there may be multiple
>>>> sources for the provenance)
>>>>
>>>> If we are in agreement, we can look at ways of encoding this
>>>> information.
>>>>
>>>> Luc
>>>>
>>>>
>>>> On 07/28/2011 02:15 PM, Graham Klyne wrote:
>>>>> In the general case, if you don't control the generation of the
>>>>> HTML, it's the same problem as an image.  There's nothing more we
>>>>> can do.
>>>>>
>>>>> If you do control generation of the HTML, then <link> can give you
>>>>> the provenance resource.
>>>>>
>>>>> But I see that you may also need an identifier for the HTML to
>>>>> interpret that provenance (and the rest of this response addresses
>>>>> just that issue).
>>>>>
>>>>> ...
>>>>>
>>>>> I see two, maybe three possibilities:
>>>>>
>>>>> (a) rely on some unspecified mechanism here - i.e. don't specify a
>>>>> specific mechanism for this case.
>>>>> (b) add something to the HTML to identify the resource it
>>>>> represents. (Off the top of my head, this could be <meta>, <link>,
>>>>> or RDFa - I'm sure there are other options.)
>>>>> (c) adopt a packaging mechanism that can combine arbitrary data and
>>>>> metadata.
>>>>> (d) ... maybe something else.
>>>>>
>>>>> I think the scenario alone isn't enough information to make a
>>>>> sensible choice here - which to be useful has to be one that
>>>>> developers will actually implement.
>>>>>
>>>>> If I were forced to make a choice now, I'd go with (a) or maybe (b)
>>>>> with a <link> element and a new link relation roughly for "self".
>>>>>
>>>>> The packaging approach would solve more problems generally, but I
>>>>> don't think we know enough to make a call on a specific mechanism
>>>>> that would effectively promote interoperability, and there's enough
>>>>> defined mechanism (cf.
>>>>> http://dvcs.w3.org/hg/prov/raw-file/tip/paq/provenance-access.html#gap-analysis)
>>>>> for developers to do something that would work right now.
>>>>>
>>>>> #g
>>>>> --
>>>>>
>>>>> Luc Moreau wrote:
>>>>>>
>>>>>> Hi Graham,
>>>>>>
>>>>>> I guess that D7 is the case I was after.
>>>>>> Note that D7 is not an image but an html file. Where do I find its
>>>>>> identifier?
>>>>>>
>>>>>> Luc
>>>>>>
>>>>>> On 07/28/2011 11:26 AM, Graham Klyne wrote:
>>>>>>> I've added a scenario analysis appendix to the PAQ document at
>>>>>>> http://dvcs.w3.org/hg/prov/raw-file/be3b7e1f2518/paq/provenance-access.html
>>>>>>>
>>>>>>>
>>>>>>> The short answer to this issue is that I believe there are some
>>>>>>> matters that are  beyond the scope of a W3C specification
>>>>>>> document.  The mechanisms described (or with placeholders for
>>>>>>> fuller description) could form the basis for applications that
>>>>>>> need to deal with, say, data provided on a USB drive, but a
>>>>>>> complete specification would IMO be inappropriate.
>>>>>>>
>>>>>>> e.g.
>>>>>>> [[
>>>>>>> S: this scenario effectively calls for this: given an arbitrary
>>>>>>> data resource, implement a general purpose application to
>>>>>>> discover, retrieve and analyze provenance about that resource. At
>>>>>>> the present time, this is a matter for experimental development,
>>>>>>> which could be based substantially on the mechanisms described for
>>>>>>> provenance discovery and access via third party services.
>>>>>>> ]]
>> --
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>>
>>
>>
> 
> 
> 

Received on Thursday, 4 August 2011 08:41:53 UTC