
Re: PROV-ISSUE-46 (where-is-D-in-provenance): Where do I find document D in provenance [Accessing and Querying Provenance]

From: Simon Miles <simon.miles@kcl.ac.uk>
Date: Fri, 29 Jul 2011 15:17:05 +0100
Message-ID: <CAKc1nHcprkhc3Ki574Hk=D6XusKiK_0DRYYC6y_kEs1JgupbOw@mail.gmail.com>
To: Provenance Working Group WG <public-prov-wg@w3.org>

Graham, Luc,

I would also vote that this is a real problem to be addressed (which
is why it is part of the scenario).

It does seem vitally important that the client has the identifier of
the bob/thing of which they want to find the provenance *exactly as
used in the provenance data they access*. Otherwise, they haven't
really accessed the provenance of anything at all, as it can't be
interpreted - it is just a block of data describing the past and not
something's provenance.

We could say that obtaining that identifier is out of scope, but I
can't see an argument for why an access proposal would say how a
provenance URI is obtained (e.g. embedded in the HTML), but not the
identifier of the thing as used in the provenance. The client needs

In the case that the client creates the HTML itself, then it doesn't
need the BOB-URI, but it quite possibly also doesn't need the
provenance URI - it is just a link to the storage it used to document
the page's generation. If we do say obtaining the identifier is out of
scope, then obtaining the provenance URI should also be out of scope, and
the proposal should be very brief :-)

There are alternatives to embedding the bob/thing URI in its own data
content (e.g. HTML). We could require that, on resolving the
provenance URI, you obtain not just provenance data but also the URI
of the thing as used in that provenance. That could only work if every
provenance URI was unique to one thing/bob.
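
To make the embedding option concrete, here is a rough sketch of how a
client might pull both URIs out of an HTML page it has been handed. The
link relation names "provenance" and "provenance-target" are placeholders
I have made up for illustration (no relation names have been agreed), and
the example.org URIs are invented. Both results are lists, since a page
may carry several provenance URIs and several bob/thing URIs.

```python
from html.parser import HTMLParser


class ProvenanceLinkParser(HTMLParser):
    """Collect href values of <link> elements, grouped by rel token."""

    def __init__(self):
        super().__init__()
        self.links = {}  # rel token -> list of href values, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens
        for rel in (attr.get("rel") or "").lower().split():
            self.links.setdefault(rel, []).append(href)


def find_provenance_links(html_text):
    """Return (provenance URIs, bob/thing URIs) found in the HTML.

    The rel names used here are hypothetical placeholders, not agreed terms.
    """
    parser = ProvenanceLinkParser()
    parser.feed(html_text)
    parser.close()
    return (parser.links.get("provenance", []),
            parser.links.get("provenance-target", []))


# Invented sample page: one provenance URI and one bob/thing URI.
SAMPLE = """<html><head>
<link rel="provenance" href="http://example.org/prov/d7">
<link rel="provenance-target" href="http://example.org/bob/d7-v1">
</head><body>...</body></html>"""

prov_uris, bob_uris = find_provenance_links(SAMPLE)
```

If bob_uris comes back empty, the client is in exactly the position
described above: it can fetch the provenance data but has no identifier
with which to interpret it.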


On 28 July 2011 15:54, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
> I thought this was an *explicit* use case in the scenario crafted in Boston.
> Luc
> On 07/28/2011 03:47 PM, Graham Klyne wrote:
>> Given that we're here to create standards and supporting documents, I
>> think one of the key principles should be:  does it address a
>> sufficiently compelling need with sufficient simplicity that
>> developers will implement it?  And thus, the implementation really
>> does matter - I don't think it's reasonable to divorce implementation
>> concerns from the solution.
>> To my mind, we're in danger of solving a non-problem here. If someone
>> gives me a USB stick with an HTML file on it, why should I take notice
>> of metadata in the HTML file about its origin when it has just been
>> handed to me by someone I presumably trust?  Of course, you can always
>> find edge cases, but standardization isn't about solving edge cases
>> (except if you do security standards), but primarily about addressing
>> common cases, where the effort (i.e. cost) of standardization is
>> amortized by scale of usage.
>> That said, if there's a real consensus that this is a real problem
>> worth solving, then I'd suggest simply defining a second link
>> relation type for the purpose.  That is lightweight enough that some
>> developers might just implement it even if they don't perceive much
>> value.
>> #g
>> --
>> Luc Moreau wrote:
>>> Let's look at the problem conceptually first, and agree on the
>>> principles,  and in a second phase, let's see how to implement this.
>>> Yes, I consider the case where we control the generation of the HTML.
>>> I think we MAY embed in the HTML
>>> - provenance-URI: the location for the provenance of this document
>>> - BOB-URI: the identifier of the BOB that represents this document
>>> Note 1: this may be BOB-URIs (since this document may be described by
>>> multiple BOBs)
>>> Note 2: this may be provenance-URIs (since there may be multiple
>>> sources for the provenance)
>>> If we are in agreement, we can look at ways of encoding this
>>> information.
>>> Luc
>>> On 07/28/2011 02:15 PM, Graham Klyne wrote:
>>>> In the general case, if you don't control the generation of the
>>>> HTML, it's the same problem as an image.  There's nothing more we
>>>> can do.
>>>> If you do control generation of the HTML, then <link> can give you
>>>> the provenance resource.
>>>> But I see that you may also need an identifier for the HTML to
>>>> interpret that provenance (and the rest of this response addresses
>>>> just that issue).
>>>> ...
>>>> I see two, maybe three possibilities:
>>>> (a) rely on some unspecified mechanism here - i.e. don't specify a
>>>> specific mechanism for this case.
>>>> (b) add something to the HTML to identify the resource it
>>>> represents. (Off the top of my head, this could be <meta>, <link>,
>>>> or RDFa - I'm sure there are other options.)
>>>> (c) adopt a packaging mechanism that can combine arbitrary data and
>>>> metadata.
>>>> (d) ... maybe something else.
>>>> I think the scenario alone isn't enough information to make a
>>>> sensible choice here - which to be useful has to be one that
>>>> developers will actually implement.
>>>> If I were forced to make a choice now, I'd go with (a) or maybe (b)
>>>> with a <link> element and a new link relation roughly for "self".
>>>> The packaging approach would solve more problems generally, but I
>>>> don't think we know enough to make a call on a specific mechanism
>>>> that would effectively promote interoperability, and there's enough
>>>> defined mechanism (cf.
>>>> http://dvcs.w3.org/hg/prov/raw-file/tip/paq/provenance-access.html#gap-analysis)
>>>> for developers to do something that would work right now.
>>>> #g
>>>> --
>>>> Luc Moreau wrote:
>>>>> Hi Graham,
>>>>> I guess that D7 is the case I was after.
>>>>> Note that D7 is not an image but an html file. Where do I find its
>>>>> identifier?
>>>>> Luc
>>>>> On 07/28/2011 11:26 AM, Graham Klyne wrote:
>>>>>> I've added a scenario analysis appendix to the PAQ document at
>>>>>> http://dvcs.w3.org/hg/prov/raw-file/be3b7e1f2518/paq/provenance-access.html
>>>>>> The short answer to this issue is that I believe there are some
>>>>>> matters that are  beyond the scope of a W3C specification
>>>>>> document.  The mechanisms described (or with placeholders for
>>>>>> fuller description) could form the basis for applications that
>>>>>> need to deal with, say, data provided on a USB drive, but a
>>>>>> complete specification would IMO be inappropriate.
>>>>>> e.g.
>>>>>> [[
>>>>>> S: this scenario effectively calls for this: given an arbitrary
>>>>>> data resource, implement a general purpose application to
>>>>>> discover, retrieve and analyze provenance about that resource. At
>>>>>> the present time, this is a matter for experimental development,
>>>>>> which could be based substantially on the mechanisms described for
>>>>>> provenance discovery and access via third party services.
>>>>>> ]]
> --
> Professor Luc Moreau
> Electronics and Computer Science   tel:   +44 23 8059 4487
> University of Southampton          fax:   +44 23 8059 2865
> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Dr Simon Miles
Lecturer, Department of Informatics
Kings College London, WC2R 2LS, UK
+44 (0)20 7848 1166
Received on Friday, 29 July 2011 14:17:33 UTC
