Provenance context URIs for RDF data (was: PAQ document update, target renamed as context)

Olaf,

On 21/08/2011 17:15, Olaf Hartig wrote:
>>> --Section 3.3--
>>> *) For prov:hasProvenance triples I still don't understand how the
>>> subject is associated to the set of RDF triples that contains the
>>> corresponding prov:hasProvenance triple. To put it differently, what URI
>>> do I as a publisher use in the subject position of a prov:hasProvenance
>>> triple if I want to say that the object resource represents provenance
>>> information about that very set of triples which currently represent the
>>> resource in question.
>>
>> You use the URI of the containing RDF.
>
> What exactly is this URI?

Short answer: whatever the publisher decides it to be.

> Let's use the following example to clarify my confusion: In order to retrieve
> data about
>
>     <http://dbpedia.org/resource/Berlin>
>
> I retrieve a representation of the Web resource identified by URI
>
>    <http://dbpedia.org/data/Berlin>
>
> I parse this _representation_ and obtain some RDF triples. Obviously, this set
> might be different today than it was yesterday, because the data in DBpedia
> changes and, thus, I get different representations. Now, my question is, what
> URI should the DBpedia guys use as subject in a prov:hasProvenance triple that
> may occur in the representation served today, if they want to refer to
> provenance information about _today's_ data about Berlin? It cannot be the URI
>
>    <http://dbpedia.org/data/Berlin>
>
> because tomorrow that URI might not identify today's data about Berlin
> anymore.
>
> (This discussion is in some sense related to ISSUE-68)

So, in this case, the DBpedia team need to make some decisions about naming.
They might, for example, follow something akin to the W3C model of dated URIs
and mint URIs of the form

  <http://dbpedia.org/data/2011/Berlin-20110825>

Or they might use a Memento-style (http://www.mementoweb.org/guide/quick-intro/) URI

  <http://arxiv.example.net/web/20110825/http://dbpedia.org/data/Berlin>

Or maybe it's time to dust off Larry Masinter's duri proposal: 
http://tools.ietf.org/id/draft-masinter-dated-uri-08.html
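
To make the first two options concrete, here is a minimal sketch of how a
publisher might mint such URIs. The function names and the exact path layout
are only my illustration, chosen to reproduce the two example URIs above;
nothing here is prescribed by the specification or used by DBpedia.

```python
from datetime import date


def dated_uri(base: str, name: str, day: date) -> str:
    """Mint a date-stamped URI of the form <base>/<year>/<name>-<yyyymmdd>.

    Hypothetical layout, chosen only to match the first example above.
    """
    return f"{base.rstrip('/')}/{day.year}/{name}-{day:%Y%m%d}"


def archive_style_uri(archive_prefix: str, original: str, day: date) -> str:
    """Mint a Memento-style URI: <archive prefix>/<yyyymmdd>/<original URI>.

    Hypothetical helper; real archives choose their own URI patterns.
    """
    return f"{archive_prefix.rstrip('/')}/{day:%Y%m%d}/{original}"


day = date(2011, 8, 25)
snapshot = dated_uri("http://dbpedia.org/data", "Berlin", day)
archived = archive_style_uri(
    "http://arxiv.example.net/web", "http://dbpedia.org/data/Berlin", day
)
print(snapshot)
print(archived)
```

Either result could then serve as the subject of the prov:hasProvenance
triple for that day's data.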

My main point is that it's not for the specification to say what the URI should 
be.  It just assumes there is one.  If there is no such URI available, then for 
the purposes of RDF (the subject of this section) other strategies might be 
considered (e.g. blank nodes with inverse functional properties), but I think 
that's getting beyond the scope of what we should cover here.

>> For RDF documents, this is sometimes written as an empty URI-reference; e.g.
>>
>>     <rdf:Description rdf:about="">
>>       <prov:hasProvenance rdf:resource="(provenance_URI)"/>
>>     </rdf:Description>
>
> No. At least not exactly ;-)  The subject of the RDF triple encoded in this
> RDF/XML snippet is the base URI (without any fragment part) of the
> corresponding RDF/XML document (see Sec.5.3 in the RDF/XML spec [1]).
> If such a base URI is not explicitly defined in the document, then the rules
> from Sec.4.1 of the XML Base spec [2] apply. For the DBpedia example this
> means that the base URI is
>
>    <http://dbpedia.org/data/Berlin>
>
> because DBpedia serves RDF/XML serializations without an xml:base attribute.
> As mentioned before, I wouldn't consider that URI suitable as the subject of
> prov:hasProvenance triples.

Well, yes, in this case I was assuming a static resource.  For a dynamic
resource, the publisher needs to consider ways to refer to a specific instance.
The chosen URI could then be declared via xml:base to preserve the <> idiom,
or simply be used directly as the subject within the RDF data.
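
As a rough sketch of the effect of xml:base on the empty reference, the
following resolves rdf:about="" the way you describe. It is simplified: it
only looks at xml:base on the root element, and the prov namespace URI, the
provenance URI and the dated base URI are placeholders of mine.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urldefrag, urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Placeholder document: the prov namespace and both example URIs are assumed.
DOC_WITH_BASE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:prov="http://example.org/prov#"
    xml:base="http://dbpedia.org/data/2011/Berlin-20110825">
  <rdf:Description rdf:about="">
    <prov:hasProvenance rdf:resource="http://example.org/provenance/Berlin"/>
  </rdf:Description>
</rdf:RDF>"""

# Same document with the xml:base attribute removed.
DOC_WITHOUT_BASE = DOC_WITH_BASE.replace(
    '\n    xml:base="http://dbpedia.org/data/2011/Berlin-20110825"', ""
)


def subject_uris(rdf_xml: str, retrieval_uri: str) -> List[str]:
    """Resolve each rdf:about value against the document's base URI.

    Simplification: only a root-level xml:base is honoured; without one,
    the retrieval URI acts as the base.
    """
    root = ET.fromstring(rdf_xml)
    base = root.get(f"{{{XML_NS}}}base", retrieval_uri)
    base, _fragment = urldefrag(base)  # the base is used without a fragment
    subjects = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        if about is not None:
            subjects.append(urljoin(base, about))
    return subjects


retrieved_from = "http://dbpedia.org/data/Berlin"
print(subject_uris(DOC_WITH_BASE, retrieved_from))
print(subject_uris(DOC_WITHOUT_BASE, retrieved_from))
```

With the xml:base attribute the subject is the dated URI; without it the
subject falls back to the URI the document was retrieved from, which is the
case you object to.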

>> (If publishing the RDF in a named graph, then use the URI of the graph.)
>
> I would agree here iff everybody would understand Named Graphs as immutable
> set of RDF triples (which is not the case, I guess).

It's the same situation as the non-named-graph resource.  It's up to the 
publisher to use and apply a URI that refers to a suitably invariant form. 
Provenance statements are only true to the extent that they refer to invariant
aspects of the resource.  We can't stop people publishing untruths (either by 
omission or commission), even with regard to provenance :)

#g

Received on Thursday, 25 August 2011 13:52:15 UTC