Re: Modified proposal for 'provenance triple', ISSUE-110

Ivan, Gregg,

I'm quite sure that Gregg is correct. Ivan, you say "URI referring to
the processor graph". But there is no predefined means of determining
the IRI for a graph *within* a graph. Any RDF format (including RDFa)
which only deals with triples has no means to even express what the
"containing graph" is (in the quad sense). You may of course express
information about the document (base) URI though.

Correct me if I'm wrong, but since the conceptual RDF model doesn't
include quads (only reification), it isn't even currently clear what
"graphs of graphs" are, apart from the instrumental approach taken by
e.g. SPARQL to express how you can store and query different contexts.

Anyway, there is no special meaning in RDF/XML to rdf:about="", in
Turtle to <>, nor in RDFa to about="" (or href="", resource=""). They
are syntactic mechanisms for expressing an empty relative IRI, which a
processor turning this syntax into triples *must* (AFAIK) resolve
against the document base to produce an absolute IRI. All these
syntaxes have optional means of supplying this base, and processors
should by default use the URL (commonly an http or file URI), System ID
or similar, and also provide a means to programmatically supply the
base URI.
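
To illustrate the resolution step, here is a minimal Python sketch using
the standard library's urllib.parse.urljoin. It works on URIs rather than
full IRIs, the base value is simply the one from Gregg's example, and the
helper name resolve() is made up for illustration:

```python
from urllib.parse import urljoin


def resolve(reference: str, base: str) -> str:
    """Resolve a (possibly empty) relative reference against a base URI."""
    return urljoin(base, reference)


# Assumed document base, taken from the <base href> in Gregg's example.
base = "http://example.org/original"

# The empty reference (<> in Turtle, about="" in RDFa) yields the base itself.
print(resolve("", base))

# Other relative references are resolved against the same base.
print(resolve("#me", base))
print(resolve("other", base))
```

The point is only that the empty reference has no identity of its own:
once resolved, it is indistinguishable from the base URI.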

So I'm a bit lost here I'm afraid, as to what you mean by <>, Ivan,
if you *don't* mean the base URI.

The fact that RDFLib actually preserves URIRef("") as a kind of
"absolute relative reference" seems like a bug, or at most an esoteric
feature to preserve a syntactic form which doesn't represent any valid
RDF concept.

Now, I'm not saying that the topic itself is unimportant. I've dealt
with it a lot when storing data in quad stores -- regularly creating
named graphs based on input document URIs, and relating the named
graph IRI to this input source (with e.g. dc:source or
foaf:primaryTopic). In this way, a user of an RDFa processor may store
the resulting triples into a named graph within e.g. a quad store. And
if an RDF API supports named graphs (and graphs of graphs), the
resulting graph from an RDFa document can reasonably be named (with an
IRI) and a triple be added relating this named graph to the source
document IRI. But this mechanism of minting graph IRIs and adding data
about them (e.g. relating them to the source document(s)) is beyond
what RDFa should specify.
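
As a rough sketch of what I mean, here is how an application (not the RDFa
processor) might do this, using plain Python tuples instead of any real
quad store API. The Quad type, mint_graph_iri(), store_document() and the
http://example.org/graphs/ naming scheme are all hypothetical, and I'm
assuming dc: is bound to the Dublin Core elements namespace:

```python
from typing import NamedTuple
from urllib.parse import quote


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumes the dc: prefix is bound to the Dublin Core elements namespace.
DC_SOURCE = "http://purl.org/dc/elements/1.1/source"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def mint_graph_iri(source_iri: str) -> str:
    # Hypothetical naming scheme; any scheme yielding a distinct IRI would do.
    return "http://example.org/graphs/" + quote(source_iri, safe="")


def store_document(triples, source_iri):
    """Place parsed triples in a named graph and record where they came from."""
    graph_iri = mint_graph_iri(source_iri)
    quads = [Quad(s, p, o, graph_iri) for (s, p, o) in triples]
    # This statement about the graph is added by the application,
    # not expressed by the RDFa document itself.
    quads.append(Quad(graph_iri, DC_SOURCE, source_iri, graph_iri))
    return graph_iri, quads


source = "http://example.org/original"
parsed = [(source, DC_TITLE, '"Document Title"')]  # stand-in for parser output
graph_iri, quads = store_document(parsed, source)
for q in quads:
    print(q)
```

Here the statement about the graph is stored in the named graph itself;
keeping it in a separate metadata graph would be an equally valid choice.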

(It's not uncommon AFAIK to use the actual document IRI for this in
SPARQL, albeit this is logically conflating the document and the
graph.)

In any case, the RDFa syntax is a syntax for RDF triples, and not
quads, so it cannot express facts about the relationship (if any)
between a named graph and any of the resources described therein.
Neither should it. Named graphs and provenance are orthogonal to all
triple syntaxes, and should be kept separate from these.

Best regards,
Niklas



On Tue, Aug 30, 2011 at 8:58 AM, Ivan Herman <ivan@w3.org> wrote:
>
> On Aug 30, 2011, at 07:25 , Gregg Kellogg wrote:
>
>> On Aug 29, 2011, at 5:56 AM, Ivan Herman wrote:
>>
>>> After our discussion and the last telco, and subsequent emails, I would like to modify the proposal.
>>>
>>> Proposal: for each RDFa source, the processor graph should contain one triple of the sort
>>>
>>> - subject: URI referring to the processor graph (typically <> in Turtle, or @about="" in RDF/XML, though implementation MAY define a specific URI for that purpose)
>>> - predicate: http://www.w3.org/ns/rdfa#hasSource (see also discussion below)
>>> - object: the initial value of the base URI, as defined in 7.2 of the RDFa Core document
>>
>> Processor Graph? I thought we had discussed placing it in the default graph.
>
> I am very sorry. Yes, I meant the default graph...
>
>>
>> As I discussed before, <> or @about="" end up resolving to the document's IRI or html>head>base, as they describe relative IRIs. It seems that what we need is an empty IRI output, so that another processor encountering a serialization of the original document will see that the document at a new IRI continues to describe the original location. Consider the following:
>>
>> <html>
>>   <head>
>>     <base href="http://example.org/original"/>
>>   </head>
>>   <body about="">
>>     <p property="dc:title">Document Title</p>
>>   </body>
>> </html>
>>
>> This will generate the following:
>>
>> @base <http://example.org/original> .
>> <> dc:title "Document Title" ; rdfa:hasSource <> .
>
> Well... if this is the way you generate it then of course there is an issue. But that is a serialization problem. On the RDF concept level there is no such thing as a relative URI, only absolute. Without the @base Turtle directive, this code is equivalent to
>
> <http://example.org/original> dc:title "Document Title" ; rdfa:hasSource <http://example.org/original> .
>
> which is of course not what you would generate but, instead
>
> <http://example.org/original> dc:title "Document Title" .
> <> rdfa:hasSource <http://example.org/original> .
>
> This just shows that the usage of @base _in the serialization_ might indeed be misleading.
>
>
>
>>
>> What you might want instead would be the following:
>>
>> <> rdfa:hasSource <http://example.org/original> .
>> <http://example.org/original> dc:title "Document Title" .
>>
>> The problem is, that as soon as the document is parsed, <> is given an actual URI (the base of the document being parsed), so I don't quite see how we accomplish this.
>>
>>> I have chosen the simplest possible way for the predicate URI, namely to define one for ourselves, which may not be the best. Ideas that came up during the discussion
>>>
>>> - powder:describedby : but is it correct that the RDF content 'describes' the HTML content? That may not necessarily be the case, it may give additional data that is not in the HTML
>>>
>>> - foaf:primaryTopic (Virtuoso seems to use that): "property relates a document to the main thing that the document is about.", says the foaf spec; this is, in my view, closer than powder:describedby
>>
>> I think this is most appropriate.
>
> As I said, I am not 100% happy with this, but I can live with it:-)
>
>
> Cheers
>
> Ivan
>
>
>>
>>> - dcterms has a provenance property, but its range is defined as a 'ProvenanceStatement', which would then create (via RDFS) an extra type information on the original data, and I do not think that is fine
>>>
>>> - The provenance vocabulary (http://purl.org/net/provenance/ns#) also has some predicates but, just as dcterms, it contains a number of range specifications that yield extra types on the original base URI. I am not sure that is o.k. If we disregard that, then prv:accessedResource is probably the best one[1]; it generates a type information of 'internet Resource'[2], which is fairly harmless. The problem is whether prv is stable enough for a Rec, though.
>>>
>>> - The draft of the provenance model of the Prov WG[3] seems to have a hasOriginalSource predicate (in section 6.4), but I am not sure whether this is stable.
>>>
>>>
>>> The stable thing is to use our own predicate, and maybe define a sub-property relationship later when the provenance WG's terms gel. Alternatively, we can ask the Prov WG for their advice. I can live with primaryTopic, but it does not feel _really_ right either.
>>>
>>> Ivan
>>>
>>>
>>>
>>> [1] http://trdf.sourceforge.net/provenance/ns.html#accessedResource
>>> [2] http://ontologydesignpatterns.org/ont/web/irw.owl#WebResource
>>> [3] http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html
>>>
>>> ----
>>> Ivan Herman, W3C Semantic Web Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>
>

Received on Thursday, 1 September 2011 12:07:06 UTC