Re: Modified proposal for 'provenance triple', ISSUE-110

On Aug 30, 2011, at 07:25 , Gregg Kellogg wrote:

> On Aug 29, 2011, at 5:56 AM, Ivan Herman wrote:
> 
>> After our discussion and the last telco, and subsequent emails, I would like to modify the proposal.
>> 
>> Proposal: for each RDFa source, the processor graph should contain one triple of the sort
>> 
>> - subject: URI referring to the processor graph (typically <> in Turtle, or rdf:about="" in RDF/XML, though implementations MAY define a specific URI for that purpose)
>> - predicate: http://www.w3.org/ns/rdfa#hasSource (see also discussion below)
>> - object: the initial value of the base URI, as defined in 7.2 of the RDFa Core document
> 
> Processor Graph? I thought we had discussed placing it in the default graph.

I am very sorry. Yes, I meant the default graph...

> 
> As I discussed before, <> or @about="" end up resolving to the document's IRI or html>head>base, as they describe relative IRIs. It seems that what we need is an empty IRI output, so that another processor encountering a serialization of the original document will see that the document at a new IRI continues to describe the original location. Consider the following:
> 
> <html>
>   <head>
>     <base href="http://example.org/original"/>
>   </head>
>   <body about="">
>     <p property="dc:title">Document Title</p>
>   </body>
> </html>
> 
> This will generate the following:
> 
> @base <http://example.org/original> .
> <> dc:title "Document Title" ; rdfa:hasSource <> .

Well... if this is the way you generate the output, then of course there is an issue. But that is a serialization problem. On the RDF concept level there is no such thing as a relative URI, only absolute ones. Without the @base Turtle directive, this code reads

<http://example.org/original> dc:title "Document Title" ; rdfa:hasSource <http://example.org/original> .

which is of course not what you would generate; instead, you would generate

<http://example.org/original> dc:title "Document Title" .
<> rdfa:hasSource <http://example.org/original> .

This just shows that the usage of @base _in the serialization_ might indeed be misleading.
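
To make the point concrete, here is a minimal sketch (in Python, standard library only) of how the empty reference <> is resolved by whoever parses the serialization. The retrieval location http://example.net/copies/output.ttl and the helper names resolve and triples are assumptions made up for illustration; they are not part of RDFa Core or of any processor.

```python
from urllib.parse import urljoin

# Base IRI declared by <base href="..."> in the HTML source above.
ORIGINAL = "http://example.org/original"
# Hypothetical location from which the Turtle output is later retrieved.
COPY = "http://example.net/copies/output.ttl"


def resolve(reference: str, base: str) -> str:
    """Resolve an IRI reference such as <> against a base IRI."""
    return urljoin(base, reference)


def triples(subject_ref: str, base: str):
    """Return the two triples of the example as absolute (s, p, o) tuples.

    Only the subject of the provenance triple is written as a relative
    reference; everything else is already absolute.
    """
    return [
        (ORIGINAL, "dc:title", "Document Title"),
        (resolve(subject_ref, base), "rdfa:hasSource", ORIGINAL),
    ]


# With "@base <http://example.org/original>" in the output, <> collapses
# onto the original IRI: subject and object of the triple coincide.
with_base = triples("", ORIGINAL)

# Without @base, <> resolves against the IRI the output was retrieved
# from, so the triple links that location back to the original source.
without_base = triples("", COPY)

for s, p, o in with_base + without_base:
    print(s, p, o)
```

In this sketch the second case is the one the proposal is after: the subject is wherever the graph now lives, and the object stays the initial base URI of the RDFa source.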



> 
> What you might want instead would be the following:
> 
> <> rdfa:hasSource <http://example.org/original> .
> <http://example.org/original> dc:title "Document Title" .
> 
> The problem is, that as soon as the document is parsed, <> is given an actual URI (the base of the document being parsed), so I don't quite see how we accomplish this.
> 
>> I have chosen the simplest possible way for the predicate URI, namely to define one for ourselves, which may not be the best. Ideas that came up during the discussion
>> 
>> - powder:describedby : but is it correct that the RDF content 'describes' the HTML content? That may not necessarily be the case; it may give additional data that is not in the HTML
>> 
>> - foaf:primaryTopic (Virtuoso seems to use that): "property relates a document to the main thing that the document is about.", says the foaf spec; this is, in my view, closer than powder:describedby
> 
> I think this is most appropriate.

As I said, I am not 100% happy with this, but I can live with it:-)


Cheers

Ivan


> 
>> - dcterms has a provenance property, but its range is defined as a 'ProvenanceStatement', which would then create (via RDFS) extra type information on the original data, and I do not think that is fine
>> 
>> - The provenance vocabulary (http://purl.org/net/provenance/ns#) also has some predicates but, just as dcterms, it contains a number of range specifications that yield extra types on the original base URI. I am not sure that is o.k. If we disregard that, then prv:accessedResource is probably the best one[1]; it generates a type information of 'internet Resource'[2], which is fairly harmless. The problem is whether prv is stable enough for a Rec, though.
>> 
>> - The draft of the provenance model of the Prov WG[3] seems to have a hasOriginalSource predicate (in section 6.4), but I am not sure whether this is stable.
>> 
>> 
>> The stable thing is to use our own predicate, and maybe define a sub-property relationship later when the provenance WG's terms gel. Alternatively, we can ask the Prov WG for their advice. I can live with primaryTopic, but it does not feel _really_ right either. 
>> 
>> Ivan
>> 
>> 
>> 
>> [1] http://trdf.sourceforge.net/provenance/ns.html#accessedResource
>> [2] http://ontologydesignpatterns.org/ont/web/irw.owl#WebResource
>> [3] http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
> 



Received on Tuesday, 30 August 2011 06:58:58 UTC