Re: Provenance as a first-class citizen

>> For making statements about statements -- which you're talking about --
>> you need something more complex, like quads or reification, but that's
>> relatively rare (even if it's very interesting).
>
> Yes, and that's precisely the problem. It shouldn't be rare; *all* 
> statements should be either implicitly or explicitly qualified via 
> quads/reification.
>
> The web is a mess of a trillion viewpoints -- the current SW model is 
> equivalent to getting an RSS feed of articles from all over the web, 
> with no reference to the original sources. A human can look at an 
> article titled "Aliens Impregnate Brad Pitt" and know to ignore it 
> when determining facts about Brad Pitt, but the Semantic Web has no 
> such capability.
>
> It doesn't matter that you *can* reify statements within RDF. My wish 
> is that you wouldn't be able to *avoid* making reified statements.

I'm going to post a quick example of why one way of interpreting what 
you're saying doesn't seem to make sense, and then query whether you 
meant it that way.

Say all *published* RDF has to include a statement of where its triples 
came from, e.g. I post a quad saying <X Y Z> was stated on example.com. 
But why would you trust me when *I* say that? It could be libel. (Or, in 
this case, stupid, since example.com doesn't exist.)

So you now need to recall who said that example.com said <X Y Z>. And 
what happens when you publish that? You need more than a quad. And so 
on, ad infinitum.
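
To make the regress concrete, here is a minimal Python sketch. The 
class names (Triple, Attributed), the depth() helper and the source 
label "chris" are all made up for illustration; none of this is RDF 
vocabulary. Each attribution wraps the statement below it, so every 
time someone republishes, the statement needs one more slot.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Triple:
    """A plain <subject predicate object> statement."""
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class Attributed:
    """Hypothetical wrapper: 'source said statement'."""
    statement: Union[Triple, "Attributed"]
    source: str


def depth(stmt):
    """Count how many layers of 'X said ...' wrap the bare triple."""
    layers = 0
    while isinstance(stmt, Attributed):
        layers += 1
        stmt = stmt.statement
    return layers


claim = Triple("X", "Y", "Z")              # the bare triple
quad = Attributed(claim, "example.com")    # example.com said <X Y Z>
quint = Attributed(quad, "chris")          # chris said example.com said ...
```

Here depth(quad) is 1 and depth(quint) is 2; each further republisher 
adds another layer, and nothing in the scheme says where to stop.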

*Or* you could just publish triples that *you* trust (stamping your 
authority on them), and remember *internally* who said what, using 
whatever trust system you like. In the rare case that you want to 
publish your trust system, use an RDF graph to describe it: since it 
describes your trust system, you trust the graph *as a whole*, and 
hence can stamp your authority on it. Alternatively, publish the URL 
you got untrusted data from, so people who don't trust *you* can 
download the metadata themselves, and assign their own trust metric. 
Trying to move quads to the distribution system doesn't add anything, 
since "most of the time", you will be publishing metadata which you 
created, or which you wish to add your authority to.
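
A rough sketch of that alternative, again in Python and again under 
assumptions of my own: the Quad tuple, the publishable_triples() 
function and the two example URLs are hypothetical, and the "trust 
system" is reduced to a plain set of trusted sources. The store keeps 
quads internally; only the triples from trusted sources go out.

```python
from collections import namedtuple

# Internal record: a triple plus the source it was obtained from.
Quad = namedtuple("Quad", "s p o source")


def publishable_triples(store, trusted_sources):
    """Return plain triples from trusted sources, dropping provenance."""
    return sorted(
        {(q.s, q.p, q.o) for q in store if q.source in trusted_sources}
    )


# Hypothetical internal store; URLs are placeholders.
store = [
    Quad("BradPitt", "occupation", "Actor", "http://trusted.example/films"),
    Quad("BradPitt", "impregnatedBy", "Aliens", "http://tabloid.example/news"),
]

# Stand-in for "whatever trust system you like": a simple whitelist.
trusted = {"http://trusted.example/films"}

triples = publishable_triples(store, trusted)
```

What gets published is just the occupation triple, with my authority 
behind it. The provenance never leaves the store, so there is nothing 
for anyone downstream to have to qualify in turn.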

So: are you stating that all **published** RDF should be qualified? 
Because that doesn't seem to make sense: it merely generates an 
infinity of pointless trust metadata. Or are you stating that all RDF 
**you store** should be qualified?

Chris

Received on Friday, 17 March 2006 23:42:54 UTC