Re: Provenance as a first-class citizen

4-tuples? 3-tuples? With this line of reasoning one could argue for
11-tuples at some point, or for tuples of any length at all. Perhaps we
should remember the good old saying that the only real triple we need is
zero, one, and infinity :)

You could argue the reverse as well - that triples are *too much.* For
example, I'm working on a text-2-RDF project, and I want to capture the
sentence "John runs", which in predicate-argument form is usually
"Run(John)". It seems there are only 2 things going on, so I could
argue for 2-tuples as a better representation. Or I could just throw in
a blank node as the object of the triple. Overall, it seems like triples
are a pretty good intuitive "sweet spot".
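
To make the blank-node option concrete, here is a minimal sketch in plain
Python (no RDF library). The "ex:" names, the Triple type, and the helper
functions are all made up for illustration; this is just one way the
padding could look, not how any particular toolkit does it.

```python
from itertools import count
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Counter used to hand out locally unique blank-node labels (hypothetical scheme).
_bnode_ids = count(1)


def fresh_blank_node() -> str:
    """Return a new blank-node label such as '_:b1'."""
    return f"_:b{next(_bnode_ids)}"


def unary_to_triple(predicate: str, argument: str) -> Triple:
    """Encode a one-argument predicate like Run(John) as a triple.

    The argument becomes the subject, and the unused object slot is
    filled with a fresh blank node.
    """
    return Triple(argument, predicate, fresh_blank_node())


# "John runs" / Run(John), with made-up ex: names.
john_runs = unary_to_triple("ex:runs", "ex:John")
mary_runs = unary_to_triple("ex:runs", "ex:Mary")
```

The 2-tuple ("ex:runs", "ex:John") carries the same information; the blank
node only fills the third slot so the statement fits the triple shape.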

However, I do think the real question people are asking is this: since
quads are useful in some circumstances and widely implemented (and there
are circumstances where they're not that useful, as Sandro points
out), shouldn't they at some point be ratified by a body like the W3C as
something that one can and should do to solve certain classes of
problems? Data is moving around the Web, and more often than not, as
soon as data leaves its trusted location, one reasonably wants some
provenance. So far quads seem the best solution out there, a
sensible - and optional! - addition to triples. Does this mean ditching
triple-based RDF? Of course not. It just means standardizing existing
best practice, which seems to be that in some cases quads are useful.
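
As a rough illustration of "optional addition", here is a sketch in plain
Python where the fourth element records where a statement came from and
can simply be dropped to get back an ordinary triple. The Quad type, the
helper names, and the example.org / example.net URLs are assumptions for
the example, not any standardized format.

```python
from typing import List, NamedTuple, Optional, Tuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: Optional[str] = None  # where the statement came from, if known


def as_triple(quad: Quad) -> Tuple[str, str, str]:
    """Drop the provenance slot, leaving a plain triple."""
    return (quad.subject, quad.predicate, quad.obj)


def statements_from(quads: List[Quad], source: str) -> List[Tuple[str, str, str]]:
    """Return the triples asserted by one particular source."""
    return [as_triple(q) for q in quads if q.source == source]


# Hypothetical harvested data: two pages disagree about the sky.
harvested = [
    Quad("ex:sky", "ex:color", "blue", "http://example.org/weather"),
    Quad("ex:sky", "ex:color", "green", "http://example.net/untrusted"),
    Quad("ex:John", "ex:knows", "ex:Mary"),  # no provenance recorded
]

trusted = statements_from(harvested, "http://example.org/weather")
all_triples = [as_triple(q) for q in harvested]
```

A consumer that only understands triples can use all_triples and ignore
the rest; one that cares about provenance can filter by source first.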


Sandro Hawke wrote:

>Ben Syverson wrote:
>>On Mar 17, 2006, at 11:04 AM, Garrett Wollman wrote:
>>>I'm certain that this has been said before by people better-informed
>>>than I, but the more I look at RDF the more certain I am that basing
>>>it on triples rather than 4-tuples was a serious mistake.
>>I agree 1000%. Using triples means that by default statements are  
>>trusted and not reified. It suggests a top-down approach, rather than  
>>a bottom-up one. This is one reason that tags/keywords are more  
>>appealing to people than the SW.
>I disagree.
>RDF is based on triples because triples are an excellent single building
>block for making arbitrary statements.
>For making statements about statements -- which you're talking about --
>you need something more complex, like quads or reification, but that's
>relatively rare (even if it's very interesting).
>Publishing statements as triples makes sense.  Whatever you want your
>web page to say, just put those statements on the page.  You shouldn't
>have to put on the page a statement that those statements are on the
>page and are true.  Say "The sky is blue", not "I am now telling you
>that the sky is blue."
>For reasoning about statements, yes, of course use quads.  When I
>harvest RDF data, of course I keep track of what web pages said what.
>But I don't usually need to re-publish that harvester data; that's like
>my web browser publishing my browsing history along with the browser
>cache.  There are applications where that's useful, sure, but it's
>hardly the main way data moves around the web.
>    -- sandro

Received on Friday, 17 March 2006 21:32:38 UTC