Re: trusting quads

On 9 May 2012, at 13:18, Dan Brickley wrote:

> On 9 May 2012 22:03, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>> On 09/05/12 20:02, Sandro Hawke wrote:
>>> On Wed, 2012-05-09 at 11:26 -0700, Steve Harris wrote:
>>>> Right. The whole reason quads were implemented was to be able to track
>>>> what *triples* appears in what documents (typically found on the web,
>>>> but file: is good too).
>>> 
>>> 
>>> Speak for yourself, please, Steve.   I've seen several implementations
>>> of quads that were used for other purposes and it's quite possible they
>>> predated yours.
>> 
>> 
>> Which systems?  I'd like to understand the motivations and approaches.
> 
> (This dispute feels like unhappy bickering (well, the Steve/Sandro
> interaction to name names).)

Fair.

> Triples don't need to be serialized into stable published Web
> documents for it to be worth talking about them as a unit; Steve's
> 'file:' comment acknowledges that, albeit in a serialization-centric
> way.
> 
> In the FOAF scene 2000-3 we wanted an extra slot for keeping track of
> who-said-what, right from the start. Without support for this in basic
> RDF tools it was a huge pain to hack at application level, e.g. see
> Edd Dumbill's writeup
> http://www.ibm.com/developerworks/xml/library/x-rdfprov/index.html

Sure, but when that was written it was already common to use the "4th slot" to track which file / URI / process / wherever the triples came from. Some people did one ID per source, some one ID per triple; it wasn't until later that people tended to settle on one per source (and even now that's not universally true, e.g. the BigData store can still use one ID per triple, and people use that, I think).

> So - yes we wanted extra provenance info. Yes it was usually somehow
> associated with a Web document (at some level of abstraction --- could
> easily have been in-memory groups of triples too). And yes we wanted
> to associate those with other grouping constructs, eg. the human
> author.
> 
> Steve writes: """You say things like "without taking it as gospel"
> because your perspective is of some giant logic system. My perspective
> is of databases - I don't "believe" the things in my databases, it's
> all about the context. If you ask a user to enter their name, you
> don't "believe" the answer they give, you just store it. You can still
> query things you don't believe as long as you know the how / why / who
> says so. That's what the 4th slot was created for."""
> 
> I don't see that as incompatible with a logically-grounded view of
> what each bundle of triples is claiming.  At some point, is there some
> ground truth, re "as long as you know the how / why / who says so", or
> do we also admit possibility of different perspectives (and data
> quality, accuracy etc.) on that point?  Perhaps which triples are in
> which bundle is ground truth, but which bundle is associated with
> which real-world entity ... is something where we allow multiple
> perspectives, competing evidence, etc?

I wasn't trying to say it was incompatible, just that it leads to different terminology, and apparently leads you to think that there's some fundamental difference between triples in the "default graph" and triples in a "named graph".

My motivation was to explain why, if you want to pull in data from the wild that literally expresses quads/graphs/whatever, you need quints to do it reliably and safely. Going down that route will lead to the obvious never-ending spiral of doom :)

As a trivial example, how do you replace the data with a new version when you recrawl, if you haven't tracked where each triple/quad came from?
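
To make the recrawl case concrete, here is a rough sketch of what I mean. It is a toy in-memory store, not how any real system does it; the class name QuintStore, its methods, and the example.org URLs are all made up for illustration. Each quad (s, p, o, g) is filed under the source it was fetched from, which is the fifth slot.

```python
class QuintStore:
    """Toy store: every quad (s, p, o, g) is kept under the source it came from."""

    def __init__(self):
        # source URI -> set of (s, p, o, g) tuples fetched from that source
        self._by_source = {}

    def replace_source(self, source, quads):
        """Recrawl: discard whatever this source said before, keep the new version."""
        self._by_source[source] = set(quads)

    def quads(self):
        """All quads currently held, whichever source they came from."""
        held = set()
        for source_quads in self._by_source.values():
            held |= source_quads
        return held

    def sources_of(self, quad):
        """Which sources currently assert this quad."""
        return {src for src, qs in self._by_source.items() if quad in qs}


# Hypothetical sources and data, for illustration only.
src_a = "http://example.org/a.nq"
src_b = "http://example.org/b.nq"
q1 = ("ex:alice", "foaf:name", '"Alice"', "ex:g1")
q2 = ("ex:alice", "foaf:knows", "ex:bob", "ex:g1")

store = QuintStore()
store.replace_source(src_a, [q1, q2])
store.replace_source(src_b, [q1])   # a second source makes the same claim about ex:g1
store.replace_source(src_a, [q2])   # recrawl of a: it no longer contains q1

still_held = q1 in store.quads()    # q1 survives only because b still asserts it
who_says_q1 = store.sources_of(q1)
```

With only four slots, the recrawl of src_a has no way to tell whether q1 should go or stay, because both sources put it in ex:g1.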

-- 
Steve Harris, CTO
Garlik, a part of Experian 
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, NG2 Business Park, Nottingham, Nottinghamshire, England NG80 1ZZ

Received on Thursday, 10 May 2012 13:41:52 UTC