- From: Sandro Hawke <sandro@w3.org>
- Date: Thu, 12 Apr 2012 11:34:39 -0400
- To: William Waites <wwaites@tardis.ed.ac.uk>
- Cc: andy.seaborne@epimorphics.com, public-rdf-wg@w3.org
On Wed, 2012-04-11 at 18:40 +0100, William Waites wrote:
> On Wed, 11 Apr 2012 10:37:22 -0400, Sandro Hawke <sandro@w3.org> said:
>
> sandro> Put differently, as a test case:
> sandro>
> sandro> Trig Document 1 (D1): <u> { <a> <b> 1 }
> sandro>
> sandro> Trig Document 2 (D2): <u> { <a> <b> 2 }
> sandro>
> sandro> What is the merge/union of D1 and D2?
> sandro>
> sandro> It's not defined, when asked like this. We use
> sandro> something Trig-Like but different:
> sandro>
> sandro> D1A <u> {+ <a> <b> 1 }
> sandro> D2A <u> {+ <a> <b> 2 }
> sandro>
> sandro> in which case the merge is:
> sandro>
> sandro> D3A <u> {+ <a> <b> 1,2 }
> sandro>
> sandro> ==or==
> sandro>
> sandro> D1B <u> {= <a> <b> 1 }
> sandro> D2B <u> {= <a> <b> 2 }
> sandro>
> sandro> in which case there is no merge; they are inconsistent.
>
> Reading some of the background discussion, talking about crawler dumps
> and such, it seems to me there is quite a bit more information we
> might want to carry around in the "header" of a trig document.

In the 6.1 proposal, you can say whatever you want in the default
graph. It can be used like a "header" that way. The key point here is
that the default graph is asserted.

> For example, if D1 was downloaded at time t1 and D2 at t2, one could
> reasonably conclude that even with the + notation it is inappropriate
> to merge them, D2 having superseded D1.

It seems to me this kind of logic requires named DATASETS. You're
reasoning about D1 and D2. I suggest we try to design things so we only
need to worry about named GRAPHS.

All the designs on
http://www.w3.org/2011/rdf-wg/wiki/Graphs_Design_6.1/Crawler_Example do
okay on this front. In every case, if you can conjoin the datasets,
you'll be able to do whatever reasoning you want about cache
expirations, etc., just by looking at data in the dataset that came out
of the conjunction. I say "if" because only designs 1 and 3 are
guaranteed to allow conjoining.
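
As an illustration only (not part of the 6.1 proposal or of any design on the wiki page), the {+ } / {= } test case quoted above can be sketched in Python. The sketch assumes that `{+ }` means "the graph contains at least these triples" and `{= }` means "the graph contains exactly these triples"; the rule for conjoining one exact graph with one lower-bound graph is likewise an assumption. The names `conjoin`, `conjoin_graph` and `Inconsistent` are hypothetical, and IRIs such as <u>, <a>, <b> are written as plain strings.

```python
from typing import Dict, FrozenSet, Tuple

# A triple is (subject, predicate, object); IRIs are plain strings here.
Triple = Tuple[str, str, object]

# Hypothetical model (an assumption, not the 6.1 syntax): each named graph
# carries a mode.
#   "+" : the graph contains at least these triples (lower bound)
#   "=" : the graph contains exactly these triples
Graph = Tuple[str, FrozenSet[Triple]]
Dataset = Dict[str, Graph]


class Inconsistent(Exception):
    """Raised when two datasets cannot be conjoined."""


def conjoin_graph(name: str, g1: Graph, g2: Graph) -> Graph:
    mode1, t1 = g1
    mode2, t2 = g2
    if mode1 == "+" and mode2 == "+":
        # Two lower bounds: the graph holds everything either one lists.
        return ("+", t1 | t2)
    if mode1 == "=" and mode2 == "=":
        # Two exact descriptions must agree triple for triple.
        if t1 != t2:
            raise Inconsistent(f"<{name}> has two different exact contents")
        return ("=", t1)
    # Assumed rule for one exact and one lower-bound description:
    # the exact set must include every triple of the lower bound.
    exact, lower = (t1, t2) if mode1 == "=" else (t2, t1)
    if not lower <= exact:
        raise Inconsistent(f"<{name}> lacks triples asserted with '+'")
    return ("=", exact)


def conjoin(d1: Dataset, d2: Dataset) -> Dataset:
    # Graphs named in only one dataset are copied through unchanged.
    result = dict(d1)
    for name, g2 in d2.items():
        result[name] = conjoin_graph(name, d1[name], g2) if name in d1 else g2
    return result


# The test case from the thread.
D1A = {"u": ("+", frozenset({("a", "b", 1)}))}
D2A = {"u": ("+", frozenset({("a", "b", 2)}))}
D1B = {"u": ("=", frozenset({("a", "b", 1)}))}
D2B = {"u": ("=", frozenset({("a", "b", 2)}))}

print(sorted(conjoin(D1A, D2A)["u"][1]))  # [('a', 'b', 1), ('a', 'b', 2)]
try:
    conjoin(D1B, D2B)
except Inconsistent as e:
    print("no merge:", e)
```

Under these assumptions D1A and D2A conjoin to the equivalent of D3A, while D1B and D2B raise `Inconsistent`, matching the two outcomes described in the test case.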

With design 2, an attempt to conjoin will fail if one of the data
sources returns different contents during the different crawls. But you
still won't get incorrect results.

> Or perhaps D1 comes from a reliable source and D2 comes from someone
> whose data I'll use if I don't have anything better but otherwise I
> wouldn't trust. So when combining the information I'll throw out the
> second version. But perhaps I would nevertheless keep it around and do
> a straight additive merge if I know the cardinality of <b> to be
> greater than 1.
>
> My point is that combining data from different sources, or the same
> source at different times, is likely to need to take into account more
> than just the +/= hints. Some of this information can be in-band
> (e.g. time, source) and some must necessarily be out of band (e.g. how
> much I trust that source).

I'd like that kind of reasoning to happen within datasets rather than
across datasets. I think that's much of why we want datasets: so we can
reason about trust and change in graphs, in a distributed way. If we
just push the problem up to reasoning about datasets, we probably
haven't gained anything.

-- Sandro
Received on Thursday, 12 April 2012 15:34:59 UTC