Re: complete vs partial graph semantics

On Wed, 11 Apr 2012 10:37:22 -0400, Sandro Hawke <sandro@w3.org> said:

    sandro> Put differently, as a test case:
    sandro>
    sandro> Trig Document 1 (D1): <u> { <a> <b> 1 }
    sandro>
    sandro> Trig Document 2 (D2): <u> { <a> <b> 2 }
    sandro>
    sandro> What is the merge/union of D1 and D2?
    sandro>
    sandro> It's not defined, when asked like this.  We use
    sandro> something Trig-Like but different:
    sandro>
    sandro>     D1A <u> {+ <a> <b> 1 }
    sandro>     D2A <u> {+ <a> <b> 2 }
    sandro>
    sandro> in which case the merge is:
    sandro>
    sandro>     D3A <u> {+ <a> <b> 1,2 }
    sandro>
    sandro>         ==or==
    sandro>
    sandro>     D1B <u> {= <a> <b> 1 }
    sandro>     D2B <u> {= <a> <b> 2 }
    sandro>
    sandro> in which case there is no merge; they are inconsistent.
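
To make the two readings concrete, here is a minimal Python sketch. The
names (NamedGraph, merge, InconsistentGraphs) and the "+"/"=" mode
strings are made up for illustration and are not part of TriG or any
RDF library. It assumes "+" means "these triples are among those in
<u>" and "=" means "these are exactly the triples in <u>". The handling
of a mixed +/= pair is my own guess, not something proposed in the
thread.

```python
from dataclasses import dataclass


class InconsistentGraphs(Exception):
    """Raised when two statements about the same graph cannot both hold."""


@dataclass(frozen=True)
class NamedGraph:
    name: str           # graph name, e.g. "u"
    mode: str           # "+" = partial (at least these), "=" = complete (exactly these)
    triples: frozenset  # set of (subject, predicate, object) tuples


def merge(g1: NamedGraph, g2: NamedGraph) -> NamedGraph:
    if g1.name != g2.name:
        raise ValueError("this sketch only merges statements about the same graph")
    if g1.mode == "+" and g2.mode == "+":
        # Two partial views: the merge is the plain union.
        return NamedGraph(g1.name, "+", g1.triples | g2.triples)
    if g1.mode == "=" and g2.mode == "=":
        # Two complete views must agree exactly.
        if g1.triples != g2.triples:
            raise InconsistentGraphs(g1.name)
        return g1
    # Mixed case (assumption): the partial view must fit inside the complete one.
    complete, partial = (g1, g2) if g1.mode == "=" else (g2, g1)
    if not partial.triples <= complete.triples:
        raise InconsistentGraphs(g1.name)
    return complete


d1a = NamedGraph("u", "+", frozenset({("a", "b", 1)}))
d2a = NamedGraph("u", "+", frozenset({("a", "b", 2)}))
d3a = merge(d1a, d2a)  # partial graph holding both <a> <b> 1 and <a> <b> 2
```

Calling merge on the "=" versions of the same two documents raises
InconsistentGraphs, matching the "no merge" outcome above.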

Reading some of the background discussion about crawler dumps and
such, it seems to me there is quite a bit more information we might
want to carry around in the "header" of a TriG document.

For example, if D1 was downloaded at time t1 and D2 at a later time
t2, one could reasonably conclude that even with the + notation it is
inappropriate to merge them, D2 having superseded D1.

Or perhaps D1 comes from a reliable source and D2 comes from someone
whose data I wouldn't otherwise trust, but will use if I don't have
anything better. So when combining the information I'll throw out
D2's version. But perhaps I would nevertheless keep it around and do
a straight additive merge if I know the cardinality of <b> to be
greater than 1.

My point is that combining data from different sources, or from the
same source at different times, is likely to need to take into account
more than just the +/= hints. Some of this information can be in-band
(e.g. time, source) and some must necessarily be out-of-band (e.g. how
much I trust that source).

Cheers,
-w

Received on Wednesday, 11 April 2012 17:41:11 UTC