Use case for g-snaps from Sandro Hawke on 2011-09-30 (public-rdf-prov@w3.org from September 2011)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 30 Sep 2011 00:52:18 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: public-rdf-prov@w3.org
Message-ID: <1317358338.5766.44.camel@waldron>
On Fri, 2011-09-23 at 20:19 +0100, Richard Cyganiak wrote:
> On 22 Sep 2011, at 20:02, Sandro Hawke wrote:
> >> Assuming the data is retrieved from the web (that is, all data is
> >> received as a representation of some resource that has a URL), then I
> >> believe that all these issues can be solved using three ingredients:
> >> 
> >> 1. a graph store for named graphs,
> >> 2. vocabulary for expressing authorship, trust/reputation and
> >> source/mirror relationships,
> >> 3. an incentive for parties on the web to publish trust/reputation
> >> information.
> > 
> > I didn't call this out in my examples, but how do you handle the cases where data changes?  How can I say that Errol got the name wrong, in a way which won't make me wrong if he corrects himself?
> 
> Can you expand a bit? Your use case mentioned the risk that Alice might be misled into visiting the wrong restaurant because Errol misfiled his review under the wrong restaurant. You seem to have something more than that in mind. Someone comes in saying that Errol's review is misfiled? Then Errol fixes his review? What exactly is the full story?

Okay:

1.  Errol publishes some triples G1 at address U1

        Contents of http://errol.example.org/page1
        
                @prefix rr: <http://reviews.example.org#>.
                rr:mals rr:quality rr:great.
         
        (I'm just putting everything in rr: for simplicity for now.)

2.  Charlie publishes a commentary on G1 signaling agreement or
disagreement.   Something like this, where I'm skipping the details of
G1 for now:

        Contents of http://charlie.example.org/page1
        
                @prefix rr: <http://reviews.example.org#>.
                G1 rr:cameFrom <http://errol.example.org/page1>.
                G1 rr:agreement -0.95.

3.  Errol changes the triples at U1 to be G2.

        Contents of http://errol.example.org/page1
        
                @prefix rr: <http://reviews.example.org#>.
                rr:mels rr:quality rr:great.
                 
4.  Alice comes along and sees Charlie's comment, that Charlie strongly
disagrees with something Errol published.   How can she tell it was
about G1 and not about G2?
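
The four steps can be sketched as a small simulation. This is only an illustration: the `web` dictionary standing in for HTTP, and the names `charlie_comment` and `comment_matches_current_content`, are assumptions made for this sketch and do not come from any real library.

```python
# Hypothetical in-memory "web": URL -> Turtle text currently served there.
web = {}

G1 = "@prefix rr: <http://reviews.example.org#>.\nrr:mals rr:quality rr:great.\n"
G2 = "@prefix rr: <http://reviews.example.org#>.\nrr:mels rr:quality rr:great.\n"
U1 = "http://errol.example.org/page1"

# Step 1: Errol publishes G1 at U1.
web[U1] = G1

# Step 2: Charlie comments, identifying his target only by its URL.
charlie_comment = {"cameFrom": U1, "agreement": -0.95}

# Step 3: Errol replaces G1 with G2 at the same URL.
web[U1] = G2

# Step 4: Alice dereferences the URL named in Charlie's comment.
what_alice_sees = web[charlie_comment["cameFrom"]]


def comment_matches_current_content(comment, commented_text):
    """True only if the URL still serves the text the comment was about."""
    return web[comment["cameFrom"]] == commented_text
```

Alice gets G2 back, and nothing in Charlie's comment tells her that the disagreement was aimed at G1.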

====

Is that clear enough?  

I see three kinds of answers:

SOLUTION A: Charlie Publishes a Copy of G1

        Charlie re-publishes G1 at http://charlie.example.org/page2 and
        in his page1 graph he just uses
        <http://charlie.example.org/page2> where I wrote G1.   Like
        this:


        @prefix rr: <http://reviews.example.org#>.
        <http://charlie.example.org/page2>
                rr:cameFrom <http://errol.example.org/page1>;
                rr:agreement -0.95.

        I've implemented this kind of thing, but it always makes me a
        bit nervous, because Charlie could change page2.   But I haven't
        come up with a case where that's a real threat that can't be
        dealt with by everyone always making copies like this.

        This seems quite inefficient, but it might not be as bad as the
        alternatives.
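
A minimal sketch of SOLUTION A, assuming the same kind of toy in-memory web. The `web` dictionary and the `republish_copy` helper are hypothetical names introduced here for illustration only.

```python
# Hypothetical in-memory "web": URL -> Turtle text currently served there.
web = {
    "http://errol.example.org/page1":
        "@prefix rr: <http://reviews.example.org#>.\nrr:mals rr:quality rr:great.\n",
}


def republish_copy(web, source_url, copy_url):
    """Copy what source_url serves right now to copy_url (Charlie's page2)."""
    web[copy_url] = web[source_url]
    # Charlie's statements are about the copy, not about Errol's live page.
    return {"subject": copy_url, "cameFrom": source_url, "agreement": -0.95}


comment = republish_copy(
    web, "http://errol.example.org/page1", "http://charlie.example.org/page2"
)

# Errol later corrects his page; Charlie's copy is unaffected.
web["http://errol.example.org/page1"] = (
    "@prefix rr: <http://reviews.example.org#>.\nrr:mels rr:quality rr:great.\n"
)
```

The copy keeps serving G1 only for as long as Charlie leaves page2 alone, which is the worry described above.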
        
SOLUTION B: Charlie Embeds a Copy of G1

        Charlie uses a special syntax like N-Quads or TriG to embed G1
        in page1.  Something like this:

        @prefix rr: <http://reviews.example.org#>.
        _:G1 rr:cameFrom <http://errol.example.org/page1>.
        _:G1 rr:agreement -0.95.
        _:G1 { rr:mals rr:quality rr:great. }.

        or maybe with some kind of a g-text like this:

        @prefix rr: <http://reviews.example.org#>.
        _:G1 rr:cameFrom <http://errol.example.org/page1>.
        _:G1 rr:agreement -0.95.
        _:G1 rr:turtleText "@prefix rr: <http://reviews.example.org#>. rr:mals rr:quality rr:great.".

        
        (If we use a URL instead of a bnode for G1, then we're in the
        space of both SOLUTION-A and SOLUTION-B, since perhaps Alice
        could dereference it -- what if she didn't get the same thing as
        was embedded?) 
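
A rough sketch of how the g-text variant might be generated, assuming the graph is available as Turtle text. The helpers `turtle_string_literal` and `embed_as_gtext` are hypothetical. `rr:turtleText` is the placeholder property from the example above, not a standardized term.

```python
def turtle_string_literal(text):
    """Quote text as a short Turtle string literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


def embed_as_gtext(graph_text, source_url, agreement):
    """Build Charlie's page with the commented graph carried as a string."""
    literal = turtle_string_literal(graph_text)
    return (
        "@prefix rr: <http://reviews.example.org#>.\n"
        f"_:G1 rr:cameFrom <{source_url}>.\n"
        f"_:G1 rr:agreement {agreement}.\n"
        f"_:G1 rr:turtleText {literal}.\n"
    )


G1 = "@prefix rr: <http://reviews.example.org#>.\nrr:mals rr:quality rr:great.\n"
page = embed_as_gtext(G1, "http://errol.example.org/page1", -0.95)
```

Because the embedded text travels with the comment, Alice can compare it with whatever the source URL serves when she looks.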
        
SOLUTION C: Charlie Characterizes G1

        Maybe there's a way to know about Errol changing the graph
        without transmitting the graph.  For instance, Charlie might
        include a hash of the contents:

        @prefix rr: <http://reviews.example.org#>.
        _:G1 rr:cameFrom <http://errol.example.org/page1>.
        _:G1 rr:agreement -0.95.
        _:G1 rr:hashWhenFetched
                "13ae3ec8f7c3b8f814ab8f1da9510ebdc0f8c740f1763f825429e9e8c3c21878".

        (That really is the sha256 of G1, as formatted in
        SOLUTION-B.  I hope.)

        A problem with this is that the bytes might change a lot more
        than the triples do.   And Alice might really want to know what
        those triples were.   Perhaps a timestamp could help, but then
        we might be trusting Errol too much.
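
The bytes-versus-triples problem can be illustrated with a short sketch. It makes no attempt to reproduce the hash value quoted above. `naive_triples` is a hypothetical helper that assumes one prefixed-name triple per line and is not a real Turtle parser or an RDF canonicalization algorithm.

```python
import hashlib


def byte_hash(text):
    """SHA-256 over the exact bytes of the serialization."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def naive_triples(text):
    """Rough triple extraction: one triple per line, @prefix lines skipped."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("@prefix"):
            continue
        parts = line.rstrip(".").split()
        if len(parts) == 3:
            triples.add(tuple(parts))
    return triples


def triple_hash(text):
    """SHA-256 over a sorted, whitespace-normalized list of the triples."""
    canonical = "\n".join(" ".join(t) for t in sorted(naive_triples(text)))
    return byte_hash(canonical)


G1 = "@prefix rr: <http://reviews.example.org#>.\nrr:mals rr:quality rr:great.\n"
G1_reformatted = (
    "@prefix rr: <http://reviews.example.org#>.\n\n"
    "  rr:mals   rr:quality   rr:great .\n"
)
G2 = "@prefix rr: <http://reviews.example.org#>.\nrr:mels rr:quality rr:great.\n"
```

Here `byte_hash` differs between `G1` and `G1_reformatted` even though the triples are the same, while `triple_hash` changes only when the triples do. Either way, a hash tells Alice that something changed but not what the original triples were.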


It's not clear to me yet which parts of this are our domain to
standardize.    Certainly "{...}" or "turtleText" are.  Maybe "cameFrom"
and "hashWhenFetched".   Probably not "agreement", at least not in this
fuzzy form.

      -- Sandro
Received on Friday, 30 September 2011 04:52:28 UTC