- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 30 Sep 2011 00:52:18 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: public-rdf-prov@w3.org
On Fri, 2011-09-23 at 20:19 +0100, Richard Cyganiak wrote:
> On 22 Sep 2011, at 20:02, Sandro Hawke wrote:
> >> Assuming the data is retrieved from the web (that is, all data is
> >> received as a representation of some resource that has a URL), then I
> >> believe that all these issues can be solved using three ingredients:
> >>
> >> 1. a graph store for named graphs,
> >> 2. vocabulary for expressing authorship, trust/reputation and
> >> source/mirror relationships,
> >> 3. an incentive for parties on the web to publish trust/reputation
> >> information.
> >
> > I didn't call this out in my examples, but how do you handle the cases where data changes? How can I say that Errol got the name wrong, in a way which won't make me wrong if he corrects himself?
>
> Can you expand a bit? Your use case mentioned the risk that Alice might be misled into visiting the wrong restaurant because Errol misfiled his review under the wrong restaurant. You seem to have something more than that in mind. Someone comes in saying that Errol's review is misfiled? Then Errol fixes his review? What exactly is the full story?
Okay:
1. Errol publishes some triples G1 at address U1
Contents of http://errol.example.org/page1
@prefix rr: <http://reviews.example.org#>.
rr:mals rr:quality rr:great.
(I'm just putting everything in rr: for simplicity for now.)
2. Charlie publishes a commentary on G1 signaling agreement or
disagreement. Something like this, where I'm skipping the details of
G1 for now:
Contents of http://charlie.example.org/page1
@prefix rr: <http://reviews.example.org#>.
G1 rr:cameFrom <http://errol.example.org/page1>.
G1 rr:agreement -0.95.
3. Errol changes the triples at U1 to be G2.
Contents of http://errol.example.org/page1
@prefix rr: <http://reviews.example.org#>.
rr:mels rr:quality rr:great.
4. Alice comes along and sees Charlie's comment, that Charlie strongly
disagrees with something Errol published. How can she tell the comment
was about G1 and not about G2?
====
Is that clear enough?
I see three kinds of answers:
SOLUTION A: Charlie Publishes a Copy of G1
Charlie re-publishes G1 at http://charlie.example.org/page2 and
in his page1 graph he just uses
<http://charlie.example.org/page2> where I wrote G1. Like
this:
@prefix rr: <http://reviews.example.org#>.
<http://charlie.example.org/page2>
rr:cameFrom <http://errol.example.org/page1>;
rr:agreement -0.95.
I've implemented this kind of thing, but it always makes me a
bit nervous, because Charlie could change page2. But I haven't
come up with a case where that's a real threat that can't be
dealt with by everyone always making copies like this.
This seems quite inefficient, but it might not be as bad as the
alternatives.
SOLUTION B: Charlie Embeds a Copy of G1
Charlie uses a special syntax like N-Quads or TriG to embed G1
in page1. Something like this:
@prefix rr: <http://reviews.example.org#>.
_:G1 rr:cameFrom <http://errol.example.org/page1>.
_:G1 rr:agreement -0.95.
_:G1 { rr:mals rr:quality rr:great. }.
or maybe with some kind of a g-text like this:
@prefix rr: <http://reviews.example.org#>.
_:G1 rr:cameFrom <http://errol.example.org/page1>.
_:G1 rr:agreement -0.95.
_:G1 rr:turtleText "@prefix rr: <http://reviews.example.org#>. rr:mals rr:quality rr:great.".
(If we use a URL instead of a bnode for G1, then we're in the
space of both SOLUTION-A and SOLUTION-B, since perhaps Alice
could dereference it -- what if she didn't get the same thing as
was embedded?)
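To make the embedded-copy check concrete, here is a minimal Python sketch of what Alice could do under SOLUTION B. It assumes both graphs have already been parsed into sets of (subject, predicate, object) tuples; the function name and the hard-coded graphs are hypothetical, and no fetching or parsing is shown.

```python
# Triples are plain (subject, predicate, object) tuples of IRI strings.
RR = "http://reviews.example.org#"

# G1, as embedded in Charlie's page1 (assumed already parsed).
embedded_g1 = frozenset({(RR + "mals", RR + "quality", RR + "great")})

# What Alice gets when she dereferences Errol's page1 now (G2).
current_at_u1 = frozenset({(RR + "mels", RR + "quality", RR + "great")})

def comment_still_applies(embedded, current):
    """True if the graph Charlie commented on is what the source serves now.

    Plain set equality is only adequate for graphs without blank nodes.
    """
    return embedded == current

print(comment_still_applies(embedded_g1, current_at_u1))  # False
```

Because Charlie's statement travels with the triples it is about, Alice can see that the disagreement concerned rr:mals even though U1 now talks about rr:mels.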
SOLUTION C: Charlie Characterizes G1
Maybe there's a way to know about Errol changing the graph
without transmitting the graph. For instance, Charlie might
include a hash of the contents:
@prefix rr: <http://reviews.example.org#>.
_:G1 rr:cameFrom <http://errol.example.org/page1>.
_:G1 rr:agreement -0.95.
_:G1 rr:hashWhenFetched
"13ae3ec8f7c3b8f814ab8f1da9510ebdc0f8c740f1763f825429e9e8c3c21878".
(That really is the sha256 of G1, as formatted in
SOLUTION-B. I hope.)
A problem with this is that the bytes might change a lot more
than the triples do. And Alice might really want to know what
those triples were. Perhaps a timestamp could help, but then
we might be trusting Errol too much.
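One way to soften the bytes-versus-triples problem is to hash a normalized form of the graph rather than the fetched bytes. The following Python sketch is only an illustration under strong assumptions: the graph is ground (no blank nodes), every term is an IRI, and the one-triple-per-line serialization and the name graph_hash are made up for this example rather than taken from any standard.

```python
import hashlib

def graph_hash(triples):
    """SHA-256 over a sorted, one-triple-per-line serialization.

    Assumes a ground graph whose terms are all IRIs, so sorting the
    lines is enough to get the same bytes for the same set of triples.
    """
    lines = sorted("<%s> <%s> <%s> ." % t for t in set(triples))
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

RR = "http://reviews.example.org#"
g1 = [(RR + "mals", RR + "quality", RR + "great")]
g2 = [(RR + "mels", RR + "quality", RR + "great")]

# Alice recomputes the hash over what she fetches and compares it
# with the value Charlie recorded when he fetched G1.
recorded = graph_hash(g1)
print(graph_hash(g2) == recorded)  # False: U1 no longer serves G1
```

With this approach, reordering triples or changing whitespace and prefixes in the published file does not change the hash, but Alice still cannot recover what G1 said from the hash alone.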
It's not clear to me yet which parts of this are our domain to
standardize. Certainly "{...}" or "turtleText" are. Maybe "cameFrom"
and "hashWhenFetched". Probably not "agreement", at least not in this
fuzzy form.
-- Sandro
Received on Friday, 30 September 2011 04:52:28 UTC