- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 30 Sep 2011 08:10:34 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: public-rdf-prov@w3.org
<http://lists.w3.org/Archives/Public/public-rdf-prov/2011Sep/0022> rr:agreement 0.99.

Below I expand on two points:

 - yes, I agree, let's only give URIs to g-boxes (not g-snaps)
 - how do we practically support static g-boxes?

On Fri, 2011-09-30 at 11:11 +0200, Richard Cyganiak wrote:
> Sandro,
>
> On 30 Sep 2011, at 06:52, Sandro Hawke wrote:
> > SOLUTION A: Charlie Publishes a Copy of G1
>
> That's a perfectly workable solution AFAICT.
>
> > I've implemented this kind of thing, but it always makes me a
> > bit nervous, because Charlie could change page2.
>
> In SOLUTION B, Charlie could change the embedded graph literal in just the same way. This is a shared limitation between SOLUTION A and SOLUTION B. It's a problem only if Alice doesn't trust Charlie. In that case we'd need a trusted third party that takes snapshots of things on the web (like the Wayback Machine, but perhaps with snapshotting on demand). On the named graphs level, that solution looks exactly the same except that it puts page2 on a different domain.
>
> > This seems quite inefficient, but it might not be as bad as the
> > alternatives.
>
> This inefficiency is again shared with SOLUTION B. SOLUTION B requires making a copy of G1 too. SOLUTION A requires an extra HTTP request, SOLUTION B bloats Charlie's graph. Their relative efficiency depends on the size of G1. SOLUTION A is more efficient than SOLUTION B if G1 is large.
>
> The inefficiency in SOLUTION A can be avoided if Charlie publishes a timestamp and/or hash for G1, as you describe in SOLUTION C.
>
> > SOLUTION-C: Charlie Characterizes G1
> >
> > Maybe there's a way to know about Errol changing the graph
> > without transmitting the graph. For instance, Charlie might
> > include a hash of the contents:
>
> I'd say that hashes, timestamps and so on are clearly out of scope for RDF-WG.
>
> > It's not clear to me yet which parts of this are our domain to
> > standardize. Certainly "{...}" or "turtleText" are.
>
> Those would be in scope for RDF-WG.

So, I think the way we're talking here, making SOLN-A and SOLN-B be very close parallels, differing only in whether the contents are in-line or out-of-band, ... I think that means the Graph identifier part of the formats supporting nice in-line syntaxes (eg TriG) is really identifying a g-box. So the other triples, referring to Errol's statement, don't have to change when one switches between SOLN-A and SOLN-B.

Under normal operation, it would be equivalent to say:

  1. In Turtle:

        <http://example.org/foo> <p> <o>.

     while at http://example.org/foo is the Turtle:

        <a> <b> <c>.

or

  2. In something like TriG:

        <http://example.org/foo> <p> <o>.
        <http://example.org/foo> { <a> <b> <c>. }

I'm thinking that's a simple and workable approach. I'm not sure if that's what you were proposing or not.

This means Charlie can't exactly say, "I agree with Errol's RDF graph (g-snap) G1" because he can't make an identifier for the g-snap G1; instead he says "I agree with Errol's RDF graph which I have copied to this g-box, <foo>".

I think I like that design -- never having g-snaps identified directly -- so people have less to get confused about. It's like a programming language that always passes by value, never by reference, so there's less confusion. We can't get rid of g-boxes -- those are files with RDF in them -- so let's get rid of (direct) g-snaps.

I guess it's also like how people don't generally make up names for numbers. They either serialize the number, or give a name to a slot that holds a number that might be edited (eg "the world population in 2000"). I wonder if there is a parallel to Pi or e -- a few particular RDF graphs to which it would be good to give standard identifiers.

So, I guess I'm with you on not having a mechanism for directly attaching URIs to g-snaps. People can attach them to g-boxes, and if they are confident it won't change, they can just think of it as a g-snap. Hmmmm.

> > Maybe "cameFrom"
> > and "hashWhenFetched".
> > Probably not "agreement", at least not in this
> > fuzzy form.
>
> None of those are in scope for RDF-WG. They are in scope for PROV-WG.

Sounds reasonable to me.

> (Another point regarding your use case: Errol shouldn't have fixed his mistake in place, but deleted the old assertions and published a corrected account under a new address. The latter should be considered best practice in situations of this kind. We can't really expect Charlie to do extra work to ensure that Errol can fix the mistake in place – the incentives are not right. His motive is probably only to prevent Alice from making a poor decision based on Errol's disinformation, not to protect Errol's reputation.)

Excellent point. But surely there are RDF documents on the web that are going to be changing in place, like people's foaf files.... How would you allow that?

One approach is like W3C TRs -- there's a "latest URI", where the contents change, and a new "snapshot URI" every time the contents change. (And old snapshots can be deleted to save space whenever you want.) I think this is a good practice, but can we really ask everyone with a foaf file to follow it...? Maybe.... Yeah.....

I've never implemented it, but I've often thought about making snapshot URIs include a secure hash of the contents. So Errol would publish his first statement at:

   http://errol.example.org/check-sha/13ae3ec8f7c3b8f814ab8f1da9510ebdc0f8c740f1763f825429e9e8c3c21878

and Charlie would copy it over to

   http://charlie.example.org/check-sha/13ae3ec8f7c3b8f814ab8f1da9510ebdc0f8c740f1763f825429e9e8c3c21878

Here I'm suggesting "check-sha" would signal to receivers that they SHOULD confirm the contents. That means they wouldn't have to trust Errol or Charlie not to maliciously or accidentally change things. It would essentially force people to follow the practice of making a new URI every time they want to change the contents.
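
A minimal sketch of the receiver-side check, to make the "check-sha" idea concrete. It rests on assumptions the message does not state: that the digest is the path segment right after "check-sha", that it is SHA-256 (the 64 hex digits in the example URIs are consistent with that), and that it covers the exact bytes served. The function names are hypothetical.

```python
import hashlib
from urllib.parse import urlparse

MARKER = "check-sha"  # path segment proposed in the message


def snapshot_uri(base, body):
    """Mint a snapshot URI for body (bytes) under base.

    Assumption: the digest is SHA-256 over the raw bytes.
    """
    digest = hashlib.sha256(body).hexdigest()
    return f"{base.rstrip('/')}/{MARKER}/{digest}"


def expected_digest(uri):
    """Return the hex digest embedded in a check-sha URI, or None."""
    segments = [s for s in urlparse(uri).path.split("/") if s]
    # Assumption: the digest is the segment right after the marker.
    if MARKER in segments:
        i = segments.index(MARKER)
        if i + 1 < len(segments):
            return segments[i + 1].lower()
    return None


def contents_match(uri, body):
    """True if body (bytes fetched from uri) hashes to the digest in uri."""
    digest = expected_digest(uri)
    if digest is None:
        raise ValueError("not a check-sha URI: " + uri)
    return hashlib.sha256(body).hexdigest() == digest
```

Fetching is left to the caller, so the check works the same whether the bytes came from Errol's server or from Charlie's copy: the digest travels with the URI, and only the host part differs.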
This would not allow content negotiation on snapshots, although it could still be used on the "latest version" so maybe that's okay. Con-neg on the latest version could pass along the snapshot URI for that content-type. It's also doing unauthorized URI inspection; I suppose we could fix that by making it be .well-known/check-sha. I bet we'd get into an interesting conversation with some IETF folks over that. :-)

There may be a way to integrate this with Memento; I don't remember how it works, exactly.

/me goes back and rereads http://www.w3.org/2003/08/introhash/v2 which is a little dated but still cool. :-) Something like that might be good for folks who want a secure latest-version URI, but it's probably too complicated for the current deployment environment.

    -- Sandro
Received on Friday, 30 September 2011 12:10:42 UTC