- From: Luc Moreau <l.moreau@ecs.soton.ac.uk>
- Date: Fri, 30 Sep 2011 13:29:37 +0100
- To: public-rdf-prov@w3.org
Hi Sandro,

This discussion has become quite technical, and I am not sure I understand all the implications.

I am surprised by your suggestion not to name g-snaps. What I liked about the distinction between g-snap/g-box is that it allowed me to talk about the content and the container. From a provenance perspective, we may want to say different things about them. The content was generated by an rdb2rdf converter launched by Luc, whereas the container is this rdf file, which Sandro stored on his local disk.

From a provenance viewpoint, we require the thing we talk about in the provenance to be identifiable. With URI-less g-snaps, this is going to become more challenging.

I may also have misunderstood the g-concepts. What do you think?

Cheers,
Luc

On 30/09/2011 13:10, Sandro Hawke wrote:
> <http://lists.w3.org/Archives/Public/public-rdf-prov/2011Sep/0022>
> rr:agreement 0.99.
>
> Below I expand on two points:
>
> - yes, I agree, let's only give URIs to g-boxes (not g-snaps)
> - how do we practically support static g-boxes?
>
> On Fri, 2011-09-30 at 11:11 +0200, Richard Cyganiak wrote:
>
>> Sandro,
>>
>> On 30 Sep 2011, at 06:52, Sandro Hawke wrote:
>>
>>> SOLUTION A: Charlie Publishes a Copy of G1
>>>
>> That's a perfectly workable solution AFAICT.
>>
>>
>>> I've implemented this kind of thing, but it always makes me a
>>> bit nervous, because Charlie could change page2.
>>>
>> In SOLUTION B, Charlie could change the embedded graph literal in just the same way. This is a shared limitation between SOLUTION A and SOLUTION B. It's a problem only if Alice doesn't trust Charlie. In that case we'd need a trusted third party that takes snapshots of things on the web (like the Wayback Machine, but perhaps with snapshotting on demand). On the named graphs level, that solution looks exactly the same except that it puts page2 on a different domain.
>>
>>
>>> This seems quite inefficient, but it might not be as bad as the
>>> alternatives.
>>>
>> This inefficiency is again shared with SOLUTION B. SOLUTION B requires making a copy of G1 too. SOLUTION A requires an extra HTTP request, SOLUTION B bloats Charlie's graph. Their relative efficiency depends on the size of G1. SOLUTION A is more efficient than SOLUTION B if G1 is large.
>>
>> The inefficiency in SOLUTION A can be avoided if Charlie publishes a timestamp and/or hash for G1, as you describe in SOLUTION C.
>>
>>
>>> SOLUTION-C: Charlie Characterizes G1
>>>
>>> Maybe there's a way to know about Errol changing the graph
>>> without transmitting the graph. For instance, Charlie might
>>> include a hash of the contents:
>>>
>> I'd say that hashes, timestamps and so on are clearly out of scope for RDF-WG.
>>
>>
>>> It's not clear to me yet which parts of this are our domain to
>>> standardize. Certainly "{...}" or "turtleText" are.
>>>
>> Those would be in scope for RDF-WG.
>>
> So, I think the way we're talking here, making SOLN-A and SOLN-B be very
> close parallels, differing only in whether the contents are in-line or
> out-of-band, ... I think that means the Graph identifier part of the
> formats supporting nice in-line syntaxes (eg TriG) is really identifying
> a g-box. So the other triples, referring to Errol's statement, don't
> have to change when one switches between SOLN-A and SOLN-B.
>
> Under normal operation, it would be equivalent to say:
>
> 1. In Turtle:
>
>      <http://example.org/foo> <p> <o>.
>
>    while at http://example.org/foo is the Turtle:
>
>      <a> <b> <c>.
>
> or
>
> 2. In something like TriG:
>
>      <http://example.org/foo> <p> <o>.
>      <http://example.org/foo> {<a> <b> <c>. }
>
>
> I'm thinking that's a simple and workable approach. I'm not sure if
> that's what you were proposing or not.
>
> This means Charlie can't exactly say, "I agree with Errol's RDF graph
> (g-snap) G1" because he can't make an identifier for the g-snap G1;
> instead he says "I agree with Errol's RDF graph which I have copied to
> this g-box, <foo>".
>
> I think I like that design -- never having g-snaps identified directly
> -- so people have less to get confused about. It's like a programming
> language that always passes by value, never by reference, so there's
> less confusion. We can't get rid of g-boxes -- those are files with
> RDF in them -- so let's get rid of (direct) g-snaps.
>
> I guess it's also like how people don't generally make up names for
> numbers. They either serialize the number, or give a name to a slot
> that holds a number that might be edited (eg "the world population in
> 2000"). I wonder if there is a parallel to Pi or e -- a few particular
> RDF graphs to which it would be good to give standard identifiers.
>
> So, I guess I'm with you on not having a mechanism for directly
> attaching URIs to g-snaps. People can attach them to g-boxes, and if
> they are confident it won't change, they can just think of it as a
> g-snap. Hmmmm.
>
>
>>> Maybe "cameFrom"
>>> and "hashWhenFetched". Probably not "agreement", at least not in this
>>> fuzzy form.
>>>
>> None of those are in scope for RDF-WG. They are in scope for PROV-WG.
>>
> Sounds reasonable to me.
>
>
>> (Another point regarding your use case: Errol shouldn't have fixed his mistake in place, but deleted the old assertions and published a corrected account under a new address. The latter should be considered best practice in situations of this kind. We can't really expect Charlie to do extra work to ensure that Errol can fix the mistake in place – the incentives are not right. His motive is probably only to prevent Alice from making a poor decision based on Errol's disinformation, not to protect Errol's reputation.)
>>
> Excellent point. But surely there are RDF documents on the web that
> are going to be changing in place, like people's foaf files.... How
> would you allow that?
>
> One approach is like W3C TRs -- there's a "latest URI", where the
> contents change, and a new "snapshot URI" every time the contents
> change.
> (And old snapshots can be deleted to save space whenever you
> want.) I think this is a good practice, but can we really ask everyone
> with a foaf file to follow it...? Maybe.... Yeah.....
>
> I've never implemented it, but I've often thought about making snapshot
> URIs include a secure hash of the contents. So Errol would publish his
> first statement at:
>
> http://errol.example.org/check-sha/13ae3ec8f7c3b8f814ab8f1da9510ebdc0f8c740f1763f825429e9e8c3c21878
>
> and Charlie would copy it over to
>
> http://charlie.example.org/check-sha/13ae3ec8f7c3b8f814ab8f1da9510ebdc0f8c740f1763f825429e9e8c3c21878
>
> Here I'm suggesting "check-sha" would signal to receivers that they
> SHOULD confirm the contents. That means they wouldn't have to trust
> Errol or Charlie not to maliciously or accidentally change things. It
> would essentially force people to follow the practice of making a new
> URI every time they want to change the contents.
>
> This would not allow content negotiation on snapshots, although it could
> still be used on the "latest version" so maybe that's okay. Con-neg on
> the latest version could pass along the snapshot URI for that
> content-type.
>
> It's also doing unauthorized URI inspection; I suppose we could fix that
> by making it be .well-known/check-sha. I bet we'd get into an
> interesting conversation with some IETF folks over that. :-) There
> may be a way to integrate this with Memento; I don't remember how it
> works, exactly.
>
> /me goes back and rereads http://www.w3.org/2003/08/introhash/v2 which
> is a little dated but still cool. :-) Something like that might be
> good for folks who want a secure latest-version URI, but it's probably
> too complicated for the current deployment environment.
>
> -- Sandro
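
The "check-sha" idea in the quoted message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something specified in the thread: the message names no hash algorithm (SHA-256 is assumed here because the example digests are 64 hex characters), it does not say which bytes are hashed (the raw serialized document is assumed, with no RDF canonicalization), and the function names are hypothetical.

```python
import hashlib
from typing import Optional
from urllib.parse import urlparse

# Path segment proposed in the quoted message; everything else is assumed.
MARKER = "check-sha"


def digest_from_uri(uri: str) -> Optional[str]:
    """Return the hex digest that follows a 'check-sha' path segment, if any."""
    segments = [s for s in urlparse(uri).path.split("/") if s]
    for i, segment in enumerate(segments[:-1]):
        if segment == MARKER:
            return segments[i + 1].lower()
    return None


def make_snapshot_uri(base: str, content: bytes) -> str:
    """Build a snapshot URI whose last segment is the SHA-256 of the content."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{base.rstrip('/')}/{MARKER}/{digest}"


def verify_snapshot(uri: str, content: bytes) -> bool:
    """Check that fetched content matches the digest embedded in its URI."""
    expected = digest_from_uri(uri)
    if expected is None:
        raise ValueError("URI carries no check-sha digest")
    return hashlib.sha256(content).hexdigest() == expected


if __name__ == "__main__":
    graph = b"<a> <b> <c> .\n"  # assumed: the bytes served at the snapshot URI
    errol = make_snapshot_uri("http://errol.example.org", graph)
    charlie = make_snapshot_uri("http://charlie.example.org", graph)
    print(verify_snapshot(errol, graph))                # content unchanged
    print(verify_snapshot(charlie, b"<a> <b> <d> .\n"))  # content edited in place
```

Because the digest depends only on the content, Errol's and Charlie's copies end in the same path segment, and a receiver can detect an in-place change at either host without trusting it. Hashing raw bytes means two different serializations of the same graph get different URIs, which fits the remark that snapshots would not support content negotiation.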
Received on Friday, 30 September 2011 12:30:07 UTC