- From: Sandro Hawke <sandro@w3.org>
- Date: Mon, 24 Mar 2014 23:17:32 +0300
- To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>, public-ldp-wg@w3.org
On 03/24/2014 07:44 PM, Ted Thibodeau Jr wrote: > Sandro wrote: >> PROPOSED: Change LDP paging to no longer be characterized as >> "lossy". Specifically, a client who pages all the way through >> (forward or backward) MUST be sent every triple in intersect >> (g0, g1, ... gn) and MAY be sent any triples in the union (g0, >> g1, .. gn), where Gi is the paged graph at the time client >> makes request i. Servers MAY refuse to serve any page because >> they don't want to implement this kind of paging in certain >> cases; if they do so, they MUST return a 410 GONE. > > I think Sandro's suggestion remains “lossy", in that insertions > *may* be made visible, but any deletions are to be concealed. > This bothers me. My intention, which I'm fairly confident is reflected in the proposed formulation, is that both insertions and deletions MAY be visible to the client. If a triple has been deleted before Gi, then it will be absent from the intersection, and therefore the server is not mandated to send it to the client. But it's in the union, so the server MAY send it to the client. So deletions, like insertions, MAY be seen by the client. > > As I said in the con-call, I think the current paging thoughts > are a conflation of two concepts from the (R)DBMS world - > “(scrollable) cursors” and “(transaction) isolation”. I prefer to think of it as an elegant combination of two concepts. :-) [mostly joking, read on....] > I strongly recommend some investigation of these concepts. > These wikipedia pages are imperfect but provide a good, if > basic, starting point. > > <https://en.wikipedia.org/wiki/Isolation_%28database_systems%29> > > <https://en.wikipedia.org/wiki/Cursor_%28databases%29> > > I recommend particular attention to the “Isolation Levels vs > Read Phenomena” table on the Isolation page. > > There are tradeoffs to each and every option — and whether any > given tradeoff is more or less important depends on the use case — > and this is why the DBMS world mostly sees every engine implement > all cursors and isolation levels, and mostly allows clients to > specify which ones they want/need to use. > > I would suggest that we consider writing this version of the > LDP spec to only require implementation of the lowest-level, > least-effort options in each. > > For the cursor, this is the “FORWARD ONLY” a/k/a “non-scrollable” > cursor. The scroll only goes from one end to the other, without > any reversing midway. When working with an unORDERed result set, > you can only go in the arbitrary “forward” direction. When > working with an ORDERed result set, it is really “unidirectionally > scrollable” because it can be walked in either direction by simply > inverting the ORDER BY. That's exactly what I'm proposing. The proposal only mandates what happens if the client does a complete traversal. For any other use of first/last/prev/next pointers, the spec would not define what the client gets. I didn't spell out "pages all the way though" in this brief proposal, but I mean the obvious thing: (1) follow the first [last] link from the paged resource to get the first sub-page; repeatedly follow the next [prev] link from that sub-page to the adjacent sub-page; stop when there is no next [prev] link, as that means you've reached the end. If the server is only allowing forward traversal, then there is no rel=last link, so reverse traversal is impossible. > For isolation, this is “READ UNCOMMITTED”, which basically means > that the client can’t rely on anything being the same *now* as it > was *then*, no matter how short the window between. The client > gets a snapshot-in-time for each request, but has no way to tell > the server “act on *that* snapshot-in-time” nor how to act based > on *other* client requests received in the interim (or received > after mine, but based on the same state as my request was...) SPARQL and LDP don't use the terminology of "transactions" and don't have transactions as they are normally understood. To me, given my years of database programming in the 80s and 90s, a transaction is relatively long lived, often consisting of many operations, or one slow operations. In particular, a transaction includes read operations. What we have with HTTP and LDP is simple, short "modifications", which do not contain any reads. Since modifications do not contain reads, the distinctions between, eg, "repeatable reads" and "read committed" are not observable,... at least as far a I can tell. But they are Isolated. If we didn't have Isolation in LDP, you could be doing a PUT while someone else does a GET and they'd see only a few of the triples you PUT, but still get a 200 OK. I think that's forbidden by the HTTP spec, and therefore not allowed in LDP. I don't think paging can or should change that. Any POST, PATCH, or PUT should appear to clients to be instantaneous, whatever GETS are happening at approximately the same time, with or without paging. (This is what Isolation means.) > I think this leaves a clear path for future evolution (adding > the other cursors and/or isolation levels), while making the > current spec easier to write and early implementations easier > to produce. I don't see a spectrum of useful options here. I just see (1) lossy paging (where the client is only guaranteed to see some subset of intersection(G0...Gn), (2) lossless paging (my proposal), and (3) static paging (where after a complete traversal, the client has received G0). Static paging isn't in scope right now, but it could be easily handled by having resource that offer it include Link rel=snapshot. Then you do normal lossless paging through the snapshot resource, knowing that G0...Gn for the snapshot are all the same as G0 of the original resource. > I hope this is more helpful than painful. I agree it's important to be clear about all this. I'm also aware that I have only a programmer's expertise in these matters, not any formal theory background. (Ironically, I had lunch with Michael Stonebraker yesterday, but all we did was argue about whether RDF was useful for real world data integration. He doesn't like the occasional claims that RDF is a panacea for data integration, but I'm not making those claims.) Unfortunately, it's hard to explain what we're doing in LDP to a database-theory person. Maybe one of us has such a person handy who wants to write a paper analyzing LDP and proving various properties? There are many in my building, but I don't know that I could convince any of them to do this. -- Sandro > Ted > > > > > -- > A: Yes. http://www.guckes.net/faq/attribution.html > | Q: Are you sure? > | | A: Because it reverses the logical flow of conversation. > | | | Q: Why is top posting frowned upon? > > Ted Thibodeau, Jr. // voice +1-781-273-0900 x32 > Senior Support & Evangelism // mailto:tthibodeau@openlinksw.com > // http://twitter.com/TallTed > OpenLink Software, Inc. // http://www.openlinksw.com/ > 10 Burlington Mall Road, Suite 265, Burlington MA 01803 > Weblog -- http://www.openlinksw.com/blogs/ > LinkedIn -- http://www.linkedin.com/company/openlink-software/ > Twitter -- http://twitter.com/OpenLink > Google+ -- http://plus.google.com/100570109519069333827/ > Facebook -- http://www.facebook.com/OpenLinkSoftware > Universal Data Access, Integration, and Management Technology Providers > > > > > > > > >
Received on Monday, 24 March 2014 20:17:37 UTC