- From: Bob MacGregor <bmacgregor@siderean.com>
- Date: Mon, 28 May 2007 13:39:35 -0700
- To: Richard Newman <rnewman@franz.com>
- Cc: public-rdf-dawg-comments@w3.org, Eric Prud'hommeaux <eric@w3.org>
- Message-Id: <2012C221-748F-4F43-BA56-7D08F60B8DF1@siderean.com>
Hi Richard,

Point 1: I admit to a small mistake. What I liked about the erstwhile SOURCE construct was that it allowed the fourth (context) argument to be a variable. The supposition was that the value of that variable would indicate the "source" of the matching statements, but there was no machinery that would have prevented us from attaching arbitrary provenance to the resources bound to that variable. So our intent was to use SOURCE as a springboard for full four-valued statements. My mistake was in not mentioning that we were championing the syntax, not the rather restricted usage that was thought to be associated with it.

Point 2: Any system that assumes that contexts have to have names is ultimately non-scalable. We have empirical data on this (which I have mentioned in prior conversations with Jans Aasman of Franz), but I'm not going to dwell on it here. The basic principle is that the semantics lies with the provenance attached to the contexts, not in the names of the contexts. That means that (1) the quad store must admit blank nodes as contexts as well as URIs (I assume that AllegroGraph is fine with that), and (2) the query language must allow contexts to be variables (which is where I believe SPARQL falls on its face).

Point 3: All examples I've seen in SPARQL show finite enumerations of named graphs in FROM NAMED clauses. Some of our applications work with tens of thousands of different contexts (and we are just warming up). If the GRAPH construct remedies that, then that would be good news. However, there is no example in http://www.w3.org/TR/rdf-sparql-query or anywhere else that I've happened upon that illustrates the use of the context argument as a variable, bound to provenance restrictions (e.g., using a dc:source or dc:date property). If you can show me an example, especially one that I can run through SPARQLer, I would appreciate it.
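To make the request in Point 3 concrete, here is a minimal sketch, in plain Python rather than SPARQL, of the kind of query I have in mind: the context position is left as a variable and gets bound through a provenance restriction on the context itself. Everything in it is an assumption made for illustration (the `Quad` layout, the `ex:` names, the `_:meta` context holding the provenance statements, and the helpers `match` and `statements_from_source`); it is not how Seamark, AllegroGraph, or any SPARQL engine actually does it.

```python
from typing import Iterator, List, NamedTuple, Optional


class Quad(NamedTuple):
    context: str    # URI or blank-node label such as "_:c1" (contexts need no name)
    subject: str
    predicate: str
    obj: str


# Hypothetical data: two blank-node contexts, plus provenance statements
# about those contexts kept in a separate (also hypothetical) "_:meta" context.
QUADS = [
    Quad("_:c1", "ex:alice", "ex:knows", "ex:bob"),
    Quad("_:c2", "ex:alice", "ex:knows", "ex:carol"),
    Quad("_:meta", "_:c1", "dc:source", "ex:feedA"),
    Quad("_:meta", "_:c1", "dc:date", "2007-05-01"),
    Quad("_:meta", "_:c2", "dc:source", "ex:feedB"),
    Quad("_:meta", "_:c2", "dc:date", "2007-05-20"),
]


def match(
    quads: List[Quad],
    context: Optional[str] = None,
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Quad]:
    """Yield quads matching the pattern; None in any position acts as a variable."""
    pattern = (context, subject, predicate, obj)
    for quad in quads:
        if all(want is None or want == got for want, got in zip(pattern, quad)):
            yield quad


def statements_from_source(quads: List[Quad], source: str) -> List[Quad]:
    """Return statements whose context has the given dc:source.

    The context is never named in the query: it is bound by the
    provenance restriction and then used as the fourth-field filter.
    """
    results = []
    for prov in match(quads, predicate="dc:source", obj=source):
        ctx = prov.subject  # context bound via its provenance, not its name
        results.extend(match(quads, context=ctx))
    return results
```

Calling `statements_from_source(QUADS, "ex:feedA")` selects the statements held in whichever context carries that source, without the caller ever enumerating contexts. The same join would work with a `dc:date` restriction in place of `dc:source`.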
Cheers, Bob

On May 27, 2007, at 2354, Richard Newman wrote:

> Bob, DAWG, folks,
>
> I'm going to weigh in here, because I have implementation
> experience, and I'm practically mentioned in a FROM NAMED clause.
> My apologies in advance to the DAWG for stepping on toes.
>
> On 24 May 2007, at 7:29 AM, Bob MacGregor wrote:
>
>> At one point in SPARQL's evolution, the language introduced a
>> SOURCE operator that allowed for a context argument that could be
>> either a variable or a constant. The SOURCE construct effectively
>> treats contexts as first-class entities. The currently-adopted
>> named graphs notion treats contexts as second-class objects. The
>> SOURCE operator is consistent with a fully-functional quad
>> implementation; the named graph notion is much more limited. The
>> principal advantage of the named graph notion is that it is only a
>> small extension beyond the traditional RDF spec.
>
> In what way is GRAPH limited? It's merely a syntactic extension of
> Turtle to allow a fourth field to be specified:
>
> GRAPH ?foo {
>   ?x ?y ?z .
>   GRAPH x:y {
>     ?a ?b ?c .
>   }
> }
>
> is fine. (Indeed, in AllegroGraph we expand that into quads
> internally:
>
> ?foo ?x ?y ?z .
> x:y ?a ?b ?c .)
>
> If your implementation allows you to use an unrestricted dataset
> (i.e., you don't have to enumerate your graphs/sources using FROM
> NAMED), I can't even see a problem there... and the dataset issue
> applies equally to SOURCE.
>
> SOURCE heavily restricts a SPARQL implementation, forcing it to
> track provenance (whither programmatically generated triples?), or
> fail queries that try to use SOURCE. GRAPH provides instead a
> generic fourth field; the particular endpoint can choose what that
> field is used for.
>
> I'd choose flexibility over specificity. GRAPH > SOURCE.
>
>> However, major commercial vendors are implementing full support
>> for quads. Franz's AllegroGraph has a quad implementation
>> (actually, they mentioned quints, but the fifth argument is
>> internal), Kowari/Tucana implements full quads, and Siderean's
>> Seamark Navigator (my own company) has full quads. The reason for
>> this is that full quads enable performant implementations of
>> provenance information and named graphs do not.
>
> I should point out that, in AllegroGraph, the fourth field of the
> quad is used to implement named graphs (though it can be used for
> other things, too), and the AllegroGraph SPARQL interface uses
> GRAPH to query the fourth field: quad-fourth-fields and named
> graphs *are the same thing*.
>
> If you want to use the graph field to track provenance, you can:
> when you're querying through SPARQL on AllegroGraph, and tracking
> provenance in the graph argument, GRAPH acts exactly like SOURCE --
> but you can use it for other things, too, if you'd prefer to use it
> for access control, or geocoding, or inference.
>
> I have personally implemented a system to do full access control
> and provenance using the named graph support in AllegroGraph. I
> don't see any way in which "full quads" are different to having a
> graph slot in a 'triple': both of them give an additional field in
> which to store information. All "named graphs" is is a suggestion
> about how you might want to use the fourth field: to cluster
> triples together "under" some URI. SOURCE, on the other hand, is a
> *requirement* that an implementation track provenance in a fourth
> (or fifth) field.
>
> I suspect that you are blinkered by one possible approach to named
> graphs: having a separate model per graph, with performance
> penalties when crossing between models, or using many models. One
> could just as easily build an RDF store that has a separate model
> for each property: that doesn't mean that the design of SPARQL is
> wrong, only that that particular implementation does not adequately
> support the use case you are envisioning.
>
>> What we have here is a case where the serious commercial vendors,
>> who care about performance, have chosen a direction different than
>> the one adopted by SPARQL. My suggestion is to resurrect the
>> SOURCE construct in SPARQL.
>
> We added flexible named graphs in AllegroGraph 2.0 because
> customers wanted them. AllegroGraph's design made it easy to do so,
> and the graph field is fully indexed, just like s/p/o. Some
> customers want to use the graph field for other purposes, and we
> facilitate that, but "graph" is a good default interpretation of
> the fourth field of a triple.
>
> Can you give a use case or two that SOURCE allows, but GRAPH does
> not? I believe that that is a motivating factor for the WG. I'd
> also love to hear ways in which AllegroGraph -- one of your
> mentioned "serious commercial" products -- is moving away from the
> conceptual direction of SPARQL, because I put a fair amount of
> effort into ensuring that it does not.
>
>> In choosing named graphs, it has chosen an impoverished solution
>> that satisfies only one aspect of provenance, while major vendors
>> are taking a more enlightened approach, full quads, that supports
>> all manner of provenance information. In the long run, performance
>> always wins out; quads are going to make named graphs a footnote.
>
> Unless I'm misunderstanding you, I think you're arguing across
> yourself. Named graphs are not necessarily different to quads: in
> AllegroGraph, for instance, they are exactly the same. Think of
> named graphs as merely a suggested application of quads, and your
> objection goes away.
>
> I still fail to see how SOURCE is more "enlightened" or performant
> than GRAPH. I look forward to your explanation.
>
> Regards,
>
> -Richard

Bob MacGregor
Chief Scientist
Siderean Software, Inc.
310.647.5690
bmacgregor@siderean.com
Received on Monday, 28 May 2007 20:39:45 UTC