- From: Richard Newman <rnewman@franz.com>
- Date: Sun, 27 May 2007 23:54:58 -0700
- To: Bob MacGregor <bmacgregor@siderean.com>
- Cc: public-rdf-dawg-comments@w3.org, Eric Prud'hommeaux <eric@w3.org>
Bob, DAWG, folks,

I'm going to weigh in here, because I have implementation experience, and I'm practically mentioned in a FROM NAMED clause. My apologies in advance to the DAWG for stepping on toes.

On 24 May 2007, at 7:29 AM, Bob MacGregor wrote:

> At one point in SPARQL's evolution, the language introduced a SOURCE
> operator that allowed for a context argument that could be either a
> variable or a constant. The SOURCE construct effectively treats
> contexts as first-class entities. The currently-adopted named graphs
> notion treats contexts as second-class objects. The SOURCE operator is
> consistent with a fully-functional quad implementation; the named
> graph notion is much more limited. The principal advantage of the
> named graph notion is that it is only a small extension beyond the
> traditional RDF spec.

In what way is GRAPH limited? It's merely a syntactic extension of Turtle to allow a fourth field to be specified:

    GRAPH ?foo {
      ?x ?y ?z .
      GRAPH x:y { ?a ?b ?c . }
    }

is fine. Indeed, in AllegroGraph we expand that into quads internally:

    ?foo ?x ?y ?z .
    x:y ?a ?b ?c .

If your implementation allows you to use an unrestricted dataset (i.e., you don't have to enumerate your graphs/sources using FROM NAMED), I can't even see a problem there... and the dataset issue applies equally to SOURCE.

SOURCE heavily restricts a SPARQL implementation, forcing it to track provenance (whither programmatically generated triples?), or fail queries that try to use SOURCE. GRAPH provides instead a generic fourth field; the particular endpoint can choose what that field is used for. I'd choose flexibility over specificity. GRAPH > SOURCE.

> However, major commercial vendors are implementing full support for
> quads. Franz's AllegroGraph has a quad implementation (actually, they
> mentioned quints, but the fifth argument is internal), Kowari/Tucana
> implements full quads, and Siderean's Seamark Navigator (my own
> company) has full quads. The reason for this is that full quads enable
> performant implementations of provenance information and named graphs
> do not.

I should point out that, in AllegroGraph, the fourth field of the quad is used to implement named graphs (though it can be used for other things, too), and the AllegroGraph SPARQL interface uses GRAPH to query the fourth field: quad-fourth-fields and named graphs *are the same thing*.

If you want to use the graph field to track provenance, you can: when you're querying through SPARQL on AllegroGraph, and tracking provenance in the graph argument, GRAPH acts exactly like SOURCE -- but you can use it for other things, too, if you'd prefer to use it for access control, or geocoding, or inference. I have personally implemented a system to do full access control and provenance using the named graph support in AllegroGraph.

I don't see any way in which "full quads" are different to having a graph slot in a 'triple': both of them give an additional field in which to store information. All that "named graphs" amounts to is a suggestion about how you might want to use the fourth field: to cluster triples together "under" some URI. SOURCE, on the other hand, is a *requirement* that an implementation track provenance in a fourth (or fifth) field.

I suspect that you are blinkered by one possible approach to named graphs: having a separate model per graph, with performance penalties when crossing between models, or using many models. One could just as easily build an RDF store that has a separate model for each property: that doesn't mean that the design of SPARQL is wrong, only that that particular implementation does not adequately support the use case you are envisioning.

> What we have here is a case where the serious commercial vendors, who
> care about performance, have chosen a direction different than the one
> adopted by SPARQL. My suggestion is to resurrect the SOURCE construct
> in SPARQL.

We added flexible named graphs in AllegroGraph 2.0 because customers wanted them. AllegroGraph's design made it easy to do so, and the graph field is fully indexed, just like s/p/o. Some customers want to use the graph field for other purposes, and we facilitate that, but "graph" is a good default interpretation of the fourth field of a triple.

Can you give a use case or two that SOURCE allows, but GRAPH does not? I believe that that is a motivating factor for the WG.

I'd also love to hear ways in which AllegroGraph -- one of your mentioned "serious commercial" products -- is moving away from the conceptual direction of SPARQL, because I put a fair amount of effort into ensuring that it does not.

> In choosing named graphs, it has chosen an impoverished solution that
> satisfies only one aspect of provenance, while major vendors are
> taking a more enlightened approach, full quads, that supports all
> manner of provenance information. In the long run, performance always
> wins out; quads are going to make named graphs a footnote.

Unless I'm misunderstanding you, I think you're arguing across yourself. Named graphs are not necessarily different to quads: in AllegroGraph, for instance, they are exactly the same. Think of named graphs as merely a suggested application of quads, and your objection goes away.

I still fail to see how SOURCE is more "enlightened" or performant than GRAPH. I look forward to your explanation.

Regards,
-Richard
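
The quad expansion described in the message (a nested GRAPH pattern flattened into graph/subject/predicate/object quads, with the graph slot matched like any other field) can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not AllegroGraph's implementation: the tuple encoding of patterns, the names `expand`, `unify`, `solve` and `DEFAULT_GRAPH`, and the sample data are all hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)
DEFAULT_GRAPH = "default"         # hypothetical label for patterns outside any GRAPH


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables; everything else is a constant."""
    return term.startswith("?")


def expand(graph: str, patterns: list) -> List[Quad]:
    """Flatten nested GRAPH blocks into a flat list of quad patterns.

    Assumed encoding: a pattern is either a triple (s, p, o) or a block
    ("GRAPH", graph_term, [patterns...]). The innermost GRAPH wins.
    """
    quads: List[Quad] = []
    for pat in patterns:
        if pat[0] == "GRAPH":
            _, inner_graph, inner = pat
            quads.extend(expand(inner_graph, inner))
        else:
            s, p, o = pat
            quads.append((graph, s, p, o))
    return quads


def unify(pattern: Quad, quad: Quad, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match one quad pattern against one stored quad, extending the binding."""
    new = dict(binding)
    for term, value in zip(pattern, quad):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant mismatch
    return new


def solve(patterns: List[Quad], store: List[Quad],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Naive nested-loop evaluation of a conjunction of quad patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for quad in store:
        extended = unify(first, quad, binding)
        if extended is not None:
            yield from solve(rest, store, extended)


# Made-up data: the fourth field is just another column in the store.
store: List[Quad] = [
    ("x:y", "ex:a", "ex:knows", "ex:b"),
    ("ex:g1", "ex:c", "ex:name", "Carol"),
]

# GRAPH ?foo { ?x ?y ?z . GRAPH x:y { ?a ?b ?c . } }
query = [("GRAPH", "?foo", [("?x", "?y", "?z"),
                            ("GRAPH", "x:y", [("?a", "?b", "?c")])])]

quads = expand(DEFAULT_GRAPH, query)
solutions = list(solve(quads, store))
```

In this sketch the graph term is unified exactly like subject, predicate and object, so a variable (`?foo`) and a constant (`x:y`) in the graph position need no special handling. Whether the store uses that field for named graphs, provenance or something else is left to whoever loads the data.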
Received on Monday, 28 May 2007 06:55:19 UTC