- From: Richard Newman <rnewman@franz.com>
- Date: Sun, 27 May 2007 23:54:58 -0700
- To: Bob MacGregor <bmacgregor@siderean.com>
- Cc: public-rdf-dawg-comments@w3.org, Eric Prud'hommeaux <eric@w3.org>
Bob, DAWG, folks,
I'm going to weigh in here, because I have implementation experience,
and I'm practically mentioned in a FROM NAMED clause. My apologies in
advance to the DAWG for stepping on toes.
On 24 May 2007, at 7:29 AM, Bob MacGregor wrote:
> At one point in SPARQL's evolution, the language introduced a
> SOURCE operator that allowed for a
> context argument that could be either a variable or a constant.
> The SOURCE construct effectively
> treats contexts as first-class entities. The currently-adopted
> named graphs notion treats contexts
> as second-class objects. The SOURCE operator is consistent with a
> fully-functional quad
> implementation; the named graph notion is much more limited. The
> principal advantage of the
> named graph notion is that it is only a small extension beyond the
> traditional RDF spec.
In what way is GRAPH limited? It's merely a syntactic extension of the
Turtle-style triple pattern that lets a fourth field be specified. For
instance, nesting like this:
    GRAPH ?foo {
      ?x ?y ?z .
      GRAPH x:y {
        ?a ?b ?c .
      }
    }
is fine. (Indeed, in AllegroGraph we expand that into quads internally:
    ?foo ?x ?y ?z .
    x:y ?a ?b ?c .)
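To make that expansion concrete, here is a rough sketch in Python. It is an illustration only, not AllegroGraph's actual code: the nested-tuple representation of the query and the function name `expand` are made up for this example, and `None` stands in for "no enclosing GRAPH".

```python
# Hypothetical representation: a group is a list of items. Each item is either
# a triple pattern (s, p, o) or a nested block ("GRAPH", graph_term, group).

def expand(group, graph=None):
    """Flatten nested GRAPH blocks into (graph, s, p, o) quad patterns.

    The innermost enclosing GRAPH wins. Patterns outside any GRAPH get
    graph=None, standing in for the default graph.
    """
    quads = []
    for item in group:
        if item[0] == "GRAPH" and isinstance(item[2], list):
            _, inner_graph, inner_group = item
            quads.extend(expand(inner_group, inner_graph))
        else:
            s, p, o = item
            quads.append((graph, s, p, o))
    return quads

# The nested GRAPH example from this message.
query = [
    ("GRAPH", "?foo", [
        ("?x", "?y", "?z"),
        ("GRAPH", "x:y", [
            ("?a", "?b", "?c"),
        ]),
    ]),
]

quads = expand(query)
for quad in quads:
    print(" ".join(quad))
# prints:
# ?foo ?x ?y ?z
# x:y ?a ?b ?c
```

The point is only that the graph term, variable or constant, ends up as an ordinary fourth position in each pattern.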
If your implementation allows you to use an unrestricted dataset
(i.e., you don't have to enumerate your graphs/sources using FROM
NAMED), I can't even see a problem there... and the dataset issue
applies equally to SOURCE.
SOURCE heavily restricts a SPARQL implementation, forcing it to track
provenance (whither programmatically generated triples?), or fail
queries that try to use SOURCE. GRAPH provides instead a generic
fourth field; the particular endpoint can choose what that field is
used for.
I'd choose flexibility over specificity. GRAPH > SOURCE.
> However, major commercial vendors are implementing full support for
> quads. Franz's AllegroGraph has
> a quad implementation (actually, they mentioned quints, but the
> fifth argument is internal),
> Kowari/Tucana implements full quads, and Siderean's Seamark Navigator
> (my own company) has full quads. The reason for this is that full
> quads enable performant implementations of
> provenance information and named graphs do not.
I should point out that, in AllegroGraph, the fourth field of the
quad is used to implement named graphs (though it can be used for
other things, too), and the AllegroGraph SPARQL interface uses GRAPH
to query the fourth field: quad-fourth-fields and named graphs *are
the same thing*.
If you want to use the graph field to track provenance, you can: when
you're querying through SPARQL on AllegroGraph, and tracking
provenance in the graph argument, GRAPH acts exactly like SOURCE --
but you can use it for other things, too, if you'd prefer to use it
for access control, or geocoding, or inference.
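A toy sketch of what I mean, in Python. This is not how AllegroGraph is implemented; the data, the `src:` graph names, and the helper names `is_var` and `match` are all invented for illustration. The store is just a list of quads, and the graph position is matched the same way as subject, predicate and object.

```python
# Illustrative data only; the URIs and graph names are made up.
quads = [
    # (graph, subject, predicate, object)
    ("src:crawl-2007-05", "ex:alice", "foaf:knows", "ex:bob"),
    ("src:manual-entry", "ex:alice", "foaf:name", '"Alice"'),
    ("src:crawl-2007-05", "ex:bob", "foaf:name", '"Bob"'),
]

def is_var(term):
    """Treat any term starting with '?' as a variable."""
    return term.startswith("?")

def match(pattern, store):
    """Yield one binding dict per quad that matches the 4-term pattern.

    The graph position is handled exactly like s, p and o: a constant must
    be equal, a variable gets bound (consistently if it repeats).
    """
    for quad in store:
        bindings = {}
        for term, value in zip(pattern, quad):
            if is_var(term):
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings

# Roughly: GRAPH ?g { ex:alice ?p ?o }
# "Which source did each statement about alice come from?"
by_source = list(match(("?g", "ex:alice", "?p", "?o"), quads))

# Roughly: GRAPH src:crawl-2007-05 { ?s foaf:name ?o }
# The graph term is a constant this time.
crawled_names = list(match(("src:crawl-2007-05", "?s", "foaf:name", "?o"), quads))
```

Here the fourth field happens to hold a source identifier, so the first query behaves like a provenance lookup. Put an access-control label or a region code in the same slot and the matching code does not change.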
I have personally implemented a system to do full access control and
provenance using the named graph support in AllegroGraph. I don't see
any way in which "full quads" are different to having a graph slot in
a 'triple': both of them give an additional field in which to store
information. "Named graphs" is merely a suggestion about how you
might want to use the fourth field: to cluster triples together
"under" some URI. SOURCE, on the other hand, is a *requirement* that
an implementation track provenance in a fourth (or fifth) field.
I suspect that you are blinkered by one possible approach to named
graphs: having a separate model per graph, with performance penalties
when crossing between models, or using many models. One could just as
easily build an RDF store that has a separate model for each
property; that wouldn't mean that the design of SPARQL is wrong, only
that the particular implementation does not adequately support the
use case you are envisioning.
> What we have here is a case where the serious commercial vendors,
> who care about performance,
> have chosen a direction different than the one adopted by
> SPARQL. My suggestion is to resurrect
> the SOURCE construct in SPARQL.
We added flexible named graphs in AllegroGraph 2.0 because customers
wanted them. AllegroGraph's design made it easy to do so, and the
graph field is fully indexed, just like s/p/o. Some customers want to
use the graph field for other purposes, and we facilitate that, but
"graph" is a good default interpretation of the fourth field of a
triple.
Can you give a use case or two that SOURCE allows, but GRAPH does
not? I believe that such use cases are what motivate the WG. I'd also
love to hear ways in which AllegroGraph -- one of your mentioned
"serious commercial" products -- is moving away from the conceptual
direction of SPARQL, because I put a fair amount of effort into
ensuring that it does not.
> In choosing named graphs, it has chosen
> an impoverished solution that satisfies only one aspect of
> provenance, while major vendors are
> taking a more enlightened approach, full quads, that supports all
> manner of provenance information.
> In the long run, performance always wins out; quads are going to
> make named graphs a footnote.
Unless I'm misunderstanding you, I think you're arguing against
yourself. Named graphs are not necessarily different from quads: in
AllegroGraph, for instance, they are exactly the same. Think of named
graphs as merely a suggested application of quads, and your objection
goes away.
I still fail to see how SOURCE is more "enlightened" or performant
than GRAPH. I look forward to your explanation.
Regards,
-Richard
Received on Monday, 28 May 2007 06:55:19 UTC