Graph/data model specification from Steve Harris on 2005-03-23 (public-rdf-dawg@w3.org from January to March 2005)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Wed, 23 Mar 2005 10:28:34 +0000
To: DAWG public list <public-rdf-dawg@w3.org>
Message-ID: <20050323102834.GC16051@login.ecs.soton.ac.uk>

Apologies for the length of this mail, but it's a complicated subject and
I want to be as clear as possible.

I have been very uncomfortable with the way that the graph data model is
specified in the SPARQL working draft. I've been mulling over it and its
implications for some time, and it still seems like the wrong thing. As
Andy has shown, it is possible for (most?) existing systems to emulate
the proposed model, but the emulation doesn't sit right with me, and it is
inconvenient for users in what is, in my experience, currently the most
common behaviour. So, I have a slightly different suggestion that I hope is
a bit more implementation neutral.

Implementation should be little/no effort for background+named graph
systems (it requires the ability to identify the background graph with a
URI, but I understand cwm can already do this), and it is less effort to
support and less of a departure for quad-based systems than the current
proposal.

There are two query (protocol/whatever) parameters; I'll call them "use"
and "constrain", though those are not good names.

They both take lists of URIs, and systems can indicate their defaults, e.g.
via the SADDLE scheme. SPARQL does not specify what the defaults should
be.

"Use"

Use is a list of URIs that are to be used as GRAPH URIs to match graph
patterns in which the GRAPH keyword is not used. Systems using background
graphs can set their default use value to be some URI representing the
background graph; aggregator-type systems can default it to some value
indicating that all known graphs (modulo constraints in "constrain") are to
be used. (I'm not sure about how to invoke this feature: using a legal URI
to indicate 'all' is a bit fraught, * seems ugly, and using the empty list
is wrong.)
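
A minimal sketch of how "use" might behave, written in Python over a
hypothetical in-memory quad store. The store layout, the function name
match_plain and the example URIs are assumptions made for illustration
only; how to say "all known graphs" is left open, as in the text above.

```python
# Hypothetical quad store: (graph, subject, predicate, object) tuples.
QUADS = [
    ("urn:x-local:background", "ex:a", "ex:knows", "ex:b"),
    ("http://example.org/g1", "ex:b", "ex:knows", "ex:c"),
    ("http://example.org/g2", "ex:c", "ex:knows", "ex:d"),
]


def match_plain(quads, use, pattern):
    """Match a triple pattern that has no GRAPH keyword.

    Only quads whose graph URI appears in `use` are considered.
    A None in `pattern` acts as a variable (wildcard).
    """
    allowed = set(use)
    results = []
    for g, s, p, o in quads:
        if g not in allowed:
            continue
        if all(want is None or want == got
               for want, got in zip(pattern, (s, p, o))):
            results.append((s, p, o))
    return results


# A background-graph system: "use" defaults to the background graph URI.
background_only = match_plain(
    QUADS, ["urn:x-local:background"], (None, "ex:knows", None))

# A client overriding "use" with two graph URIs.
two_graphs = match_plain(
    QUADS,
    ["urn:x-local:background", "http://example.org/g1"],
    (None, None, None))
```

With these assumed quads, background_only holds the single triple from the
background graph and two_graphs holds two triples.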

"Constrain"

This is a hard limit on the graphs that may be used to answer queries. It
could be equivalent to adding
	GRAPH ?gN {...} FILTER ?gN = <uri1> || ?gN = <uri2> ...
to every triple in an aggregator; for dynamic loading systems it's just
the list of graphs to load.
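
A rough sketch of "constrain" as a filter on graph URIs, in Python. The
quad layout, the name apply_constrain and the example URIs are
hypothetical; an aggregator could apply such a filter to its stored quads,
while a dynamic loading system would instead treat the same list as the
graphs to fetch.

```python
# Hypothetical quad store: (graph, subject, predicate, object) tuples.
ALL_QUADS = [
    ("urn:x-local:background", "ex:a", "ex:knows", "ex:b"),
    ("http://example.org/g1", "ex:b", "ex:knows", "ex:c"),
    ("http://example.org/g2", "ex:c", "ex:knows", "ex:d"),
]


def apply_constrain(quads, constrain):
    """Keep only quads whose graph URI is in the constrain list.

    This plays the role of FILTER ?gN = <uri1> || ?gN = <uri2> ...
    applied to every triple pattern.
    """
    allowed = set(constrain)
    return [quad for quad in quads if quad[0] in allowed]


constrained = apply_constrain(
    ALL_QUADS, ["http://example.org/g1", "http://example.org/g2"])
graphs_left = sorted({quad[0] for quad in constrained})
```

Here the background graph is excluded, so no query evaluated over
constrained can see its triples, whatever "use" says.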

All graphs are identified by some URI, even those specified in "use", though
it may be urn:x-local:background or something equally vague.

Pros

Hopefully this is more neutral to the implementation of the RDF store's
engine. It allows background-like behaviour, but it doesn't limit it to a
particular graph in a given instance of a store/query pair, and it doesn't
require the graph to be stored twice (even conceptually) if you want both
provenance and answers without additional constraints.

Cons

It still has the behaviour that
	SELECT ?x ?y ?z WHERE {?x ?y ?z .}
is different from
	SELECT ?x ?y ?z WHERE GRAPH ?g {?x ?y ?z .}
which makes me uneasy, but I can live with it.
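
To illustrate why the two queries can differ, here is a small Python
sketch. It assumes a hypothetical store whose "use" default is a single
background graph, no "constrain" limit, and results that keep duplicates;
the data and function names are invented for the example.

```python
# Hypothetical quads; the same triple appears in two graphs.
QUADS = [
    ("urn:x-local:background", "ex:a", "ex:p", "ex:b"),
    ("http://example.org/g1", "ex:a", "ex:p", "ex:b"),
    ("http://example.org/g1", "ex:c", "ex:p", "ex:d"),
]

# Assumed default for "use": only the background graph.
USE = {"urn:x-local:background"}


def select_plain(quads, use):
    """SELECT ?x ?y ?z WHERE {?x ?y ?z .}  (matched against "use" only)."""
    return [(s, p, o) for g, s, p, o in quads if g in use]


def select_with_graph(quads):
    """SELECT ?x ?y ?z WHERE GRAPH ?g {?x ?y ?z .}  (?g ranges over all)."""
    return [(s, p, o) for g, s, p, o in quads]


plain_rows = select_plain(QUADS, USE)
graph_rows = select_with_graph(QUADS)
```

Under these assumptions plain_rows has one row while graph_rows has three,
even though neither query names a graph explicitly.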

- Steve

Received on Wednesday, 23 March 2005 10:28:37 UTC