The following replaces 5/5.1/5.2 :

----------------

5 Basic Graph Patterns

A basic graph pattern is a set of triple patterns.
Basic graph patterns stand in the same relation to triple patterns
that RDF graphs do to RDF triples, and much of the same terminology
can be applied to them. In particular, two basic graph patterns are
said to be *equivalent* if there is a bijection M between the
terms of the triple patterns that maps blank nodes to blank nodes and
maps variables, literals and IRIs to themselves, such that a triple (
s, p, o ) is in the first pattern if and only if the triple ( M(s),
M(p) M(o) ) is in the second. This definition extends that for RDF
graph equivalence to basic graph patterns by preserving variable
names across equivalent patterns.

Basic graph patterns can be instantiated by
replacing both variables and blank nodes by terms, giving two notions
of instance. Following the terminology of [RDF semantics], an *RDF
instance mapping* is a mapping from blank nodes to RDF terms. An
*answer mapping* is a mapping from query variables to RDF terms.
An *instance mapping* is a mapping from blank nodes and
variables to RDF terms. Any instance mapping defines an answer
mapping and an instance mapping obtained by restricting it to query
variables and blank nodes respectively.

5.1 SPARQL Basic Graph Pattern Matching

Suppose BGP is a basic graph pattern and G is an
RDF graph. A *pattern solution* for BGP *from* G is a
pattern instance mapping P such that P(BGP) has an RDF instance which
is a subgraph of G. Equivalently, such that P(BGP) is simply entailed
by G.

This definition allows the solution mapping to
bind a variable in BGP to a blank node in G. Since SPARQL treats
blank node IDs in the answer document as scoped to the document, they
cannot be understood as identifying nodes in the dataset. If DS is
the dataset of a query, pattern solutions are therefore understood to
be not from DS itself, but from an RDF graph, called the *scoping
graph,* which is graph-equivalent to DS but shares no blank nodes
with DS or with BGP. The same scoping graph is used for all answers
to a single query. The scoping graph is purely a theoretical
construct; in practice, the effect is obtained simply by the document
scope conventions for blank node identifiers.

Since RDF blank nodes allow infinitely many redundant solutions for many patterns (obtained by adding new blank nodes), it is necessary to somehow delimit the set of answers. SPARQL adopts a simple subgraph matching criterion for this. A SPARQL answer is the restriction of an instance mapping M to query variables, where M(BGP) is a subset of the scoping graph. There is one answer for each distinct such instance mapping M.

This is optimised for ease of computation rather than redundancy elimination. It allows answer sets to contain redundancies even when the dataset is lean, and it allows logically equivalent datasets to yield distinct answer sets.

<<might give a couple of simple examples?>>

Appendix @@@ gives general conditions on query answer sets for other entailment regimes.

-----------------

This is the Appendix:

-----------------

General Framework for Basic Graph Matching

The overall SPARQL design can be used for queries which assume a more elaborate form of entailment than simple entailment, by re-writing the matching conditions for basic graph patterns. Since it is an open research problem to state such conditions in a single general form which applies to all forms of entailment and optimally eliminates needless of inappropriate redundancy, this document only gives necessary conditions which any such solution should satisfy. These will need to be extended to full definitions for each particular case.

An *entailment regime* is a transitive
idempotent binary relation between subsets of RDF graphs. A graph in
the range of an entailment regime E is called *well-formed* for
the regime.

Examples of entailment regimes include simple entailment [RDF-MT], RDF entailment [RDF-MT], RDFS entailment [RDF-MT], D-entailement [RDF-MT] and OWL entailment [OWL-Semantics]. If E is an entailment regime then we will refer to E-entailment, E-consistency, etc.., following this naming convention.

Some entailment regimes can categorize some RDF graphs as inconsistent. For example, "-1"^^xsd:positiveInteger is inconsistent with respect to D-entailment. The effect of a query on an inconsistent graph is not covered by this specification.

A SPARQL extension to E-entailment **must**
satisfy the following conditions.

(1) The scoping graph SG corresponding to any consistent source document SD is uniquely specified and is E-equivalent to SD.

For any basic graph pattern BGP and pattern solution P

(2) P(BGP) is well-formed for E

For any scoping graph SG and answer set
{P_{1} ... P_{n}} for a basic graph pattern BGP

(3) SG E-entails (SG union
P_{1}(BGP_{1}) union ... union
P_{n}(BGP_{n}))

where {BGP_{1 .... }BGP_{n}}is a
set of basic graph patterns all equivalent to BGP none of which share
any blank nodes with any other or with SG.

These conditions do not fully determine the set of
possible answers, since RDF allows unlimited amounts of redundancy.
In addition, therefore, the following **must** hold.

(4) Each SPARQL extension must provide conditions on answer sets which guarantee that every BGP and SD has a unique, finite set of answers.

Notes.

(a) SG will often be graph equivalent to SD, but restricting this to E-equivalence allows some forms of normalization, for example elimination of semantic redundancies, to be applied to the source documents before querying.

(b) The construction in condition 3 ensures that
any blank nodes in the answer bindings are used in a way which is
internally consistent with the way that blank nodes occur in SG. This
ensures that blank node identifiers occur in more than one answer in
an answer set only when the blank nodes so identified are indeed
identical in SG. If the extension does not allow answer bindings to
blank nodes, then this condition can be simplified to the condition

SG E-entails P(BGP) for each pattern solution P.

(c) These conditions do not impose the SPARQL requirement that SG share no bnodes with SD or BGP. In particular, it allows SG to actually be SD. This allows query protocols in which blank node identifiers retain their meaning between the query and the source document, or across multiple queries. Such protocols are not supported by the current SPARQL protocol specification, however.

(d) Since conditions 1 to 3 are only necessary conditions on answers, condition 4 allows cases where the set of legal answers can be restricted in various ways. For example, the current state of the art in OWL-DL querying focusses on the case where answer bindings to blank nodes are prohibited. We note that these conditions even allow the pathological 'mute' case where every query has an empty answer set.

(e) None of these conditions refer explicitly to instance mappings on blank nodes in BGP. For some entailment regimes, the existential interpretation of blank nodes cannot be fully captured by the existence of a single instance mapping. These conditions allow such regimes to give blank nodes in query patterns a 'fully existential' reading.

It is straightforward to show that SPARQL satisfies these conditions for the case where E is simple entailment, given that the SPARQL condition on SG is that it is graph-equivalent to SD but shares no bnodes with SD or BGP (which satisfies the first condition). The only condition which is nontrivial is (3).

Every answer P_{i} is the answer mapping
restriction of an instance M_{i} such that
M_{i}(BGP_{i}) is a subgraph of SG. Since
BGP_{i} and SG have no blank nodes in common, the range of
M_{i} contains no blank nodes from BGP_{i};
therefore, the answer mapping P_{i} and RDF instance mapping
I_{i} components of M_{i} commute, so
M_{i}(BGP_{i}) =
I_{i}(P_{i}(BGP_{i})). So

M_{1}(BGP_{1}) union ... union
M_{n}(BGP_{n})

=
I_{1}(P_{1}(BGP_{1})) union ... union
I_{n}(P_{n}(BGP_{n}))

= [ I_{1} + ... + I_{n}](
P_{1}(BGP_{1}) union ... union
P_{n}(BGP_{n}) )

since the domains of the I_{i} instance
mappings are all mutually exclusive. Since they are also exclusive
from SG,

SG union [ I_{1} + ... + I_{n}](
P_{1}(BGP_{1}) union ... union
P_{n}(BGP_{n}) )

= [ I_{1} + ... + I_{n}](SG
union P_{1}(BGP_{1}) union ... union
P_{n}(BGP_{n}) )

i.e.

SG union P_{1}(BGP_{1}) union ...
union P_{n}(BGP_{n})

has an instance which is a subgraph of SG, so is simply entailed by SG by the RDF interpolation lemma [RDF Semantics: http://www.w3.org/TR/rdf-mt/#interplemmaprf].