Text for 18.6 Extending SPARQL Basic Graph Matching from Birte Glimm on 2011-02-15 (public-rdf-dawg@w3.org from January to March 2011)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Tue, 15 Feb 2011 14:19:00 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>
Message-ID: <AANLkTinkqN4=JRm71Oiq7FCGWA=P2Y2SU6ZWV5-rEhKn@mail.gmail.com>
Hi all,
Andy suggested that we take my concerns about the text in 18.6 to
another thread since this is already old SPARQL 1.0 text and not an
issue with newly introduced features.

I wrote:
"18.6 Extending SPARQL Basic Graph Matching
I am not quite happy with the text (in particular the formulation of
the conditions) since it is not at all well-aligned with the notation
used in the rest of the document, e.g., "answer set" is everywhere
else "solution sequence" and in this case answer set is even a set of
pattern instance mappings, which is not the case anywhere else, where
a BGP evaluates into a multiset of solution mappings and the RDF
instance mappings just determine the multiplicity.

We (Markus Krötzsch and I) discuss what is wrong with the conditions in
an ISWC paper and I am happy to suggest a more aligned version of the
conditions, if the WG is interested in this."

I propose, therefore, to change the text with the purpose of getting
the terminology aligned with the rest of the spec and motivate more
clearly the purpose of the conditions (see also Sec. 4 of the attached
paper for explanations for the proposed changes).

Current text:
A SPARQL extension to E-entailment must satisfy the following conditions:
1. The scoping graph SG, corresponding to any consistent active graph
AG, is uniquely specified and is E-equivalent to AG.
2. For any basic graph pattern BGP and pattern solution mapping P,
P(BGP) is well-formed for E.
3. For any scoping graph SG and answer set {P_1,...,P_n} for a basic
graph pattern BGP, and where BGP_1, ..., BGP_n is a set of basic graph
patterns all equivalent to BGP, none of which share any blank nodes
with any other or with SG
  SG E-entails (SG union P_1(BGP_1) union ... union P_n(BGP_n)).
4. Each SPARQL extension must provide conditions on answer sets which
guarantee that the set of triples obtained by instantiating BGP with
each solution μ is uniquely specified up to RDF graph equivalence, and
should provide further conditions to prevent trivial infinite answers
as appropriate to the regime.

Proposed text:
An entailment regime E must provide conditions on basic graph pattern
evaluation such that for any basic graph pattern BGP, any RDF graph G,
and any evaluation that satisfies the conditions, the resulting
multiset of solutions is uniquely determined up to RDF graph
equivalence. We denote the multiset of solutions from evaluating BGP
over G using E with Eval_E(G, BGP).
An entailment regime must further satisfy the following conditions:
1. For any consistent active graph AG, the entailment regime E
uniquely specifies a scoping graph SG that is E-equivalent to AG.
2. A set of well-formed graphs for E is specified such that, for any
basic graph pattern BGP, scoping graph SG, and solution mapping μ in
Eval_E(SG, BGP), the graph μ(BGP) is well-formed for E.
3. For any basic graph pattern BGP and scoping graph SG, if μ_1, ...,
μ_n in Eval_E(SG, BGP) and BGP_1, ..., BGP_n are basic graph patterns
all equivalent to BGP but not sharing any blank nodes with each other
or with SG, then
  SG E-entails SG union μ_1(BGP_1) union ... union μ_n(BGP_n)
4. Entailment regimes should provide conditions to prevent trivial
infinite solution multisets as appropriate to the regime.

The uniqueness requirement of condition 4 is moved above all other
conditions, since 2. and 3. don't make sense if the solution multiset
was not defined in this sense. I am happy to turn the text into XML if
we decide to replace the current text with the proposed one.

The notes (18.6.1 Notes) on BGP matching extensions also have some
inconsistent use of pattern instance mappings as solutions/answers:

(b) ... If the extension does not allow answer bindings to blank
nodes, then this condition can be simplified to the condition:
SG E-entails P(BGP) for each pattern solution P.
should be
(b) ... If the extension does not allow bindings to blank nodes, then
this condition can be simplified to the condition:
  SG E-entails μ(BGP) for each solution mapping μ.

(d) ...an empty answer set.
should be
(d) ...an empty solution sequence.

(e) ... Every answer P_i is the solution mapping restriction of a
SPARQL instance M_i such that M_i(BGP_i) is a subgraph of SG. Since
BGP_i and SG have no blank nodes in common, the range of M_i contains
no blank nodes from BGP_i; therefore, the solution mapping P_i and RDF
instance mapping I_i components of M_i commute, so M_i(BGP_i) =
I_i(P_i(BGP_i)). So
M_1(BGP_1) union ... union M_n(BGP_n)
= I_1(P_1(BGP_1)) union ... union I_n(P_n(BGP_n))
= [ I_1 + ... + I_n]( P_1(BGP_1) union ... union P_n(BGP_n) )
since the domains of the I_i instance mappings are all mutually
exclusive. Since they are also exclusive from SG,
SG union [ I_1 + ... + I_n]( P_1(BGP_1) union ... union P_n(BGP_n) )
= [ I_1 + ... + I_n](SG union P_1(BGP_1) union ... union P_n(BGP_n) )
i.e.
SG union P_1(BGP_1) union ... union P_n(BGP_n)
has an instance which is a subgraph of SG, so is simply entailed by SG
by the RDF interpolation lemma [RDF-MT].
should be

(e) ... For every solution mapping μ_i, there is, by definition of
basic graph pattern matching, an RDF instance mapping σ_i such that
P_i(BGP_i) is a subgraph of SG where P_i is the pattern instance
mapping composed of μ_i and σ_i. Since BGP_i and SG have no blank
nodes in common, the ranges of σ_i and μ_i contain no blank nodes from
BGP_i; therefore, the solution mapping μ_i and the RDF instance
mapping σ_i of P_i commute, so P_i(BGP_i) = σ_i(μ_i(BGP_i)). So
P_1(BGP_1) union ... union P_n(BGP_n)
= σ_1(μ_1(BGP_1)) union ... union σ_n(μ_n(BGP_n))
= [ σ_1 + ... + σ_n]( μ_1(BGP_1) union ... union μ_n(BGP_n) )
since the domains of the σ_i RDF instance mappings are all mutually
exclusive. Since they are also exclusive from SG,
SG union [ σ_1 + ... + σ_n]( μ_1(BGP_1) union ... union μ_n(BGP_n) )
= [ σ_1 + ... + σ_n](SG union μ_1(BGP_1) union ... union μ_n(BGP_n) )
i.e.
SG union μ_1(BGP_1) union ... union μ_n(BGP_n)
has an instance which is a subgraph of SG, so is simply entailed by SG
by the RDF interpolation lemma [RDF-MT].

Best regards,
Birte


-- 
Dr. Birte Glimm, Room 309
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283520
Attachments

application/pdf attachment: paper.pdf
Received on Tuesday, 15 February 2011 14:19:33 UTC