Definition of Basic Graph Pattern Matching & Pattern Instance Mapping from Birte Glimm on 2009-09-24 (public-rdf-dawg@w3.org from July to September 2009)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Thu, 24 Sep 2009 15:00:11 +0100
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <492f2b0b0909240700i69b8b32fqe967b125d8ff8884@mail.gmail.com>

Hi all,
I am reading the definitions of a pattern instance mapping and basic
graph pattern matching again and again and I am not quite happy with
them. Maybe I am too much of a theoretician, but I find them too
imprecise and given that they are very important I would like them to
be very clear and I am inclined to suggest some changes/clarifications
for those definitions in the next SPARQL spec, but please correct me
if I am just overlooking somehing.

First, a pattern instance mapping P is defined as a composition of an
RDF instance mapping sigma with a solution mapping mu, i.e.,
P(x)=mu(sigma(x)), see 12.3.1. The domain of an RDF instance mapping
is, however, the set of blank nodes and its range is the set of
literals, blank nodes, and IRIs. Now the domain of mu is the set of
variables, which means we cannot really compose the two. The range of
the first is disjoint from the domain of the second. What I would
rather say is that P is a pair (mu, sigma).
Next, in the definition of basic graph pattern matching, we suddenly
use P(BGP) although the domain of P is, well it is unclear but most
likely, variables and blank nodes and not basic graph patterns. It is
intuitively clear what is meant, but it is an abuse of notation. To
clarify this, I would prefer to write something like:
Let P=(mu, sigma) be a pattern instance mapping, BGP a basic graph
pattern, and G an RDF graph. We write P(BGP) to denote the result of
replacing each variable ?x in BGP that is in dom(mu) with mu(?x) and
each blank node b that is in dom(sigma) with sigma(b).
As I understand it, mu and sigma might be undefined for some variables
and blank nodes in BGP because they are just partial functions, so I
should only replace variables/bnodes for which the function is
defined. We can then go on to define solutions:
Let V(BGP) denote the set of variables occurring in BGP. A solution
mapping mu is a solution for BGP from G if dom(mu)=V(BGP) and there is
a pattern instance mapping P=(mu, sigma) such that P(BGP) is *an
instance of* a subgraph of G.
I think that t is important to use *an instance* of a subgraph here
because as is said later, we use a scoping graph SG (at least in
theory) instead of the active graph and instantiations of BGP have to
be subgraphs of SG and not necessarily G. Since SG is an instance of
G, of course P(BGP) is an instance of a subgraph of G.

Any thoughts on that?

Apart from that I have a few more minor editorial errors for the
SPARQL 1.0 spec...
This is really minor, but there is a comma missing in the second
paragraph of 12.6:
... a triple ( s, p, o ) is in the first pattern if and only if the
triple ( M(s), M(p)*,* M(o) ) is in the second.
Further:
12.6, Notes (c) These conditions do not impose the SPARQL requirement
that SG share*s* no blank nodes with AG or BGP.

Cheers,
Birte



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529

Received on Thursday, 24 September 2009 14:00:51 UTC