Re: Constructive mapping semantics for SPARQL from Fred Zemke on 2006-08-22 (public-rdf-dawg@w3.org from July to September 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Mon, 21 Aug 2006 17:24:34 -0700
To: andy.seaborne@hp.com
CC: public-rdf-dawg@w3.org
Message-ID: <44EA4EC2.4030107@oracle.com>
Continuing my reply to Andy's message below:

Seaborne, Andy wrote:

>
>
> Fred Zemke wrote:
>
>> For your consideration,
>> the attached paper presents a formal semantics for SPARQL
>> that is based on mapping rather than entailment.
>> Fred
>
>
> I was a bit confused by the 'bindable' terminology - am I right in 
> reading your document as saying that blank nodes, in queries, are 
> not bound to a graph term, as they are existential?
>
> Test case:
>
> Data:
>
> :x :p 1 .
> :y :p 1 .
>
> Query:
> SELECT ?x { [] :p ?x }
>
> has one solution ?x/1, doesn't it? (In implementation terms it is much 
> the same as treating blank nodes as regular variables and then applying 
> a partial term-distinct over the BGP match; it's 
> implementing this partial-distinct that worries me if we care about a 
> moderately direct 1-1 mapping to some form of SQL.)

In this example, let G be a table of three columns (subj, verb and obj) 
whose rows
are the triples of the default graph.  A query to find all mappings of 
the one variable ?x and
the one blank node identifier _:a satisfying the sample query is

SELECT G.subj AS "_:a", G.obj AS "?x"
FROM G
WHERE G.verb = ':p'

(actually the translation of the prefix ':' should be substituted before 
submitting this query).
Then to factor out the bindings of blank node identifiers, use an outer 
query block
with DISTINCT and a list of all variables (in this example, ?x):

SELECT DISTINCT SQ."?x"
FROM (SELECT G.subj AS "_:a", G.obj AS "?x"
              FROM G
              WHERE G.verb = ':p') AS SQ

So it looks straightforward to get SQL to do the partial distinct.
If you can write an SQL query to perform the raw SPARQL, including the 
bindings of
blank node identifiers, then you can nest it in a subquery and use an 
outer query
to factor out the bindings of blank node identifiers.
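
As a rough check of the nesting described above, here is a minimal sketch
using Python's built-in sqlite3 module. The in-memory table G and its two
rows come from the test case; the plain column aliases a and x (standing in
for "_:a" and "?x") and the names RAW and FACTORED are assumptions made for
this illustration only.

```python
import sqlite3

# In-memory table G holding the two triples of the test data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE G (subj TEXT, verb TEXT, obj INTEGER)")
conn.executemany(
    "INSERT INTO G VALUES (?, ?, ?)",
    [(":x", ":p", 1), (":y", ":p", 1)],
)

# Raw query: binds both the blank node identifier (alias a) and ?x (alias x).
RAW = "SELECT G.subj AS a, G.obj AS x FROM G WHERE G.verb = ':p'"

# Outer block: keep only the variable columns and apply DISTINCT,
# which factors out the blank node bindings.
FACTORED = f"SELECT DISTINCT SQ.x FROM ({RAW}) AS SQ"

raw_rows = conn.execute(RAW + " ORDER BY a").fetchall()
factored_rows = conn.execute(FACTORED).fetchall()

print(raw_rows)       # two rows, one per subject
print(factored_rows)  # one row, x = 1
```

The raw query yields one row per matching triple; the outer DISTINCT
collapses them to the single solution for ?x.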

>
>
> What do you expect from this example (I know we are not 
> covering aggregation but I'm interested in your view):
>
> Data: as above
>
> Query:
> SELECT sum(?x) { [] :p ?x }
>
> Is it 1 or 2?

If we take SQL as a guide, then I expect the sum to aggregate the rows that
are returned by the WHERE clause.  With the semantics proposed in my 
latest paper,
the WHERE clause returns a single row, ?x -> 1, so the sum would be 1.

On the other hand, if the user rewrites the query as

SELECT sum(?x) WHERE { ?w :p ?x }

then my paper would say the WHERE clause returns two rows, both of
which have ?x -> 1, and so sum(?x) is 2.
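
The two readings can be compared in SQL terms with a small sketch, again
using Python's sqlite3 module. This is only an SQL analogue of the two
SPARQL queries, not SPARQL itself; the column aliases a and x and the
variable names sum_with_blank_node and sum_with_variable are assumptions
made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE G (subj TEXT, verb TEXT, obj INTEGER)")
conn.executemany(
    "INSERT INTO G VALUES (?, ?, ?)",
    [(":x", ":p", 1), (":y", ":p", 1)],
)

# Raw matches of the pattern: subject bound to a, object bound to x.
RAW = "SELECT G.subj AS a, G.obj AS x FROM G WHERE G.verb = ':p'"

# Blank node reading: factor out a with DISTINCT first, then aggregate.
sum_with_blank_node = conn.execute(
    f"SELECT SUM(D.x) FROM (SELECT DISTINCT SQ.x FROM ({RAW}) AS SQ) AS D"
).fetchone()[0]

# Variable reading (?w instead of []): aggregate over every raw row.
sum_with_variable = conn.execute(
    f"SELECT SUM(SQ.x) FROM ({RAW}) AS SQ"
).fetchone()[0]

print(sum_with_blank_node, sum_with_variable)
```

Under these assumptions the first sum aggregates one row and the second
aggregates two, matching the answers of 1 and 2 given above.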

>
>
> This example shows me that treating blank nodes as pure existentials, 
> when the entailment regime can provide bindings, will give 
> counter-intuitive results, especially where the graph pattern is thought 
> of as a "graph with holes", which makes blank nodes (as in unlabelled nodes 
> of the RDF syntax) quite naturally bindings.

I think I agree with you that there is a concern that
users will find it non-intuitive to factor out the bindings of
blank node identifiers to give them an existential semantic.  It is more 
straightforward
to have a mapping semantic that does not do the factoring.  In that case 
the blank
node identifiers function just like variables.  If we want the ability 
to reduce the solution
sequence by an existential factoring, I would prefer a syntax that did 
not rely on the difference
between _:a and ?x to express it.  Instead, I personally would prefer to 
see explicit
existential quantifiers added to the language, so that one could write, 
for example

SELECT ?x WHERE { (exists ?y) { ?y :p ?x } }
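
One possible reading of such a quantifier, sketched in plain Python: match
the pattern with every name treated as an ordinary variable, then drop the
quantified names and remove duplicate mappings. The helper project_exists
and the representation of solutions as dictionaries are hypothetical,
introduced only for this illustration; the (exists ?y) syntax itself is
just a suggestion and not part of any SPARQL draft.

```python
def project_exists(solutions, quantified):
    """Drop the quantified names from each mapping, then remove duplicates."""
    seen = set()
    result = []
    for mapping in solutions:
        # Keep only the non-quantified bindings, in a hashable, ordered form.
        reduced = tuple(
            sorted((name, value) for name, value in mapping.items()
                   if name not in quantified)
        )
        if reduced not in seen:
            seen.add(reduced)
            result.append(dict(reduced))
    return result


# Test data from the thread, as (subject, predicate, object) triples.
triples = [(":x", ":p", 1), (":y", ":p", 1)]

# Plain mapping semantics for { ?y :p ?x }: every name is a variable.
raw = [{"?y": s, "?x": o} for (s, p, o) in triples if p == ":p"]

# Existential factoring requested explicitly for ?y.
reduced = project_exists(raw, {"?y"})

print(raw)      # two mappings
print(reduced)  # one mapping for ?x
```

With this reading the choice between two solutions and one is made by the
quantifier, not by whether the name is written _:a or ?y.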

>
> On the spanning blank nodes,
> we could remove the problem by requiring labels do not span BGPs.  
> Then blank node labels can't appear in two different BGPs.  

True but ugly from a usability standpoint.  I believe that users want to 
build
their queries incrementally, and a natural starting point is to create a 
query with
no OPTIONAL or UNION, get that working, and then start editing it.
As soon as you add an OPTIONAL or UNION, you need curly braces and
new BGPs, and then the user will be forced to do a lot of collateral editing
of his blank node identifiers, which will create a lot of user frustration.
I think that if SPARQL is specified so that it is unfriendly to incremental
development, this will be a major obstacle to adoption. 
I think that if we have blank node identifiers at all, then we have to 
make their
scope be as large as possible, preferably the whole query, though it might
be tolerable to have some limitation regarding GRAPH graph patterns.

> Alternatively, no blank nodes with labels at all and have "??x" - if 
> we want existential variables, maybe an explicit syntax for them such 
> as ??x (double question mark) might be better.

I personally think the best syntactic option for existentials would be 
to follow first order
predicate calculus and have an explicit quantifier syntax with explicit 
scope,
as illustrated above.

Fred

>
> It's only the use of [] from N3/Turtle that complicates this.   It's 
> the syntax issue about [] that introduced them - it would be a shame 
> to lose that convenience form when it has a reasonable meaning for 
> the problem space covered by the charter.  An extreme approach would 
> be to make [] translate to named variables, just the system gets to 
> choose the names so they don't clash with any named variables and 
> don't appear in "SELECT *"
>
>     Andy
>
>
Received on Tuesday, 22 August 2006 00:26:10 UTC