- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Mon, 21 Aug 2006 17:24:34 -0700
- To: andy.seaborne@hp.com
- CC: public-rdf-dawg@w3.org
Continuing my reply to Andy's message below:
Seaborne, Andy wrote:
>
>
> Fred Zemke wrote:
>
>> For your consideration,
>> the attached paper presents a formal semantics for SPARQL
>> that is based on mapping rather than entailment.
>> Fred
>
>
> I was a bit confused by the 'bindable' terminology - am I right in
> reading your document as saying that blank nodes, in queries, are
> not to be bound to a graph term, as they're existential?
>
> Test case:
>
> Data:
>
> :x :p 1 .
> :y :p 1 .
>
> Query:
> SELECT ?x { [] :p ?x }
>
> has one solution ?x/1, doesn't it? (In implementation terms it is much
> the same as treating blank nodes as regular variables and then
> applying a partial term-distinct over the BGP match; it's
> implementing this partial-distinct that worries me if we care about a
> moderately direct 1-1 mapping to some form of SQL.)
In this example, let G be a table of three columns (subj, verb and obj)
whose rows are the triples of the default graph. A query to find all
mappings of the one variable ?x and the one blank node identifier _:a
satisfying the sample query is
SELECT G.subj AS "_:a", G.obj AS "?x"
FROM G
WHERE G.verb = ':p'
(actually the translation of the prefix ':' should be substituted before
submitting this query).
Then to factor out the bindings of blank node identifiers, use an outer
query block with DISTINCT and a list of all variables (in this example,
?x):
SELECT DISTINCT SQ."?x"
FROM (SELECT G.subj AS "_:a", G.obj AS "?x"
FROM G
WHERE G.verb = ':p') AS SQ
So it looks straightforward to get SQL to do the partial distinct.
If you can write an SQL query to perform the raw SPARQL, including the
bindings of blank node identifiers, then you can nest it in a subquery
and use an outer query to factor out the bindings of blank node
identifiers.
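
As a rough check of the two queries above, here is a minimal sketch that
runs them against the test data using SQLite through Python's sqlite3
module. The in-memory table, the column types and the Python names
(conn, RAW_QUERY, FACTORED_QUERY) are assumptions made for illustration;
the prefix ':' is left unexpanded, as in the queries above.

```python
import sqlite3

# Hypothetical in-memory stand-in for the triple table G described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE G (subj TEXT, verb TEXT, obj INTEGER)")
conn.executemany(
    "INSERT INTO G (subj, verb, obj) VALUES (?, ?, ?)",
    [(":x", ":p", 1), (":y", ":p", 1)],
)

# Raw match: one row per mapping of the blank node _:a and the variable ?x.
RAW_QUERY = """
    SELECT G.subj AS "_:a", G.obj AS "?x"
    FROM G
    WHERE G.verb = ':p'
"""

# Outer block: project only the real variables and apply DISTINCT,
# which factors out the bindings of the blank node identifier.
FACTORED_QUERY = f"""
    SELECT DISTINCT SQ."?x"
    FROM ({RAW_QUERY}) AS SQ
"""

raw_rows = sorted(conn.execute(RAW_QUERY).fetchall())
factored_rows = conn.execute(FACTORED_QUERY).fetchall()

print(raw_rows)       # mappings including the blank node binding
print(factored_rows)  # mappings after the partial distinct
```

On this data the raw query should yield two rows (one per subject) and
the factored query a single row for ?x.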
>
>
> What do you expect from this example (I know that we are not
> covering aggregation, but I'm interested in your view):
>
> Data: as above
>
> Query:
> SELECT sum(?x) { [] :p ?x }
>
> Is it 1 or 2?
If we take SQL as a guide, then I expect the sum to aggregate the rows
that are returned by the WHERE clause. With the semantics proposed in
my latest paper, the WHERE clause returns a single row, ?x -> 1, so the
sum would be 1.
On the other hand, if the user rewrites the query as
SELECT sum(?x) WHERE { ?w :p ?x }
then my paper would say the WHERE clause returns two rows, both of
which have ?x -> 1, and so sum(?x) is 2.
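
The difference between the two sums can be reproduced with a small
sketch, again using SQLite through Python's sqlite3 module. It assumes
that the aggregate is applied after the blank node bindings have been
factored out, and that the ?w form aggregates the raw matches directly;
the table and the Python names are illustrative only.

```python
import sqlite3

# Hypothetical in-memory stand-in for the triple table G (same test data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE G (subj TEXT, verb TEXT, obj INTEGER)")
conn.executemany(
    "INSERT INTO G (subj, verb, obj) VALUES (?, ?, ?)",
    [(":x", ":p", 1), (":y", ":p", 1)],
)

# SELECT sum(?x) { [] :p ?x }
# The blank node binding is factored out with DISTINCT before summing.
SUM_WITH_BLANK_NODE = """
    SELECT SUM(F."?x")
    FROM (SELECT DISTINCT SQ."?x"
          FROM (SELECT G.subj AS "_:a", G.obj AS "?x"
                FROM G
                WHERE G.verb = ':p') AS SQ) AS F
"""

# SELECT sum(?x) WHERE { ?w :p ?x }
# ?w is an ordinary variable, so both matching rows are aggregated.
SUM_WITH_VARIABLE = """
    SELECT SUM(SQ."?x")
    FROM (SELECT G.subj AS "?w", G.obj AS "?x"
          FROM G
          WHERE G.verb = ':p') AS SQ
"""

sum_blank_node = conn.execute(SUM_WITH_BLANK_NODE).fetchone()[0]
sum_variable = conn.execute(SUM_WITH_VARIABLE).fetchone()[0]

print(sum_blank_node, sum_variable)
```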
>
>
> This example shows me that treating blank nodes as pure existentials,
> when the entailment regime can provide bindings, will give
> counter-intuitive results, especially where the graph pattern is thought
> of as a "graph with holes" (treating blank nodes as the unlabelled nodes
> of the RDF syntax), which quite naturally yields bindings.
I think I agree with you that there is a concern that users will find
it non-intuitive to factor out the bindings of blank node identifiers
to give them an existential semantic. It is more straightforward to
have a mapping semantic that does not do the factoring. In that case
the blank node identifiers function just like variables. If we want
the ability to reduce the solution sequence by an existential
factoring, I would prefer a syntax that did not rely on the difference
between _:a and ?x to express it. Instead, I personally would prefer
to see explicit existential quantifiers added to the language, so that
one could write, for example
SELECT ?x WHERE { (exists ?y) { ?y :p ?x } }
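
One possible reading of such a quantifier, sketched in Python: the
quantified variables are dropped from each mapping produced by the
enclosed pattern, and duplicate mappings are then removed. The function
name exists and the representation of mappings as dictionaries are
assumptions for illustration only, not a worked-out proposal.

```python
def exists(solutions, quantified):
    """Drop the quantified variables from each mapping, then remove
    duplicate mappings while preserving first-seen order."""
    seen = set()
    result = []
    for mapping in solutions:
        # Keep only the bindings of variables that are not quantified.
        visible = {var: val for var, val in mapping.items()
                   if var not in quantified}
        key = tuple(sorted(visible.items()))
        if key not in seen:
            seen.add(key)
            result.append(visible)
    return result


# Raw matches of { ?y :p ?x } against the test data.
matches = [{"?y": ":x", "?x": 1}, {"?y": ":y", "?x": 1}]

# (exists ?y) { ?y :p ?x } hides ?y and collapses the two matches.
quantified_solutions = exists(matches, {"?y"})

# Without a quantifier, ?y behaves like any variable and both rows remain.
unquantified_solutions = exists(matches, set())

print(quantified_solutions)
print(unquantified_solutions)
```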
>
> On the spanning blank nodes,
> we could remove the problem by requiring labels do not span BGPs.
> Then blank node labels can't appear in two different BGPs.
True, but ugly from a usability standpoint. I believe that users want
to build their queries incrementally, and a natural starting point is
to create a query with no OPTIONAL or UNION, get that working, and then
start editing it. As soon as you add an OPTIONAL or UNION, you need
curly braces and new BGPs, and then the user will be forced to do a lot
of collateral editing of blank node identifiers, which will create a
lot of user frustration. I think that if SPARQL is specified so that
it is unfriendly to incremental development, this will be a major
obstacle to adoption.
I think that if we have blank node identifiers at all, then we have to
make their scope be as large as possible, preferably the whole query,
though it might be tolerable to have some limitation regarding GRAPH
graph patterns.
> Alternatively, no blank nodes with labels at all and have "??x" - if
> we want existential variables, maybe an explicit syntax for them such
> as ??x (double question mark) might be better.
I personally think the best syntactic option for existentials would be
to follow first order predicate calculus and have an explicit
quantifier syntax with explicit scope, as illustrated above.
Fred
>
> It's only the use of [] from N3/Turtle that complicates this. It's
> the syntax issue about [] that introduced them - it would be a shame
> to lose that convenience form when it has a reasonable meaning for
> the problem space covered by the charter. An extreme approach would
> be to make [] translate to named variables, just the system gets to
> choose the names so they don't clash with any named variables and
> don't appear in "SELECT *"
>
> Andy
>
>
Received on Tuesday, 22 August 2006 00:26:10 UTC