Re: Comments on "SPARQL Query Language for RDF" from Enrico Franconi on 2005-09-03 (public-rdf-dawg@w3.org from July to September 2005)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Sat, 3 Sep 2005 18:52:24 +0200
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <78ADDFD2-69C2-465E-9FF6-E9375ACE2948@inf.unibz.it>
1) On 2 Sep 2005, at 12:58, Seaborne, Andy wrote
<http://www.w3.org/mid/43183071.1070508@hp.com>:

 > = subgraph / entailment
 >
 > The RDF MT defines three kinds of entailment - simple, RDF and RDFS.
 > RDF and RDFS are examples of vocabulary entailment.
 >
 > SPARQL basic patterns are defined to match by subgraph - the graph
 > being matched against contains RDF and can have some level of
 > entailment applied or not.  Your first example misses this because
 > you show the data, without a declaration of the entailment to be
 > applied.  The SPARQL query can execute against a simple entailed
 > version or RDF entailed version (or "zero entailment").

OK, then it is necessary to change:

s/subgraph of/entailed by/ in defn of basic pattern
(we have noticed that this change has already been made twice...)

and somewhere there should be the ability to declare the type of
entailment (simple, RDF, RDFS - as defined in RDF-MT).

======

2) On 2 Sep 2005, at 12:58, Seaborne, Andy wrote
<http://www.w3.org/mid/43183071.1070508@hp.com>:

 > = Blank Nodes in query results
 >
 > Blank nodes as distinguished variables can't be returned in SELECT
 > queries. This is by design.  An application should use a named
 > variable if it wants to return the binding in a solution.

Uhu? We said "Blank nodes as binding of distinguished variables",
which are clearly allowed (see, e.g., the example
<http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050721/#BlankNodes>).

So, the problem we mention in the comments remains:
the minimality of answers is not guaranteed, since tuples in the
answer set may be redundant.
In our example we show how two equivalent graphs (i.e., they entail
each other) give different answers to the same query. This is due to
the fact that minimality is not required.

We need to enforce the minimality of answers in the definition.
Redundancy may arise if there are bnodes in the result (and \top
unbound nodes in the result - see below).  Minimality can be easily
and efficiently implemented (e.g., just hash the bnodes in the answer
whenever you get them, and check).
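
A minimal sketch of such a redundancy check, in Python (an editorial
illustration, not code from the original message). It assumes a
simplified model: blank nodes are instances of a hypothetical `BNode`
class, no blank node is shared between rows, and a row is redundant
when consistently replacing its blank nodes by other terms yields
another row of the answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """Hypothetical stand-in for a blank node appearing in an answer."""
    label: str


def maps_onto(row, other):
    """True if replacing the bnodes of `row` consistently yields `other`."""
    if len(row) != len(other):
        return False
    mapping = {}
    for a, b in zip(row, other):
        if isinstance(a, BNode):
            # Each bnode must be mapped to one and the same term.
            if mapping.setdefault(a, b) != b:
                return False
        elif a != b:
            return False
    return True


def minimize(rows):
    """Drop rows that are already expressed by another row."""
    kept = list(dict.fromkeys(rows))  # remove exact duplicates, keep order
    changed = True
    while changed:
        changed = False
        for r in kept:
            if any(s != r and maps_onto(r, s) for s in kept):
                kept.remove(r)
                changed = True
                break  # restart the scan: the list was modified
    return kept


answer = [
    (BNode("b1"), '"Alice"'),
    ("<http://example.org/alice>", '"Alice"'),  # hypothetical IRI
    (BNode("b2"), '"Bob"'),
]
minimal = minimize(answer)
```

Here the first row is dropped because the second row already says
that something named "Alice" exists; the "Bob" row is kept.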

======

3) On 2 Sep 2005, at 12:58, Seaborne, Andy wrote
<http://www.w3.org/mid/43183071.1070508@hp.com>:

 > = Blank Nodes in Queries
 >
 > """
 > A blank node in a query pattern “behaves as a variable; a blank node
 > in a query pattern may match any RDF term”.
 > """
 >
 > then the solution of a basic pattern is described in terms of
 > matching variables.  This is supposed to cover the case of bNodes
 > from the query pattern as they are treated as variables and so have
 > bindings.
 >
 > Could you suggest wording that would make that clearer?

We suggest updating the definition of "pattern solution" to explicitly
include the bnodes in addition to the variables.

======

4) About unbound values in the answer.

Statement: unbound values generated by queries with an "optional" part
are different from unbound values generated by unsafe queries.
We suggest forbidding unsafe queries.

Consider a variation of the example by Andy from
<http://www.w3.org/mid/431852CC.5030402@hp.com>:

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
@prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:       <http://www.w3.org/2000/01/rdf-schema#> .

_:a  rdf:type        foaf:Person .
_:a  foaf:name       "Alice" .
_:a  foaf:mbox       <mailto:alice@example.com> .

_:b  rdf:type        foaf:Person .
_:b  foaf:name       "Bob" .

Consider the following query with an "optional" part:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  { ?x foaf:name  ?name .
          OPTIONAL { ?x  foaf:mbox  ?mbox }
        }

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" | <mailto:alice@example.com> |
| "Bob"   |                            |
----------------------------------------

Here, the meaning of the unbound value is that *no* RDF term may be
the mbox of Bob. We call this unbound value "\bottom".

On the other hand, consider the unsafe query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  { ?x foaf:name  ?name }

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" |                            |
| "Bob"   |                            |
----------------------------------------

Here, the meaning of the unbound values in the result is that *any*
RDF term may be in that part of the answer. We call this type of
unbound value "\top". As a matter of fact, the \top value is just a
shortcut generating an (infinite) answer set that contains any
possible RDF term in place of the \top unbound value.

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" | <mailto:alice@example.com> |
| "Alice" | <mailto:bruno@example.com> |
| "Alice" | "Bob"                      |
          ...
| "Bob"   | <mailto:alice@example.com> |
| "Bob"   | <mailto:bruno@example.com> |
| "Bob"   | "Bob"                      |
          ...
----------------------------------------

Let's now extend the query as in the original example by Andy:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  {
     { ?x foaf:name  ?name }
     UNION
     { ?x foaf:name  ?name . ?x  foaf:mbox  ?mbox }
   }

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" |                            |
| "Bob"   |                            |
| "Alice" | <mailto:alice@example.com> |
----------------------------------------

Here, both unbound values are \top values, since they came from an
unsafe subquery. However, note that this answer is not minimal,
i.e., it contains a redundant part: in fact, the last row is already
expressed by the first row. This can also be understood by noting that
the first subquery completely contains the second subquery, as
expected for logical reasons.
So, by enforcing minimality (like we do when there are bnodes in the
result, representing existential values), the expected answer is:

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" |                            |
| "Bob"   |                            |
----------------------------------------

Please note that a \bottom value in the result - like in the first
example with the "optional" part - does not lead to any
simplification.
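
The simplification can be sketched in Python as follows (an editorial
illustration under stated assumptions, not part of the original
message). The sentinels `TOP` and `BOTTOM` and the helper names are
hypothetical; a row is taken to be redundant when another row agrees
with it everywhere except where that other row has \top. A \bottom
value is never covered by \top, so it never triggers a
simplification.

```python
class Unbound:
    """Hypothetical marker for an unbound value in a result row."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


TOP = Unbound(r"\top")        # unbound from an unsafe query: any RDF term
BOTTOM = Unbound(r"\bottom")  # unbound from OPTIONAL: no RDF term


def subsumes(general, specific):
    """True if `general` equals `specific` except where `general` is TOP."""
    return len(general) == len(specific) and all(
        g == s or (g is TOP and s is not BOTTOM)
        for g, s in zip(general, specific)
    )


def minimize(rows):
    """Keep only the rows that are not covered by a different row."""
    unique = list(dict.fromkeys(rows))  # drop exact duplicates, keep order
    return [
        r for r in unique
        if not any(s != r and subsumes(s, r) for s in unique)
    ]


union_answer = [
    ('"Alice"', TOP),
    ('"Bob"', TOP),
    ('"Alice"', "<mailto:alice@example.com>"),
]
optional_answer = [
    ('"Alice"', "<mailto:alice@example.com>"),
    ('"Bob"', BOTTOM),
]
minimal_union = minimize(union_answer)        # third row is dropped
minimal_optional = minimize(optional_answer)  # nothing is dropped
```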

Also note that the "bound" operator is false only in the case of a
\bottom unbound value, while it is true in the case of a \top unbound
value.
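
Under this reading, the behaviour of "bound" can be sketched in a few
lines of Python (editorial illustration; `TOP`, `BOTTOM` and `bound`
are hypothetical names, not part of any SPARQL implementation):

```python
TOP = object()     # hypothetical marker: unbound value from an unsafe query
BOTTOM = object()  # hypothetical marker: unbound value from OPTIONAL


def bound(value):
    """False only for BOTTOM; TOP stands for some RDF term, so it is bound."""
    return value is not BOTTOM
```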

Moreover, note that a \top value in a filter construct is really
problematic to handle. Consider an extension of the previous unsafe
query with a filter operating on the \top unbound value (which cannot
be caught by a "bound" construct):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  { ?x foaf:name  ?name .
           FILTER ?mbox = <mailto:alice@example.com> }

Since the ?mbox variable is unsafe, it may take any possible RDF term
as value. So, the following is the correct answer:

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" | <mailto:alice@example.com> |
| "Bob"   | <mailto:alice@example.com> |
----------------------------------------
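
A minimal Python sketch of how an equality filter would have to treat
a \top value to produce this answer (editorial illustration; the
sentinel `TOP` and the function `filter_equals` are hypothetical
names). Since \top stands for every RDF term, the filter selects
exactly the instance equal to the constant:

```python
TOP = object()  # hypothetical marker: unbound value from an unsafe query


def filter_equals(rows, column, constant):
    """Apply FILTER ?var = constant to the given column of each row."""
    result = []
    for row in rows:
        value = row[column]
        if value is TOP:
            # TOP ranges over all terms; only `constant` passes the filter.
            result.append(row[:column] + (constant,) + row[column + 1:])
        elif value == constant:
            result.append(row)
    return result


unsafe_answer = [('"Alice"', TOP), ('"Bob"', TOP)]
filtered = filter_equals(unsafe_answer, 1, "<mailto:alice@example.com>")
```

With an inequality such as `>=` there is no single instance to pick,
which is where the infinite answer sets discussed next come from.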

Even worse, you may need to generate infinite answer sets:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  { ?x foaf:name  ?name .
           FILTER ?mbox >= "27"^^xs:decimal }

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" | 27                         |
| "Alice" | 28                         |
| "Alice" | 29                         |
          ...
| "Bob"   | 27                         |
| "Bob"   | 28                         |
| "Bob"   | 29                         |
          ...
----------------------------------------

As a final note, as the following query shows, it is really necessary
to distinguish explicitly between \top and \bottom unbound values,
since they may appear in the same result:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  {
     { ?x foaf:name  ?name }
     UNION
     { ?x foaf:name  ?name .
          OPTIONAL { ?x  foaf:mbox  ?mbox }
        }
   }

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" |                            |
| "Bob"   |                            |
| "Alice" | <mailto:alice@example.com> |
| "Bob"   |                            |
----------------------------------------

This result has the following meaning:

----------------------------------------
| name    | mbox                       |
========================================
| "Alice" | \top                       |
| "Bob"   | \top                       |
| "Alice" | <mailto:alice@example.com> |
| "Bob"   | \bottom                    |
----------------------------------------


Our strong suggestion is to forbid unsafe queries completely, so that
the \top unbound value will never appear. This requires a precise
syntactical definition of safe queries. This restriction is customary
in any database query language.
If you don't want to forbid unsafe queries, we guess that you have to
be ready to deal with the cases mentioned above.
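
One possible syntactic safety test, sketched in Python (an editorial
illustration and only one candidate definition, not the one proposed
by the authors). It assumes a hypothetical pattern representation with
three node kinds: `("bgp", triples)`, `("optional", main, extra)` and
`("union", left, right)`. A selected variable is considered safe if it
is mentioned in every UNION branch; a variable mentioned only in an
OPTIONAL part counts, because its unbound value is \bottom, not \top.

```python
def variables(triples):
    """Variables (terms starting with '?') occurring in a list of triples."""
    return {t for triple in triples for t in triple if t.startswith("?")}


def covered(pattern):
    """Variables that can never take the \\top value for this pattern."""
    kind = pattern[0]
    if kind == "bgp":
        return variables(pattern[1])
    if kind == "optional":
        return covered(pattern[1]) | covered(pattern[2])
    if kind == "union":
        # A variable must be mentioned in both branches to be safe.
        return covered(pattern[1]) & covered(pattern[2])
    raise ValueError(f"unknown pattern kind: {kind}")


def is_safe(select_vars, pattern):
    """True if every selected variable is covered by the pattern."""
    return set(select_vars) <= covered(pattern)


name_only = ("bgp", [("?x", "foaf:name", "?name")])
mbox_only = ("bgp", [("?x", "foaf:mbox", "?mbox")])
name_and_mbox = ("bgp", [("?x", "foaf:name", "?name"),
                         ("?x", "foaf:mbox", "?mbox")])

optional_query = ("optional", name_only, mbox_only)
union_query = ("union", name_only, name_and_mbox)

select = ["?name", "?mbox"]
```

With these definitions, `is_safe(select, optional_query)` holds, while
`is_safe(select, name_only)` and `is_safe(select, union_query)` do
not, matching the three example queries above.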

======

5) We understand now the difference between optional and union (thanks
    to Andy's example in
    <http://www.w3.org/mid/1125667385.16011.761.camel@dirk>).

    New observation: as it is currently defined, the "optional"
    construct makes the query language *non-monotonic*; i.e., by adding
    triples to the RDF data the answer set to a query may decrease. The
    logic becomes intrinsically harder.
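
A small Python sketch illustrating the non-monotonic behaviour
(editorial illustration; the evaluator is hard-wired to the "optional"
query of point 4, `None` stands for the unbound value, and the mailbox
<mailto:bob@example.com> is a made-up addition to the data):

```python
NAME = "foaf:name"
MBOX = "foaf:mbox"


def names_with_optional_mbox(triples):
    """Evaluate: ?x foaf:name ?name . OPTIONAL { ?x foaf:mbox ?mbox }"""
    rows = set()
    for subject, predicate, name in triples:
        if predicate != NAME:
            continue
        mboxes = [o for s, p, o in triples if s == subject and p == MBOX]
        if mboxes:
            rows.update((name, mbox) for mbox in mboxes)
        else:
            rows.add((name, None))  # optional part did not match
    return rows


data = {
    ("_:a", NAME, '"Alice"'),
    ("_:a", MBOX, "<mailto:alice@example.com>"),
    ("_:b", NAME, '"Bob"'),
}
before = names_with_optional_mbox(data)

# Add one triple (hypothetical mailbox for Bob) and evaluate again.
extended = data | {("_:b", MBOX, "<mailto:bob@example.com>")}
after = names_with_optional_mbox(extended)

# The row ("Bob", unbound) was an answer before and is not one afterwards.
lost = before - after
```

The data only grew, yet `lost` is not empty: an answer that held on
the smaller graph no longer holds on the larger one.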

======

6) We would still like to give a name to a simpler sublanguage, which
    should have a clear semantics - and therefore all the
    implementations should agree on it. We propose to call "Rich
    SPARQL" the current language, and "SPARQL" the language without
    "description of resources" and without "specification and query of
    RDF datasets" (that is, the provenance issue), since everybody
    acknowledges that there are very serious semantic problems with
    those constructs.

    In fact, we agree with Dan's comment in
    <http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Sep/0008.html>:

    > I suppose we could make explicit that for a query pattern P, if S
    > is a solution w.r.t. an input graph G, then S(P) is entailed by
    > G. Is that what you have in mind?
    >
    > I think the idea can be expanded to cover UNION
    > straightforwardly, and perhaps OPTIONAL with some effort, but I
    > don't know how this applies to queries that use the GRAPH
    > keyword.

    We believe that good work can be done for what we called above
    SPARQL, but not for Rich SPARQL. And we volunteer to do it (see [1]
    for our first attempt). The current document is definitely not
    enough to give a precise account of the semantics of SPARQL. In
    principle, a new document on the semantics of SPARQL should become
    somehow official for W3C.


cheers
-enrico+sergio

[1] <http://www.inf.unibz.it/krdb/w3c/rdf-sparql-semantics.pdf>


Enrico Franconi                  - franconi@inf.unibz.it
Free University of Bozen-Bolzano - http://www.inf.unibz.it/~franconi/
Faculty of Computer Science      - Phone: (+39) 0471-016-120
I-39100 Bozen-Bolzano BZ, Italy  - Fax:   (+39) 0471-016-129
Received on Saturday, 3 September 2005 16:52:37 UTC