comments on SPARQL Query Language for RDF from Peter F. Patel-Schneider on 2007-05-22 (public-rdf-dawg-comments@w3.org from May 2007)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Tue, 22 May 2007 14:08:13 -0400 (EDT)
To: public-rdf-dawg-comments@w3.org
Cc: eric@w3.org
Message-Id: <20070522.140813.75957604.pfps@research.bell-labs.com>
	Comments on SPARQL Query Language for RDF
	W3C Working Draft 26 March 2007

Well, the document has certainly changed since the last time I reviewed
it, so there is little point in going over my comments from before.

This is true EXCEPT for one thing.  Many of my comments from before
complained about lack of rigour in the document.  Unfortunately, I have
noticed a continued lack of rigour in many of the basic notions and
definitions underlying SPARQL.

Because of the problems described in 8/ below, I do not believe that the
document is adequate to progress to the next stage of the W3C process,
even setting aside my fundamental disagreement with the treatment of the
meaning of RDF graphs in SPARQL (3/ and 9/ below).


[Note that this is not a complete review of the document.  I have only
looked at some of the informative material and enough of the formal
definitions to see that I cannot progress further without some more
information.]


1/ A question on the basic notion of RDF.

From the document:

Abstract: RDF is a directed, labeled graph data format for representing
information in the Web. 

From RDF Concepts:

Abstract: The Resource Description Framework (RDF) is a framework for
representing information in the Web. 

How can these two views of RDF, which seem different to me, be
reconciled?  If SPARQL treats RDF as simply a "graph data format", then
what is the status of the RDF semantics, which goes much further?
Suppose I write a system for handling RDF that respects the RDF
recommendations.  Is this system going to be useful for SPARQL?  For
example, if I store RDF graphs in some internal canonical form (for
example, changing "042"^^xsd:integer to "42"^^xsd:integer), then I have
changed the SPARQL answers.
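
To make the concern concrete, here is a minimal sketch assuming a toy
model in which a typed literal is a (lexical form, datatype) pair and
pattern matching compares RDF terms exactly, as described above.  The
names Literal, canonicalize and match_object are hypothetical and do
not come from SPARQL or from any RDF library.

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """Toy typed literal: a lexical form plus a datatype IRI."""
    lexical: str
    datatype: str

    def value(self) -> int:
        # Only xsd:integer is modelled in this sketch.
        assert self.datatype == XSD_INTEGER
        return int(self.lexical)


def canonicalize(lit: Literal) -> Literal:
    """Rewrite an xsd:integer literal to its canonical lexical form."""
    return Literal(str(lit.value()), lit.datatype)


def match_object(stored: Literal, pattern: Literal) -> bool:
    """Term-equality matching: lexical form and datatype must coincide."""
    return stored == pattern


stored = Literal("042", XSD_INTEGER)
query = Literal("042", XSD_INTEGER)

print(match_object(stored, query))                     # True
print(match_object(canonicalize(stored), query))       # False
print(stored.value() == canonicalize(stored).value())  # True
```

The value denoted by the stored literal is unchanged by canonicalization,
yet a query written with the original lexical form stops matching.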


2/ What is a sequence?  What exactly are the results of a SPARQL query?

I have always thought that a sequence was inherently ordered.  How, then,
can the "result of a query [be] a solution sequence" as well as a
"result set" (Section 2.2)?

This is particularly glaring at the beginning of Section 9.


3/ Matching literals

I was very surprised to see that the exact literal form of an RDF
literal is significant (Section 2.3.3).  Imagine what would happen if an
SQL query depended on the exact literal form in which numbers were
entered into a database!


4/ Labeled blank nodes

What is a labeled blank node (Section 2.4)?  Is it just a blank node, or
is it something else?


5/ Syntactic shorthand and other short forms

Are all the syntactic short forms simply sugar for their long forms, or
are there different relationships between 1 and "1"^^xsd:integer
(Section 4.1.2) and for the short forms in Section 4.2?  There are
multiple wordings for expressing the short forms, including "the same
as" (Section 4.1.2), "is equivalent to" (Section 4.1.4), and "syntactic
sugar" (Section 4.2.3).


6/ Status of Section 4

Section 4 (SPARQL Syntax) is not labeled as informative.  However, it
does not exhaustively cover the grammar of SPARQL.  For example,
NumericLiteral is defined but not used in Section 4.


7/ union

Why is union often written out (e.g., Section 1.2.14)?


8/ Basic Definition of SPARQL

The definition of Solution Sequence is inadequately grounded.  A
Solution Sequence is defined as "a list of solutions, possibly
unordered" (Section 12.1.6).  The common formal definitions of lists
depend on an ordering.  If SPARQL is using some other definition, then
this other definition must at least be referenced.  The terminology used
to refer to solutions is much too varied.  It includes at least
sequences, lists, unordered collections, multisets, and sets.
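
The choice of term matters because each notion gives a different
equality on query results.  A minimal Python sketch, with solutions
modelled (as an assumption of the sketch) as frozen sets of
(variable, value) bindings so that they are hashable:

```python
from collections import Counter

# Two distinct hypothetical solutions.
s1 = frozenset({("x", "a")})
s2 = frozenset({("x", "b")})

r1 = [s1, s1, s2]
r2 = [s2, s1, s1]
r3 = [s1, s2]

print(r1 == r2)                    # False: as lists, order matters
print(Counter(r1) == Counter(r2))  # True: as multisets, only counts matter
print(Counter(r1) == Counter(r3))  # False: the cardinality of s1 differs
print(set(r1) == set(r3))          # True: as sets, duplicates collapse
```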

ToList "turns a multiset into a sequence, with the same elements and
cardinality" (Section 12.2.3).  Aside from the question of what
"cardinality" refers to here, this is not a functional mapping, as there
are many sequences that could correspond to a multiset (or set) if the
order of the sequence is ignored.  The formal definition of ToList
implicitly acknowledges this non-functionality.

Given that ToList is a fundamental part of the definition of SPARQL, it
requires a better definition.  Further, there need to be proofs that
the choice in ToList does not make a difference anywhere in SPARQL, for
example, in further processing.

The definition of SPARQL BGP mapping depends importantly on the order in
which the RDF instance mapping and the solution mapping are performed.
This should be documented.

The definition of BGP Matching is not specified in the document.  The
definition in Section 12.3.1 defines a "solution" reasonably, although
presumably mu is *the* "restriction of P to the query variables in BGP".
However, the last bit of the definition doesn't make sense.  What is
omega there?  What is mu there?  What is theta?  What is mu(theta)?
Where, then, is the definition of the match of a BGP against an RDF
graph?

Section 12.5 does not provide the missing glue, as it just defers to
Section 12.3.1.  Section 12.5 doesn't even get to a BGP and an RDF
graph.

What do the [ ] and { } notations mean in Section 12.4?


9/ A Fundamental Disagreement on SPARQL

I still object to the fact that SPARQL can produce different results for
equivalent RDF graphs, as described in Section 12.3.2.
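
A minimal sketch of the kind of case at issue, assuming a toy triple
representation and a brute-force check of simple entailment; none of
these names come from the SPARQL document or from any RDF library.  G1
and G2 entail each other, yet the pattern ?x ex:p ex:o has two
solutions over G1 and one over G2.

```python
from itertools import product

# Toy graphs as sets of (subject, predicate, object) strings.
# Assumption of this sketch: a term starting with "_:" is a blank node.
G1 = {("ex:s", "ex:p", "ex:o"), ("_:b", "ex:p", "ex:o")}
G2 = {("ex:s", "ex:p", "ex:o")}


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def simple_entails(g: set, h: set) -> bool:
    """Brute-force simple entailment: g entails h if some mapping of h's
    blank nodes to terms of g turns h into a subset of g.
    Exponential, so only suitable for toy graphs."""
    blanks = sorted({t for triple in h for t in triple if is_blank(t)})
    terms = sorted({t for triple in g for t in triple})
    for image in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, image))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in h}
        if instance <= g:
            return True
    return False


def answers(g: set, p: str, o: str) -> set:
    """Bindings of ?x for the single pattern ?x p o, by exact term matching."""
    return {s for (s, pp, oo) in g if pp == p and oo == o}


print(simple_entails(G1, G2), simple_entails(G2, G1))  # True True
print(sorted(answers(G1, "ex:p", "ex:o")))  # ['_:b', 'ex:s']
print(sorted(answers(G2, "ex:p", "ex:o")))  # ['ex:s']
```

Even if the blank node returned for G1 is treated as an arbitrary label,
the two equivalent graphs yield different numbers of solutions.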



Peter F. Patel-Schneider
Bell Labs Research




From: "Eric Prud'hommeaux" <eric@w3.org>
Subject: Re: comments on Section 1 and Section 2 of SPARQL Query Language for RDF
Date: Thu, 17 May 2007 17:43:34 -0700

> The Data Access Working Group is ready to bring SPARQL Query to
> Candidate Recommendation. The objections posted by Peter F.
> Patel-Schneider pertain to parts of the language that have changed
> since the last CR transition. We hope PFPS will agree to the language
> changes, withdraw his objection, and help us with editorial updates
> during the Candidate Recommendation phase.
> 
> Dear Peter,
> 
> It has been 15 months since your comments, and we have reorganized the
> document substantially, hopefully in ways that address your comments.
> (Please see section 12 to see the aggregated definitions and note that
> section 2 is now informative.) I have responded to many of your
> comments with "[gone]". Others are marked with "[definitions
> replaced]". These annotations are sprinkled throughout this reply with
> the goal of responding to each comment.
> 
> I have drafted text to address your editorial comments and will
> propose it to the working group after the transition to CR. None of
> these changes affect the semantics of the query language as understood
> by the working group.
> 
> There have been some changes to the entailment regime in the past
> year. Your technical comments (both numbered C2.39) should be
> addressed by the new semantics. If you wish to pursue either the
> editorial or technical comments, we should split out the thread as
> the distinction is important to the W3C publication process.
Received on Tuesday, 22 May 2007 18:08:06 UTC