Replies to substantive comments on SPARQL Query Language for RDF from Eric Prud'hommeaux on 2007-05-29 (public-rdf-dawg-comments@w3.org from May 2007)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 29 May 2007 10:14:44 -0400
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <20070529141444.GX13419@w3.org>
Peter, thank you for your comments. This reply includes responses to
substantive comments 1, 3, 8 and 9.

>1/ A question on the basic notion of RDF.
>
>From the document:
>
>Abstract: RDF is a directed, labeled graph data format for representing
>information in the Web. 
>
>From RDF Concepts:
>
>Abstract: The Resource Description Framework (RDF) is a framework for
>representing information in the Web. 
>
>How can these two different, to me, views of RDF be reconciled?  If
>SPARQL treats RDF as simply a "graph data format" then what is the
>status of the RDF semantics, which goes much further?   Suppose I write
>a system for handling RDF that respects the RDF recommendations, is this
>system going to be useful for SPARQL?  For example, if I store RDF
>graphs in some internal canonical form (for example, changing
>"042"^^xsd:integer to "42"^^xsd:integer) then I have changed the SPARQL
>answers.

While there are many choices the WG could have made here, the charter
specified that SPARQL be defined in terms of a graph,

  http://www.w3.org/2003/12/swa/dawg-charter#scope
  [[
  The principal task of the RDF Data Access Working Group is to gather
  requirements and to define an HTTP and/or SOAP-based protocol for
  selecting instances of subgraphs from an RDF graph.
  ]]

and that the entailments of various semantics be reflected in the
graph.

  http://www.w3.org/2003/12/swa/dawg-charter#rdfs-owl-queries
  [[
  The protocol will allow access to a notional RDF graph. This may in
  practice be the virtual graph which would follow from some form of
  inference from a stored graph. This does not affect the data access
  protocol, but may affect the description of the data access
  service. For example, if OWL DL semantics are supported by a
  service, that may be evident in the description of the service or
  the virtual graph which is queried, but it will not affect the
  protocol designed under this charter.
  ]]

In this regard, SPARQL's graph matching corresponds to Graph
Equivalence in section 6.3 of the Concepts and Abstract Syntax.

  http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-graph-equality
  [[
  Two RDF graphs G and G' are equivalent if there is a bijection M
  between the sets of nodes of the two graphs, such that:

   1. M maps blank nodes to blank nodes.
   2. M(lit)=lit for all RDF literals lit which are nodes of G.
   3. M(uri)=uri for all RDF URI references uri which are nodes of G.
   4. The triple ( s, p, o ) is in G if and only if the triple ( M(s), p, M(o) ) is in G'

  With this definition, M shows how each blank node in G can be
  replaced with a new blank node to give G'.
  ]]
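To make the bijection condition concrete, here is a minimal brute-force sketch in Python. The triple-as-tuple representation and the "_:" prefix convention for blank nodes are assumptions made for illustration only; trying every permutation is exponential in the number of blank nodes, so it is suitable only for tiny graphs.

```python
from itertools import permutations

# Illustrative representation (an assumption): a graph is a set of
# (subject, predicate, object) string tuples; terms starting with "_:"
# are blank nodes, everything else is a URI reference or literal.
def is_blank(term):
    return term.startswith("_:")

def blank_nodes(graph):
    return sorted({t for triple in graph for t in triple if is_blank(t)})

def equivalent(g1, g2):
    """Brute-force check of RDF graph equivalence (RDF Concepts 6.3):
    search for a bijection M that maps blank nodes to blank nodes, is the
    identity on URIs and literals, and turns g1 into exactly g2."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        m = dict(zip(b1, image))  # candidate bijection on blank nodes
        mapped = {tuple(m.get(t, t) for t in triple) for triple in g1}
        if mapped == g2:
            return True
    return False
```

For example, {("_:a", "ex:p", "ex:o")} and {("_:b", "ex:p", "ex:o")} are equivalent, while neither is equivalent to {("ex:s", "ex:p", "ex:o")}, because condition 1 forbids mapping a blank node to a URI.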

With this in mind, SPARQL is defined for simple entailment. From the 
Status of the Document:

[[
Compared to previous versions, this document adds an algebra for SPARQL that
provides semantics for evaluating graph patterns and solution modifiers. This
algebra is defined over basic graph pattern matching for simple entailment.
This document also provides conditions for extending SPARQL basic graph
pattern matching.
]]

The document does anticipate richer semantics, but the Working Group notes 
that querying richer semantic models is an open research problem, and has 
declined to standardize beyond simple entailment at this time:

[[
12.6 Extending SPARQL Basic Graph Matching

The overall SPARQL design can be used for queries which assume a more
elaborate form of entailment than simple entailment, by re-writing the
matching conditions for basic graph patterns. Since it is an open research
problem to state such conditions in a single general form which applies to
all forms of entailment and optimally eliminates needless or inappropriate
redundancy, this document only gives necessary conditions which any such
solution should satisfy. These will need to be extended to full definitions
for each particular case.
]]

As you observed a while back ( 
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Sep/0036.html 
), this behavior is in line with the group's charter:

[[
ISSUE 1 (show-stopper):  Non-respect for RDF Semantics

I realize that this interacts with the charter of the group.  What this means
to me is that the charter is ill-formed with respect to RDF.
]]


> 3/ Matching literals
> 
> I was very surprised to see that the exact literal form of an RDF
> literal is significant (Section 2.3.3).  Imagine what would happen if an
> SQL query depended on the exact literal form in which numbers were
> entered into a database!

SPARQL is defined for simple entailment, as noted above. As per section
6.5.1 of RDF Concepts:

http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal
[[
6.5.1 Literal Equality

Two literals are equal if and only if all of the following hold:

     * The strings of the two lexical forms compare equal, character by character.
     * Either both or neither have language tags.
     * The language tags, if any, compare equal.
     * Either both or neither have datatype URIs.
     * The two datatype URIs, if any, compare equal, character by character.
]]

SPARQL does accommodate value matching with FILTER expressions, whose
operators are defined in terms of XPath Functions and Operators.
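The difference between term equality (used when matching a graph pattern) and value comparison (available in a FILTER) can be sketched in Python as follows. The `Literal` class and both helper functions are hypothetical, and value comparison is shown only for two xsd:integer operands, roughly what a FILTER equality test on numbers does; it is not a complete implementation of the XPath operators.

```python
from dataclasses import dataclass
from typing import Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

@dataclass(frozen=True)
class Literal:
    # Hypothetical literal representation: lexical form plus optional
    # language tag and optional datatype URI.
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None

def term_equal(a, b):
    # Literal equality in the sense of RDF Concepts 6.5.1: lexical form,
    # language tag and datatype URI must all be identical.
    return (a.lexical, a.lang, a.datatype) == (b.lexical, b.lang, b.datatype)

def integer_value_equal(a, b):
    # Value comparison, restricted to xsd:integer in this sketch.
    if a.datatype != XSD_INTEGER or b.datatype != XSD_INTEGER:
        raise TypeError("this sketch only compares xsd:integer values")
    return int(a.lexical) == int(b.lexical)
```

With these helpers, "042"^^xsd:integer and "42"^^xsd:integer are different terms (so a graph pattern naming one will not match the other), yet they have the same value (so a FILTER comparing them numerically succeeds).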

RDF semantics talks about datatyped interpretations and D-entailment:
http://www.w3.org/TR/rdf-mt/#dtype_interp

Based on current implementation practice, the working group decided to
leave D-entailment as a research problem. This includes richer semantics
for graph patterns such as:

  [] rdf:type xsd:nonNegativeInteger .

The Working Group has approved tests in this space. See, for example:

  http://www.w3.org/2001/sw/DataAccess/tests/#open-eq-01

(But note that that's in the old test space and will be moving in the next
few weeks.)


> 8/ Basic Definition of SPARQL
> 
> The definition of Solution Sequence is inadequately grounded.  A
> Solution Sequence is defined as "a list of solutions, possibly
> unordered" (Section 12.1.6).  The common formal definitions of lists
> depend on an ordering.  If SPARQL is using some other definition, then
> this other definition must be at least referenced.  The terminology used
> to refer to solutions is much too varied.  It includes at least
> sequences, lists, unordered collections, multisets, sets.
> 
> ToList "turns a multiset into a sequence, with the same elements and
> cardinality" (Section 12.2.3).  Aside from the question about
> cardinality of what, this is not a functional mapping, as there are many
> sequences that could correspond to a multiset (or set) if the order of
> the sequence is ignored.  The formal definition of ToList implicitly
> mentions this non-functionality.
> 
> Given that ToList is a fundamental part of the definition of SPARQL it
> requires a better definition.  Further, there needs to be proofs that
> the choice in ToList does not make a difference anywhere in SPARQL, for
> example, in further processing 

It is not intended to be functional. The specification leaves the order of 
the sequence generated by ToList unspecified.
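One way to picture this is to treat ToList as a relation rather than a function: any sequence holding the multiset's elements with the right cardinalities is an acceptable output. In the Python sketch below, representing a multiset as a `collections.Counter` and shuffling with an optional random generator are assumptions for illustration; the solution labels are hypothetical.

```python
import random
from collections import Counter

def to_list(multiset, rng=None):
    """Turn a multiset (Counter mapping solution -> cardinality) into a
    sequence with the same elements and cardinalities. The order is left
    unspecified: with an rng the order is shuffled, and every shuffle is
    an equally valid result."""
    seq = list(multiset.elements())
    if rng is not None:
        rng.shuffle(seq)
    return seq

omega = Counter({"mu1": 2, "mu2": 1})  # hypothetical solutions
first = to_list(omega, random.Random(0))
second = to_list(omega, random.Random(1))
```

The only constraint ToList places on its output is captured by checking `Counter(result) == omega`; `first` and `second` may differ in order while both satisfy it.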

> The definition of SPARQL BGP mapping importantly depends on the order
> that the RDF instance mapping and solution mapping are performed.  This
> should be documented.

The Working Group does not believe this to be the case, but we'd welcome a 
test case/example that demonstrates the difference.
 
> The definition of BGP Matching is not specified in the document.  The
> definition in Section 12.3.1 defines a "solution" reasonably, although
> presumably mu is *the* "restriction of P to the query variables in BGP.
> However, the last bit of the definition doesn't make sense?  What is
> omega there?  What is mu there?  What is theta?  What is mu(theta)?
> Where then is the definition of the match of a BGP against an RDF graph?

The last part of the definition says that the cardinality of a solution 
(mu) in a multiset of solution mappings (omega) is the number of distinct 
RDF instance mappings (sigma) that compose with mu to give a pattern 
instance mapping (P) that, when applied to the BGP, produces a subset of 
G.
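A brute-force Python sketch of that cardinality rule may help. It assumes a toy representation (variables start with "?", blank nodes with "_:", graphs are sets of string triples) and, unlike the specification, treats blank nodes already present in the data graph as ordinary terms rather than introducing a scoping graph. It is illustrative only and exponential in the number of variables and blank nodes.

```python
from collections import Counter
from itertools import product

def is_var(t):
    return t.startswith("?")

def is_blank(t):
    return t.startswith("_:")

def bgp_match(bgp, graph):
    """Return omega as a Counter: each solution mu (a frozenset of
    (variable, term) pairs) maps to the number of distinct RDF instance
    mappings sigma (BGP blank node -> term) such that mu combined with
    sigma turns the BGP into a subset of the graph."""
    terms = sorted({t for triple in graph for t in triple})
    variables = sorted({t for tr in bgp for t in tr if is_var(t)})
    blanks = sorted({t for tr in bgp for t in tr if is_blank(t)})
    omega = Counter()
    # Each assignment is one pattern instance mapping P = (mu, sigma).
    for values in product(terms, repeat=len(variables) + len(blanks)):
        p = dict(zip(variables + blanks, values))
        instance = {tuple(p.get(t, t) for t in tr) for tr in bgp}
        if instance <= graph:
            mu = frozenset((v, p[v]) for v in variables)
            omega[mu] += 1  # one more distinct sigma for this mu
    return omega
```

For instance, matching the pattern ?x ex:p _:o against a graph where ex:a has two ex:p values yields the single solution ?x = ex:a with cardinality 2, since two different bindings of _:o complete the match.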

> Section 12.5 does not provide the missing glue, as it just defers to
> Section 12.3.1.  Section 12.5 doesn't even get to a BGP and an RDF
> graph.
> 
> What do the [ ] and { } notations mean in Section 12.4?

{ } is standard set notation.

The card[...] notation is introduced at the beginning of 12.3.


> 9/ A Fundamental Disagreement on SPARQL
> 
> I still object to the fact that SPARQL can produce different results for
> equivalent RDF graphs, as described in Section 12.3.2.

We have recorded your previous objection to this as part of our 
rdfSemantics issue:

  http://www.w3.org/2001/sw/DataAccess/issues#rdfSemantics


The Working Group has considered alternatives; Bijan Parsia described
forms of leanness with respect to the data graph and to the subgraph
matching a query pattern:
  http://www.w3.org/mid/DF0BA59F-7E8C-4CDB-BF2C-C391D05CEB4D@cs.man.ac.uk

The results of a SPARQL query with respect to a given data graph are
well defined and specifically do not involve leaning the matching
subgraph. SPARQL neither prohibits nor requires the reduction of
equivalent graphs to their minimal (lean) form.

Most current implementations do not lean the input graph. The WG
consensus was not to impose this on implementations, noting
significant efficiency and scaling costs. Once extended to other
entailment regimes, the underlying problem lies beyond current
practice. Implementations are free to process data before exposing it
(e.g. to lean it).
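The Python sketch below illustrates the behaviour described in Section 12.3.2, using a hypothetical string-triple representation with "_:" marking blank nodes. G1 and G2 simply entail each other, yet the pattern { ex:a ex:p ?o } has one answer over G1 and two over G2. The `lean` helper is an illustrative brute-force search that an implementation might run before exposing data; it is not an algorithm prescribed by SPARQL, and in this example leaning G2 yields G1.

```python
from itertools import product

def is_blank(t):
    return t.startswith("_:")

def select_objects(graph, s, p):
    # Answers to SELECT ?o WHERE { s p ?o } over the given graph.
    return sorted(o for (s2, p2, o) in graph if (s2, p2) == (s, p))

def lean(graph):
    """Repeatedly find a mapping of the graph's blank nodes whose image is
    a proper subgraph (an equivalent, smaller graph) and replace the graph
    by that image; stop when no such mapping exists. Brute force, so only
    suitable for tiny examples."""
    graph = set(graph)
    while True:
        blanks = sorted({t for tr in graph for t in tr if is_blank(t)})
        terms = sorted({t for tr in graph for t in tr})
        for values in product(terms, repeat=len(blanks)):
            m = dict(zip(blanks, values))
            image = {tuple(m.get(t, t) for t in tr) for tr in graph}
            if image < graph:
                graph = image
                break
        else:
            return graph  # no proper-subgraph instance: graph is lean

G1 = {("ex:a", "ex:p", "ex:b")}
G2 = G1 | {("ex:a", "ex:p", "_:x")}
```

Here `select_objects(G1, "ex:a", "ex:p")` returns one binding, `select_objects(G2, "ex:a", "ex:p")` returns two (ex:b and _:x), and `lean(G2)` returns a graph equal to G1.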


> 
> Peter F. Patel-Schneider
> Bell Labs Research
> 
> From: "Eric Prud'hommeaux" <eric@w3.org>
> Subject: Re: comments on Section 1 and Section 2 of SPARQL Query Language for RDF
> Date: Thu, 17 May 2007 17:43:34 -0700
> 
> > The Data Access Working Group is ready to bring SPARQL Query to
> > Candidate Recommendation. The objections posted by Peter F.
> > Patel-Schneider pertain to parts of the language that have changed
> > since the last CR transition. We hope PFPS will agree to the language
> > changes, withdraw his objection, and help us with editorial updates
> > during the Candidate Recommendation phase.
> > 
> > Dear Peter,
> > 
> > It has been 15 months since your comments, and we have reorganized the
> > document substantially, hopefully in ways that address your comments.
> > (Please see section 12 to see the aggregated definitions and note that
> > section 2 is now informative.) I have responded to many of your
> > comments with "[gone]". Others are marked with "[definitions
> > replaced]". These annotations are sprinkled throughout this reply with
> > the goal of responding to each comment.
> > 
> > I have drafted text to address your editorial comments and will
> > propose it to the working group after the transition to CR. None of
> > these changes affect the semantics of the query language as understood
> > by the working group.
> > 
> > There have been some changes to the entailment regime in the past
> > year. Your technical comments (both numbered C2.39) should be
> > addressed by the new semantics. If you wish to pursue either the
> > editorial or technical comments, we should split out the thread as
> > the distinction is important to the W3C publication process.
> 
> 

-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Tuesday, 29 May 2007 14:14:52 UTC