- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Tue, 29 May 2007 10:14:44 -0400
- To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- Cc: public-rdf-dawg-comments@w3.org
- Message-ID: <20070529141444.GX13419@w3.org>
Peter, thank you for your comments. This reply includes reponses to substantive comments 1, 3, 8 and 9. >1/ A question on the basic notion of RDF. > >>From the document: > >Abstract: RDF is a directed, labeled graph data format for representing >information in the Web. > >>From RDF Concepts: > >Abstract: The Resource Description Framework (RDF) is a framework for >representing information in the Web. > >How can these two different, to me, views of RDF be reconciled? If >SPARQL treats RDF as simply a "graph data format" then what is the >status of the RDF semantics, which goes much further? Suppose I write >a system for handling RDF that respects the RDF recommendations, is this >system going to be useful for SPARQL? For example, if I store RDF >graphs in some internal canonical form (for example, changing >"042"^^xsd:integer to "42"^^xsd:integer) then I have changed the SPARQL >answers. While there are many choices the WG could have made here, the charter specified that the SPARQL be defined in terms of a graph, http://www.w3.org/2003/12/swa/dawg-charter#scope [[ The principal task of the RDF Data Access Working Group is to gather requirements and to define an HTTP and/or SOAP-based protocol for selecting instances of subgraphs from an RDF graph. ]] and that the entailments of various semantics be reflected in the graph. http://www.w3.org/2003/12/swa/dawg-charter#rdfs-owl-queries [[ The protocol will allow access to a notional RDF graph. This may in practice be the virtual graph which would follow from some form of inference from a stored graph. This does not affect the data access protocol, but may affect the description of the data access service. For example, if OWL DL semantics are supported by a service, that may be evident in the description of the service or the virtual graph which is queried, but it will not affect the protocol designed under this charter. ]] In this regard, SPARQL's graph matching corresponds to Graph Equivalence in section 6.3 of the Concepts and Abstract Syntax. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-graph-equality [[ Two RDF graphs G and G' are equivalent if there is a bijection M between the sets of nodes of the two graphs, such that: 1. M maps blank nodes to blank nodes. 2. M(lit)=lit for all RDF literals lit which are nodes of G. 3. M(uri)=uri for all RDF URI references uri which are nodes of G. 4. The triple ( s, p, o ) is in G if and only if the triple ( M(s), p, M(o) ) is in G' With this definition, M shows how each blank node in G can be replaced with a new blank node to give G'. ]] With this in mind, SPARQL is defined for simple entailment. From the Status of the Document: [[ Compared to previous versions, this document adds an algebra for SPARQL that provides semantics for evaluating graph patterns and solution modifiers. This algebra is defined over basic graph pattern matching for simple entailment. This document also provides conditions for extending SPARQL basic graph pattern matching. ]] The document does anticipate richer semantics, but the Working Group notes that querying richer semantic models is an open research problem, and has declined to standardize beyond simple entailment at this time: [[ 12.6 Extending SPARQL Basic Graph Matching The overall SPARQL design can be used for queries which assume a more elaborate form of entailment than simple entailment, by re-writing the matching conditions for basic graph patterns. Since it is an open research problem to state such conditions in a single general form which applies to all forms of entailment and optimally eliminates needless or inappropriate redundancy, this document only gives necessary conditions which any such solution should satisfy. These will need to be extended to full definitions for each particular case. ]] As you observed a while back ( http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Sep/0036.html ), this behavior is in line with the group's charter: [[ ISSUE 1 (show-stopper): Non-respect for RDF Semantics I realize that this interacts with the charter of the group. What this means to me is that the charter is ill-formed with respect to RDF. ]] > 3/ Matching literals > > I was very surprised to see that the exact literal form of an RDF > literal is significant (Section 2.3.3). Imagine what would happen if an > SQL query depended on the exact literal form in which numbers were > entered into a database! SPARQL is defined for simple entailment as noted above. As per 6.5.1 of RDF concepts: http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal [[ 6.5.1 Literal Equality Two literals are equal if and only if all of the following hold: * The strings of the two lexical forms compare equal, character by character. * Either both or neither have language tags. * The language tags, if any, compare equal. * Either both or neither have datatype URIs. * The two datatype URIs, if any, compare equal, character by character. ]] SPARQL does accomodate value matching with FILTER expressions, which are defined by XPath Functions and Operators. RDF semantics talks about datatyped interpretations and D-entailment: http://www.w3.org/TR/rdf-mt/#dtype_interp Based on current implementation practice, the working group decided to leave D-entailment as a research problem. This includes richer semantics for graph patterns such as: [] rdf:type xsd:nonNegativeInteger . The Working Group has approved tests in this space. See, for example: http://www.w3.org/2001/sw/DataAccess/tests/#open-eq-01 (But note that that's in the old test space and will be moving in the next few weeks.) > 8/ Basic Definition of SPARQL > > The definition of Solution Sequence is inadequately grounded. A > Solution Sequence is defined as "a list of solutions, possibly > unordered" (Section 12.1.6). The common formal definitions of lists > depend on an ordering. If SPARQL is using some other definition, then > this other definition must be at least referenced. The terminology used > to refer to solutions is much to varied. It includes at least > sequences, lists, unordered collections, multisets, sets. > > ToList "turns a multiset into a sequence, with the same elements and > cardinality" (Section 12.2.3). Aside from the question about > cardinality of what, this is not a functional mapping, as there are many > sequences that could correspond to a multiset (or set) if the order of > the sequence is ignored. The formal definition of ToList implicitly > mentions this non-functionality. > > Given that ToList is a fundamental part of the definition of SPARQL it > requires a better definition. Further, there needs to be proofs that > the choice in ToList does not make a difference anywhere in SPARQL, for > example, in further processing It is not intended to be functional. The specification leaves the order of the sequence generated by ToList unspecified. > The definition of SPARQL BGP mapping importantly depends on the order > that the RDF instance mapping and solution mapping are performed. This > should be documented. The Working Group does not believe this to be the case, but we'd welcome a test case/example that demonstrates the difference. > The definition of BGP Matching is not specified in the document. The > definition in Section 12.3.1 defines a "solution" reasonably, although > presumably mu is *the* "restriction of P to the query variables in BGP. > However, the last bit of the definition doesn't make sense? What is > omega there? What is mu there? What is theta? What is mu(theta)? > Where then is the definition of the match of a BGP against an RDF graph? The last part of the definition says that the cardinality of a solution (mu) in a multiset of solution mappings (omega) is the number of distinct RDF instance mappings (sigma) that compose with mu to give a pattern instance mapping (P) that, when applied to the BGP, produces a subset of G. > Section 12.5 does not provide the missing glue, as it just defers to > Section 12.3.1. Section 12.5 doesn't even get to a BGP and an RDF > graph. > > What do the [ ] and { } notations mean in Section 12.4? { } is standard set notation. The card[...] notation is introduced at the beginning of 12.3. > 9/ A Fundamental Disagreement on SPARQL > > I still object to the fact that SPARQL can produce different results for > equivalent RDF graphs, as described in Section 12.3.2. We have recorded your previous objection to this as part of our rdfSemantics issue: http://www.w3.org/2001/sw/DataAccess/issues#rdfSemantics The Working Group has considered alternatives; Bijan Paria described forms of leanness with respect to the data graph and the subgraph matching a query pattern. http://www.w3.org/mid/DF0BA59F-7E8C-4CDB-BF2C-C391D05CEB4D@cs.man.ac.uk The results of a SPARQL query with respect to a given data graph are defined, and specifically do not include leaning the matching subgraph. SPARQL neither prohibits nor requires the reduction of equivalent graphs to the minimal entailment Most current implementations do not lean the input graph. The WG consensus was not to impose this on implementations, noting significant efficiency and scaling costs. Coupled with extension to other entailment systems, the fundamental problem is outside current practice. Implementations are free to process data before exposing it (e.g. apply leaning). > > Peter F. Patel-Schneider > Bell Labs Research > > > > > From: "Eric Prud'hommeaux" <eric@w3.org> > Subject: Re: comments on Section 1 and Section 2 of SPARQL Query Language for RDF > Date: Thu, 17 May 2007 17:43:34 -0700 > > > The Data Access Working Group is ready to bring SPARQL Query to > > Candidate Recommendation. The objections posted by Peter F. > > Patel-Schneider pertain to parts of the language that have changed > > since the last CR transition. We hope PFPS will agree to the language > > changes, withdraw his objection, and help us with editorial updates > > during the Candidate Recommendation phase. > > > > Dear Peter, > > > > It has been 15 months since your comments, and we have reorganized the > > document substantially, hopefully in ways that address your comments. > > (Please see section 12 to see the aggregated definitions and note that > > section 2 is now informative.) I have responded to many of your > > comments with "[gone]". Others are marked with "[definitions > > replaced]". These annotations are sprinkled throught this reply with > > the goal of responding to each comment. > > > > I have drafted text to address your editorial comments and will > > propose it to the working group after the transition to CR. None of > > these changes affect the semantics of the query language as understood > > by the working group. > > > > There have been some changes to the entailment regime in the past > > year. Your technical comments (both numbered C2.39) should be > > addressed by the new semantics. If you wish to persue either the > > editorial or technical comments, we should split out the thread as > > the distinction is important to the W3C publication process. > > -- -eric office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA mobile: +1.617.599.3509 (eric@w3.org) Feel free to forward this message to any list for any purpose other than email address distribution.
Received on Tuesday, 29 May 2007 14:14:52 UTC