- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Wed, 02 Aug 2006 12:24:46 -0700
- To: public-rdf-dawg@w3.org
Master list of my comments

In the telecon on 1 Aug 2006, I was asked to state which of my comments I regard as closed. I also learned about the issue list. Following up on the telecon, I have assembled all my comments in this message (a very long one, I'm afraid) and interspersed remarks (beginning with ++) on subsequent developments, in many cases stating that the matter is closed in my mind. As for what is left, I would like to get the important issues onto the issue list.

Here are my proposed additions to the issues list:

1. Material on entailment and general framework needs to be rewritten. One objective of the rewrite is to extend the scope of blank node identifiers to include the FILTER clause in rule [21] FilteredBasicGraphPattern.
2. Should we rearrange rule [14] SolutionModifier to place OffsetClause before LimitClause, given that the OffsetClause is processed first?
3. Should items in the SELECT list be separated by commas? I have heard that this is part of an existing issue on punctuation which is being re-opened.
4. Duplicates from UNION: do we require a result sequence to have a precise count of duplicates, or is it more lax?
5. The domain of solutions is not clearly specified. This is particularly an issue for OPTIONAL and UNION.
6. Formal semantics of OPTIONAL is not clear. The current wording "if S is a pattern solution of A and of B otherwise if S is a solution to A but not to B" appears to reduce logically to just "S is a solution of A". This issue is likely to be handled at the same time as the issue on the domain of solutions to OPTIONAL, though I'd like to list it as a separate issue.
7. How does filter evaluation work if there is an unbound variable that is not within a BOUND function?
8. There is no bridge from the syntax to the semantics.

In addition, reviewing
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0129.html
I suggest one further editorial action, for rq24 5.4 "Basic graph patterns in SPARQL syntax", as follows: This gives an example of a basic graph pattern with a blank node identifier, and it says "with the scope of the blank node label being the basic graph pattern". What is still missing is an example of the results of such a query, and a contrasting example showing how the query behaves differently when the triple patterns are placed into separate graph patterns. That is, I would like to see a contrast between

{ _:x :p ?v . _:x :q ?w }

and

{ { _:x :p ?v } { _:x :q ?w } }

so that the reader can really appreciate what it means that in the latter example the blank node labels are in different scopes. Can this be handled editorially?

****************************************************

Recap of all my comments:

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0114.html
editorial comments on CR dated 6 april 2006

Some (hopefully) editorial comments:

2.1.1 Syntax for IRIs

It would be helpful to have examples of the two abbreviated syntaxes. One might think that abbreviated syntaxes must be written in angle brackets.

++ solved in rq24 3.1.1 "Syntax for IRIs"

2.3 Triple patterns

The last sentence of this section says "This definition also allows blank nodes in the predicate position." But the second component of a triple pattern is a member of I union V, where I is the set of IRIs and V is the set of variables. Thus it seems that SPARQL blank nodes are not permitted in the second position. Also the BNF for Verb says it may be VarOrIRIref or 'a', so it seems that a SPARQL blank node cannot be a predicate.

++ solved; I don't see this sentence any more in rq24 4.2 "Triple patterns"

2.4 Pattern solutions

The first two sentences in the box use the following terms: "variable solution", "substitution function", "pattern solution", and "variable substitution".
How are these terms related? Taken literally, it says that "a variable solution is a substitution function" and a "pattern solution is a variable substitution". Thus the first and second sentence seemingly have nothing to do with one another. In that case, why is the first sentence, about "variable solution", found in a box called "Definition: pattern solution"? This is very confusing. The reader is left suspecting that in fact all four terms are interchangeable. Furthermore, scanning the rest of the document, one finds that "variable solution" is never used at all, and the term "solution" is frequently used without qualification (neither as "variable solution" nor as "pattern solution"). We should use our terminology consistently.

++ solved in rq24 4.3 "Pattern solutions", where "variable substitution" replaces the term "variable solution" that I complained about. In addition I see the new sentence that says that "solution" is short for "pattern solution".

2.5.1 General framework

Definition of "basic graph pattern E-matching", first bullet talks about BGP and BGP' being "graph-equivalent". This term is not defined; instead "equivalent" is. We should use our terminology consistently.

++ not addressed yet in rq24, where the term is defined as "equivalent" and used as "graph-equivalent". There is a hot link from "graph-equivalent" which takes you to the "RDF Concepts" Rec. Andy Seaborne in
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0122.html
asks if that link satisfies my concern. My answer is that I think this link is incorrect because BGP and BGP' are not graphs, they are basic graph patterns, and so the link should take you to the definition of "equivalent" in this specification. It is true that the definitions of "graph-equivalent" and "equivalent" (for basic graph patterns) are very similar, but they still have different domains.
Alternatively, the text can be read as saying that the term "graph-equivalent" is being extended so that its domain includes basic graph patterns. That would be fine, but in that case the definition should be reworded to call it "graph-equivalent" rather than "equivalent", and the hot link still needs to take the reader to the extended definition, not the definition whose domain is only graphs. Perhaps the editor can make these changes editorially. However, I think the expectation is that the entire section will be rewritten, so I am not proposing that this be added to the issue list.

9 Specifying RDF datasets

A graph is specified using an IRI, which can be a QName, but there are no examples of using a QName to specify a graph. This would be helpful.

++ solved in rq24 9.3.3 "Restricting possible graph IRIs"

10.1 Solution sequences and result forms

The first formal definition says that a "solution sequence" is a "list". Both of these terms imply ordering. Then the last sentence in the first box says "The solution sequence from matching the query pattern is an unordered collection...". This is contradictory. What we probably mean is that "The solution sequence from matching the query pattern is in an implementation-dependent order".

++ solved in rq24 10.1 "Solution sequences and result forms" by moving the sentence outside the box and rewording it.

10.1 Solution sequences and result forms

The last sentence in the first box says "The solution sequence from matching the query pattern is an unordered collection...". This sentence is not part of the definition of "solution sequence" so it should be placed outside the box.

++ same solution as previous item.

10.1 Solution sequences and result forms

It would be helpful if the stages of processing solution sequences were always mentioned in the same order. The order is given as ORDER BY, project, DISTINCT, LIMIT, OFFSET.
The text to be rearranged is:
a) In the box for the definition of solution sequence modifier (order modifier should be moved ahead of projection modifier)
b) The arrangement of subsections (10.1.3 ORDER BY should be moved ahead of 10.1.1 Projection)

++ solved in rq24 10.1 "Solution sequences and result forms"

A.5 Escape sequences in strings

It says that \U may only be used for Unicode code points in the range U+10000 through U+10FFFF. So the first two HEX digits must always be 00? If so, why not show the syntax as '\U00' HEX HEX HEX HEX HEX HEX ?

A.6 "Escape sequences in IRI references" - same comment.

++ not done in rq24, but this was only a suggestion so I don't care.

*********************************************************

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0115.html
Comments on 2.5.1 "General framework" in CR dated 6 April 2006

2.5.1 General framework

This section is difficult to read. Everything we define in the current edition of SPARQL uses simple entailment, so why bother to lay out a more general framework at all? The only practical consequence from this section seems to be that there is a scoping set which is the set of RDF terms in the dataset. If we must have a general framework, perhaps we can dispense with defining entailment, contenting ourselves with referencing [RDF-MT] for a discussion of entailment in general.

2.5 Basic graph patterns

The definition of E-entailment regime, taken literally, says that any subset of P(RDFG) x P(RDFG) is an entailment regime, where P(RDFG) is the power set of RDFG, the set of all RDF graphs. In contrast, reference [RDF-MT] "RDF Semantics" never actually defines entailment, but I see two consequences from the discussion there:

a) Entailment is a relation from a set of graphs to a single graph, not to a set of graphs. Thus we want a subset of P(RDFG) x RDFG.
It seems the text also has this in mind, since the definition of "well-formed" contemplates that the range of E (i.e., the projection of E on its second component) is not a set of sets of graphs but a set of graphs.

b) Entailment involves the notion of correct inferencing: given a set of graphs S, they entail graph G if all interpretations of S are interpretations of G. Thus it is not true that just any subset of P(RDFG) x RDFG constitutes an entailment.

2.5 Basic graph patterns

The definition of "well-formed for simple entailment" is difficult to apply. It seems that all RDF graphs are well-formed for simple entailment. For example, every graph G is entailed by the singleton set { G }. Therefore every RDF graph is in the range of simple entailment. Therefore every RDF graph is well-formed. So what is the point to this concept? Looking ahead, it seems that the reason for introducing "well-formed" is for the third bullet in the definition of "basic graph pattern e-matching". However, the fourth bullet of that definition already implies the third bullet. That is, if G E-entails (G' union S(BGP')), then G' union S(BGP') must be in the range of the entailment E, and so must be well-formed. Hence it seems that we can dispense with the notion of well-formed, since the only use of it is superfluous.

2.5.1 General framework

After the definition of scoping set, it says "The scoping set may be characterized differently by different entailment regimes". I don't know what "characterized" means here. Does it mean that the entailment implies the scoping set? So that simple entailment uses one scoping set and RDF-entailment uses another scoping set? If in fact the scoping set is not an independent parameter, then the scoping set should be mentioned in the definition of "E-entailment regime".

2.5.1 General framework

Definition of scoping graph says that "*the* scoping graph ... is *an* RDF graph". This does not make sense.
One cannot use "the" to refer to something that is not pinned down uniquely. There is not one graph that is uniquely graph-equivalent to G. What we mean is "A scoping graph... is an RDF graph...". After making this correction, the definition of scoping graph does not imply the sentence immediately following the definition, "The [sic, should be "a"] scoping graph makes the graph to be matched independent of the chosen blank node names". For example, G is graph-equivalent to G, trivially, therefore G is a scoping graph for itself. But if G has a problem with blank node names, then the scoping graph G also does, so we cannot conclude that the scoping graph automatically solves this problem. The best that can be said is that a suitably chosen scoping graph will make the graph to be matched independent of the chosen blank node names.

Looking ahead to the use of the notion, it is currently treated as a parameter in the definition of "basic graph pattern E-matching". However, I doubt that one has a specific scoping graph in mind when one does a basic graph pattern E-match. That is, treating "basic graph pattern E-match" as a boolean-valued function, you don't want the scoping graph as one of the arguments to this function.

Proposed resolution: Delete the definition of "scoping graph", and in the definition of "basic graph pattern E-matching", delete the phrase "with scoping graph G'", replacing it with a new bullet reading "there exists a graph G' that is graph-equivalent to G".

2.5.1 General framework

The pattern solution S is not being treated with the same respect shown to the other parameters in the definition of "basic graph pattern e-matching". It should be listed as one of the "givens" at the start of the sentence.

2.5.1 General framework

In the definition of "Basic graph pattern E-matching", regarding the first bullet:

a) it would be clearer if it were reworded "There exists BGP' such that BGP' is a basic graph pattern ...".
b) The notion of graph-equivalent is not defined for basic graph patterns, only for graphs. Rather, the defined term is just "equivalent".

All together, the first bullet should read "There exists BGP' such that BGP' is a basic graph pattern that is equivalent to BGP"

2.5.1 General framework

Summary of my proposed rewrite to the definition of "Basic graph pattern E-matching":

Given:
- an entailment regime E,
- a basic graph pattern BGP,
- an RDF graph G,
- a pattern solution S, and
- a scoping set B
then BGP E-matches with pattern solution S on graph G with respect to B if:
- There exists a basic graph pattern BGP' and a graph G' such that:
  + BGP' is equivalent to BGP
  + G' is graph-equivalent to G
  + G' and BGP' do not share any blank node labels
- G E-entails (G' union S(BGP'))
- The RDF terms introduced by S all occur in B

++ I don't believe these comments have been addressed in rq24. This does not surprise or annoy me, since it seems that we are waiting for the entailment experts, especially Pat Hayes, to weigh in on a rewrite of the material on entailment and the general framework. My comments may be of interest to them; in any case, I am waiting to see that proposed rewrite. In the meantime, I suggest that the topic of rewriting the material on entailment and the general framework should be added to the issues list.

******************************************************

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0116.html
comment on CR 10.1 "solution sequences and result forms"

10.1 Solution sequences and result forms

The proposed order for processing LIMIT and OFFSET seems counterintuitive. Suppose the solution sequence has solutions (S1, S2, ..., S9). Suppose the user asks for LIMIT 3 and OFFSET 4. With the current rules, LIMIT 3 will truncate the solution sequence to (S1, S2, S3) and then the offset is greater than the number of solutions so the final result is empty. Instead, in this scenario, I think the user expects to get (S5, S6, S7).
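
The scenario just described can be checked with a small sketch. The Python below is illustrative only: the list `solutions` and the helper names `limit_then_offset` and `offset_then_limit` are assumptions made for this example and do not come from the draft.

```python
# Hypothetical solution sequence (S1, ..., S9) from the scenario above.
solutions = [f"S{i}" for i in range(1, 10)]

def limit_then_offset(seq, limit, offset):
    """Processing order as read from the CR text: LIMIT first, then OFFSET."""
    truncated = seq[:limit]
    return truncated[offset:]

def offset_then_limit(seq, limit, offset):
    """Processing order argued for in this comment: OFFSET first, then LIMIT."""
    skipped = seq[offset:]
    return skipped[:limit]

print(limit_then_offset(solutions, limit=3, offset=4))  # []
print(offset_then_limit(solutions, limit=3, offset=4))  # ['S5', 'S6', 'S7']
```

With LIMIT 3 and OFFSET 4, only the second ordering returns anything.
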
Thus the offset should be applied first, reducing the solution sequence to (S5, S6, ..., S9), and then the LIMIT should be applied. From this standpoint, the BNF for SolutionModifier should be rearranged to put OffsetClause ahead of LimitClause.

++ mostly solved in rq24 10.1 "Solution sequences and result forms". The part that is not addressed is whether we should rearrange rule [14] SolutionModifier to place OffsetClause? before LimitClause?. I suggest adding this to the issues list.

***********************************************************

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0117.html
comment on A.7 "Grammar"

A.7 Grammar

There are no commas in the SELECT list. This looks like a poor design because it will be an obstacle to allowing arbitrary expressions in the SELECT list in a future version. I asked for arbitrary expressions in a previous round of comments, and was told that this was being deferred to the future. That is fine, but I want to ensure that the ground is ready for that extension.

++ I believe this comment triggered reopening an issue on punctuation.

************************************************************

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0128.html
Re: Draft response to: Re: major technical: blank nodes

This is a response to Pat Hayes's email in the archive dated 26 Jan 2006 16:50:59 -0600. Thank you for your detailed comments. They have been very helpful to me personally in understanding the draft. The reason for my reply is that I believe we can do a better job in our treatment of blank nodes in SPARQL.

I first came to an earlier draft after reading the RDF Recommendations. I found the SPARQL draft very confusing and frustrating. My essential complaint was that SPARQL uses one term for two concepts:
a) RDF blank nodes, which are nodes in a graph with no label, and
b) SPARQL blank nodes, which are lexical tokens in a SPARQL query.
Pat Hayes's email rejects this interpretation.
However, let me give the reasons that I held it, based on my reading of RDF and SPARQL both:

a) According to the "RDF Concepts and abstract syntax" Recommendation, section 6.6 "Blank nodes", the set of RDF blank nodes is distinct from the set of IRIs and "Otherwise, this set of blank nodes is arbitrary. RDF makes no reference to any internal structure of blank nodes". That is, RDF blank nodes have no label.

b) The RDF Primer section 2.3 "Structured property values and blank nodes" Figure 6 "Using a blank node" shows a blank node as having no label. It goes on to describe "blank node identifiers" of which it says "...blank node identifiers are not considered to be actual parts of the RDF graph."

c) In our own working draft Section 2.5.3 "Example of basic graph pattern matching" second sentence under the first box, it says "The label information is not in the graph."

d) Section 2.8.3 "Blank nodes" says "Blank nodes have labels which are scoped to the query". However, RDF blank nodes have no notion of scope (they simply exist, just as IRIs and literals exist, with no notion of scope). Scope is a lexical concept (the portion of a query text in which an identifier has a single referent).

My summary is that the consistent stance of the RDF Recommendations is that blank node identifiers are an artefact of serialization.

Now if a reader comes to the SPARQL draft with that model, he finds it very confusing (certainly I did). For example, section 2.4 talks about how to extend a pattern solution S to graph patterns. It says "If v is not in the domain of S, then S(v) is defined to be v." Applied to SPARQL blank nodes such as _:a, this says S(_:a) is _:a. Fine; it is still a lexical token; there has been no mention of creating a blank node corresponding to the label _:a. As a result, the mapping of a triple pattern, such as ?x :v _:a is (S(?x), :v, _:a) and there still is no RDF blank node. Consequently, the result of the mapping is not an RDF triple.
Then we come to section 2.5.1 "General framework" and the definition of "basic graph pattern E-matching". This definition posits a basic graph pattern BGP' and a scoping graph G' such that "G' and BGP' do not share any blank node labels". But how can they? BGP' is a triple pattern and might contain SPARQL blank nodes; G' is an RDF graph and as such does not contain anything that can be called a blank node label at all (though serializations of G' might).

After studying Pat Hayes's email, my conclusion is that the text is using blank node identifiers as proxies or surrogates for the blank nodes themselves. To clarify our text, my proposed resolution is as follows:

a) We should adopt the term "blank node identifier" for what I have been calling SPARQL blank nodes. This would harmonize with RDF Recommendations, which use this term when talking about character strings associated with blank nodes for identification purposes. For example, section 2.1.4 would be renamed "Syntax for blank node identifiers". We should scan the document for other occurrences of "blank node", and, as appropriate, change to "blank node identifier".

b) We state explicitly that for each distinct blank node identifier, a distinct blank node is created for the purposes of processing the query, different from any blank node in the graphs in the query's dataset. We can also say that the reader may wish to think of the blank node identifiers as proxies or surrogates for these created blank nodes. Perhaps this might go in Section 2.1.4.

c) In section 2.1.8 "Result descriptions used in this document" in the definition of RDF term, the created blank nodes should be explicitly listed as part of RDF-B. (Note that even if one believed that blank node identifiers were blank nodes all along, this did not put them in RDF-B because they were not part of any graph.)

d) In section 2.4 "Pattern solutions", definition of "pattern solution", we say that the domain of S is extended to include blank node identifiers by mapping each blank node identifier to the blank node that was created for it in item b) above.

e) Somewhere we make the observation that the result of applying a pattern solution S to a triple pattern is an RDF triple. Thus if BGP is a basic graph pattern, then S(BGP) is an RDF graph.

f) Delete the definition of "basic graph pattern equivalence" (changes proposed below make it dispensable).

g) Delete the definition of "scoping graph", also unneeded.

h) Reword the definition of basic graph pattern matching to use the notion of graph merge found in the RDF Recommendations. The revised definition is something like this: "Given an entailment regime E, a basic graph pattern BGP, an RDF graph G and a pattern solution S whose range is a subset of B, then BGP E-matches with pattern solution S on graph G with respect to scoping set B if G E-entails the graph merge of G and S(BGP)." Actually, with the statement that the created blank nodes are distinct from all blank nodes in the dataset, a simple set union will suffice, though we may wish to stick with the RDF notion of merge for consistency with RDF.

i) If we want to keep the technique of renaming blank node identifiers, we move that outside the boxed definition into explanatory text. For example, "The graph merge referred to in the preceding definition can be thought of as using blank node identifiers as proxies for the blank nodes. In that case, care must be taken to ensure that the blank node identifiers of G are different from all blank node identifiers in BGP. Let G' and BGP' be serializations of G and BGP, respectively, such that all blank node identifiers in G' are different from all blank node identifiers in BGP'. Then G' UNION BGP' is the serialization of some graph G2. S is a solution for BGP using E entailment if G E-entails G2."

j) In section 2.5.2 "SPARQL basic graph pattern matching" last paragraph, we can clarify that pattern solutions are unique, not just unique up to blank node renaming. The so-called "blank node renaming" is an artefact of serialization. The last sentence is thus "the serialization of a set of all pattern solutions is unique up to blank node identifiers". We can also delete the phrase "...possibly with blank nodes renamed" earlier in the paragraph, because a pattern solution is not actually concerned with assigning blank node identifiers.

++ Pat Hayes responded in
http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0139.html
He disagreed with me in places, but agreed with my overall contention that it would be good to distinguish clearly between blank nodes in the query and blank nodes in the data. I see that rq24 is already headed in this direction; possibly, this is solved, or else it is just a matter of on-going editorial vigilance. No issue proposed at this time.

***************************************************

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0129.html
Comments on 2.1.4 "Syntax for blank nodes" in 6 April 2006 CR

2.1.4 Syntax for blank nodes

It says "Blank nodes...will take part in the pattern matching". There are no examples of how pattern matching with SPARQL blank nodes works. Section 2.5.4 "Basic graph patterns in SPARQL syntax" gives an example of the syntax only, but does not discuss the semantics of the example that it presents. Section 2.8.3 "Blank nodes" and 2.8.4 "RDF collections" show how the abbreviations are expanded into SPARQL blank nodes, but do not show how the expanded patterns behave either.

++ partially solved in rq24 5.4 "Basic graph patterns in the SPARQL syntax". This gives an example of a basic graph pattern with a blank node identifier, and it says "with the scope of the blank node label being the basic graph pattern".
What is still missing is an example of the results of such a query, and a contrasting example showing how the query behaves differently when the triple patterns are placed into separate graph patterns. That is, I would like to see a contrast between

{ _:x :p ?v . _:x :q ?w }

and

{ { _:x :p ?v } { _:x :q ?w } }

so that the reader can really appreciate what it means that in the latter example the blank node labels are in different scopes. Can this be handled editorially?

2.1.4 Syntax for blank nodes

The preceding section 2.1.3 "Syntax for variables" says "Variables in SPARQL have global scope". Section 2.8.3 "blank nodes" says "Blank nodes have labels which are scoped to the query". It is not clear to me what the difference between the "global scope" of a variable and being "scoped to the query" means in practice. At any rate, I think it would be good to state the scope of blank nodes (or what I prefer to call blank node identifiers) in section 2.1.4, as a parallelism with 2.1.3, and possibly harmonize the terminology.

++ solved in rq24 3.1.4 "Syntax for blank nodes"

************************************************

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0130.html
Comments on UNION matching (CR 6 Apr 2006)

6. Matching alternatives

Second sentence says "If more than one of the alternatives matches, all the possible pattern solutions will be found." Does this mean that if a solution is a solution of both patterns, then the solution occurs twice in the solution sequence? There are no examples of solutions with multiple cardinality. Such examples would be helpful.

++ not resolved; I don't see any examples in rq24 that address the question of duplicates. I am asking that the question of duplicates be added to the issues list.

6.2 Union matching - formal definition

The definition is unclear about whether there are any constraints on the value of a solution on a variable that appears in one pattern but not in the other.
Example: what is the result of

SELECT ?x ?y WHERE { FILTER (?x = ?x) } UNION { FILTER (?y = ?y) }

Suppose there is only one RDF term in the graph, <http:a>. There are all together four partial functions from the set of variables in the query {?x, ?y} to the set of RDF terms, namely:

S1(?x) = <http:a>, S1(?y) = <http:a>
S2(?x) = <http:a>, S2(?y) undefined
S3(?x) undefined, S3(?y) = <http:a>
S4(?x) undefined, S4(?y) undefined

I believe that the desired set of solutions is {S2, S3}, i.e., S1 is not a solution of this query. However, arguably, S1 is a solution of FILTER (?x = ?x), and therefore belongs in the result set according to the definition as written.

My proposed fix is: let P be pattern1 UNION pattern2. Then S is a solution of P if either of the following is true:
1. S is a solution of pattern1 and S is undefined on every variable that is contained in pattern2 but not in pattern1; or
2. S is a solution of pattern2 and S is undefined on every variable that is contained in pattern1 but not in pattern2.

++ not addressed in rq24. I hope to make a comprehensive proposal on the formal semantics, so I am not agitating for this specific solution. I think the issue of the domain of solutions to union patterns should be added to the issue list.

6.2 Union matching - formal definition

The definition is unclear about duplicates. If S is a solution of GP1 and S is a solution of GP2, does the solution sequence contain a copy of S for each of GP1 and GP2? I believe the answer should be that duplicates are maintained because they might be meaningful to the user; if the user wishes to eliminate duplicates, the user can specify DISTINCT. In that case, the definition proposed in a separate comment needs to be rewritten because it would eliminate duplicates. I think the best approach would be to recognize that the UNION operator is constructing a solution sequence from the solution sequences of each operand. The proposed rewording is then: Let P be pattern1 UNION pattern2.
Let V1 be the set of variables that appear in pattern1 and let V2 be the set of variables that appear in pattern2. Let S = (S1, S2, ... Sn) be a sequence of all partial functions on V1 that are solutions of pattern1. Let T = (T1, T2, ... Tm) be a sequence of all partial functions on V2 that are solutions of pattern2. Then a solution sequence of P is any permutation of the sequence (S1, ..., Sn, T1, ..., Tm). (Note: This definition involves a trick concerning partial functions. For example, each Si is a partial function on V1, therefore it is a partial function on the set of all variables in P that happens to be undefined on the variables that belong only to V2.)

++ not addressed in rq24. I think the issue of the cardinality of solutions to union patterns should be added to the issue list.

************************************************

http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0132.html
Comments on optional pattern matching (CR 6 apr 2006)

5.1 Optional pattern matching

The last sentence of the largest paragraph says "The whole graph pattern of an optional graph pattern must match for the optional graph pattern to add to the query solution." The term "query solution" is not defined, but it also occurs in 10.1.3 "ORDER BY" where it refers to the solution sequence. In that case, the phrase "add to the query solution" presumably means to add another solution to the solution sequence. However, that meaning does not always make sense here, because, given a pattern P of the form Pattern1 OPTIONAL { Pattern2 }, the cardinality of the solution sequence of P can be the same as the cardinality of the solution sequence of Pattern1, if for every match to Pattern1, either Pattern2 has a unique match or no match. Instead of using the verb "add", may I suggest the verb "widen"?
It is true that OPTIONAL may increase the number of results, but the primary role of OPTIONAL is to widen the results with additional variables and SPARQL blank nodes found in the second pattern.

++ solved in rq24 7.1 "Optional pattern matching", where the wording has been changed to say "affect the query solution".

5.4 Optional matching - formal definition

The definition is problematic because it uses two terms, "pattern solution" and "solution". It is not clear whether these are distinct concepts or the same concept. I believe most readers will think they are the same concept. In that case, the definition does not work because it is logically equivalent to "S is a solution of optional graph pattern if S is a pattern solution of A".

Proof: Let S be a pattern solution of A. Now either S is a pattern solution of B or not. If S is a pattern solution of B, then S is a pattern solution of A and B and meets the criterion. If S is not a pattern solution of B, then S is a pattern solution of A but not of B, so S meets the "otherwise" part of the criterion. Thus in either case S satisfies the criterion.

If "pattern solution" and "solution" are separate concepts, then we need to be more explicit about the distinction, preferably by coming up with a two-word phrase for the latter concept. If this is our intent, it still does not rescue the definition logically. A close reading of the actual words shows that we are defining "solution" as a recursive definition built on "pattern solution". For S to be a "solution" of Opt(A,B), then the first possibility is that S is a "pattern solution" of A and B. However, the intent is that B might itself be an optional graph pattern, and this close reading of the definition leaves the notion of being a "pattern solution" of an optional graph pattern undefined, so the recursion breaks down when defining Opt (A, Opt (B, C)).
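
The logical reduction argued in the proof above can also be checked mechanically. The sketch below is only an illustration under a simplifying assumption: "S is a pattern solution of A" and "S is a pattern solution of B" are treated as two independent booleans, which sets aside the recursion problem just described. The function name `optional_criterion` is hypothetical.

```python
from itertools import product

def optional_criterion(sol_a: bool, sol_b: bool) -> bool:
    """CR wording: solution of A and of B, otherwise solution of A but not of B."""
    return (sol_a and sol_b) or (sol_a and not sol_b)

# Enumerate all four truth assignments and compare with plain "S is a solution of A".
for sol_a, sol_b in product([False, True], repeat=2):
    assert optional_criterion(sol_a, sol_b) == sol_a

print("criterion is equivalent to: S is a pattern solution of A")
```

The truth table confirms that the value of `sol_b` never changes the outcome, which is the point of the proof.
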
In addition, the definition does not work because the definition of a pattern solution is a total function whose domain is all variables. For example, consider the query { ?x ?y ?z . OPTIONAL { ?x ?z ?w } } applied to the graph with a single triple G = { (ex:a, ex:b, ex:c) }. Here is a trial solution: S(x) = ex:a, S(y) = ex:b, S(z) = ex:c, S(w) = ex:a. Note that I have deliberately defined S as a total function on all variables in the query. Now let's try to apply the definition of optional matching to this trial solution. We see that S is a solution for ?x ?y ?z and S is not a solution for ?x ?z ?w. Thus it would seem that S qualifies as a solution because it satisfies "S is a solution to A but not to A and B". However, I don't think S should be regarded as a solution to the query. Instead, I think that when S fails as a solution to B, then S should be undefined on any variables that occur only in B. Thus S2 defined by S2(x) = ex:a, S2(y) = ex:b, S2(z) = ex:c, and S2(w) undefined should be a solution to the example. My tentative fix is: given syntax pattern1 OPTIONAL { pattern2 }, call pattern1 the mandatory pattern and pattern2 the optional pattern. Any variable that appears in the mandatory pattern is called a mandatory variable. Any variable that appears in the optional pattern and not in the mandatory pattern is called an optional variable. A pattern solution S is a partial function from the variables in the query to RDF terms such that the following hold: a) S restricted to the mandatory variables is a pattern solution of pattern1. b) One of the following two cases is true: i) S is undefined on all optional variables, or ii) S is a pattern solution of pattern2. ++ my tentative fix was shown not to handle a desired capability. I am still working on the best way to formulate this. In the meantime, I request that the formal semantics of optional graph patterns should be added to the issue list.
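The tentative fix above can be sketched in code for the single-triple example. This is only an illustration of conditions a) and b) as stated, not a proposed definition (the remark above notes the fix does not handle a desired capability). Partial functions are modelled as Python dicts, each pattern is a single triple pattern, and the helper names (variables, matches, is_optional_solution) are hypothetical.

```python
# Graph G with a single triple, as in the example above.
GRAPH = {("ex:a", "ex:b", "ex:c")}

MANDATORY = ("?x", "?y", "?z")  # pattern1
OPTIONAL = ("?x", "?z", "?w")   # pattern2


def variables(pattern):
    """Variables (terms starting with '?') that appear in a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def matches(solution, pattern, graph):
    """True if the solution binds every variable in the pattern and the
    instantiated triple is in the graph."""
    if not variables(pattern) <= set(solution):
        return False
    return tuple(solution.get(term, term) for term in pattern) in graph


def is_optional_solution(solution, mandatory, optional, graph):
    mandatory_vars = variables(mandatory)
    optional_vars = variables(optional) - mandatory_vars
    # a) the solution restricted to the mandatory variables solves pattern1
    restricted = {v: t for v, t in solution.items() if v in mandatory_vars}
    if not matches(restricted, mandatory, graph):
        return False
    # b) i) undefined on all optional variables, or ii) solves pattern2
    undefined_on_optional = not (optional_vars & set(solution))
    return undefined_on_optional or matches(solution, optional, graph)


# S binds ?w even though pattern2 does not match; S2 leaves ?w undefined.
S = {"?x": "ex:a", "?y": "ex:b", "?z": "ex:c", "?w": "ex:a"}
S2 = {"?x": "ex:a", "?y": "ex:b", "?z": "ex:c"}

print(is_optional_solution(S, MANDATORY, OPTIONAL, GRAPH))
print(is_optional_solution(S2, MANDATORY, OPTIONAL, GRAPH))
```

Under these conditions the total function S is rejected and the partial function S2 is accepted, which matches the outcome argued for in the text.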
************************************************** http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0136.html Comments on the proper domain for solutions (CR 6 Apr 2006) 2.4 Pattern solutions The definition defines "variable solution" as a partial function and "pattern solution" as a total function. Since the heading on the box calls out "pattern solution", and only the definition of "pattern solution" is in bold, and "variable solution" never appears elsewhere in the document, the reader is led to believe that the important notion is "pattern solution", the total function. However, focusing on total functions is a mistake, as shown by the OPTIONAL and UNION syntaxes, which explicitly require partial functions as solutions to patterns. ++ not resolved; rq24 4.3 "pattern solutions" still says that the domain of a pattern solution is V, the set of all variables. I am asking to add an issue on the domain of solutions, which pertains here, as well as other patterns. 2.4 Pattern solutions It says that a pattern solution is a total function on V, an infinite set. As pointed out in a separate comment, solutions in general are partial functions on V, when OPTIONAL and UNION are considered. The issue to be raised in this comment is that V is not the appropriate domain, even in the case of matching a triple pattern. Consider SELECT ?a ?b ?c WHERE { ?a ?b ?c } evaluated on a graph containing a single triple, (n:s n:v n:o). Then, according to the definition, a pattern solution is a total function mapping F:V -> {n:s, n:v, n:o} such that F(?a) = n:s, F(?b) = n:v, F(?c) = n:o, and F(v) is unconstrained for all other variables v. There are infinitely many of these total functions. Now assemble these pattern solutions into a solution sequence and project to retain just the variables ?a, ?b and ?c. You still have an infinite sequence. Thus the result of the query appears to be an infinite sequence (whose every member is the same function on { ?a, ?b, ?c }).
If one replies that the projection is to be done before assembling the solution sequence, I observe first that that is not the description in Section 10.1 "solution sequences and result forms", but more importantly, that will produce the wrong cardinality on other queries, for example, SELECT ?a ?b WHERE { ?a ?b ?c } on a dataset with two triples (n:s, n:v, n:o1), (n:s, n:v, n:o2). In this example, projecting before assembling the solution sequence will result in only a single solution, whereas there should be two. Instead, I believe that the correct algorithm is to look at pattern solutions whose domain is the set of variables that appear in the specific query to be evaluated (not the infinite set V). And, as pointed out in a separate comment, the focus should be on partial functions rather than total functions. ++ not resolved. I request an issue on domain of solutions. ********************************************************* http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0135.html Comment on 11.2 "filter evaluation" (CR 6 Apr 2006) 11.2 Filter evaluation The rules do not say how to handle an unbound variable in an expression. Clearly it must be possible for the argument of BOUND to be unbound. I believe the desired semantics are that it is an error to have an unbound argument for any other function. This should be stated explicitly. Also, note that the first bullet says "SPARQL functions do not process node sequences. When interpreting the semantics of XPath functions assume that each argument is a sequence of a single node". Given that an argument might be unbound, it is not true that arguments are always a sequence of length 1; I think the truth is that the argument may be a sequence of length 0 or 1. ++ Andy pointed me to http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0012.html evidently someone else thinks that this could be specified better. I request an issue for this.
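The filter behaviour suggested in the comment on 11.2 can be sketched as follows. This is one possible reading, not text from the specification: BOUND accepts an unbound argument, any other use of an unbound variable raises an error, and (as a further assumption) an error causes the filter to reject the solution. Solutions are modelled as dicts, and the names FilterError, bound, value_of and passes_filter are hypothetical.

```python
class FilterError(Exception):
    """Raised when an expression uses an unbound variable outside BOUND."""


def bound(solution, var):
    # BOUND is the one function whose argument may be unbound.
    return var in solution


def value_of(solution, var):
    # Any other use of a variable requires a binding (a "sequence of length 1").
    if var not in solution:
        raise FilterError(f"{var} is unbound")
    return solution[var]


def passes_filter(solution, constraint):
    # Assumption: an error in the constraint rejects the solution.
    try:
        return bool(constraint(solution))
    except FilterError:
        return False


def less_than_ten(solution):
    return value_of(solution, "?a") < 10


def not_bound(solution):
    return not bound(solution, "?a")


with_binding = {"?a": 5}
without_binding = {}

print(passes_filter(with_binding, less_than_ten))
print(passes_filter(without_binding, less_than_ten))
print(passes_filter(without_binding, not_bound))
```

The second call shows the case the comment asks the specification to spell out: the comparison cannot be evaluated, so under the stated assumption the solution is dropped rather than treated as true or false.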
*********************************************************** http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0137.html Comments on 9 "Specifying RDF datasets" (CR 6 Apr 2006) 9 Specifying RDF datasets The text could be clearer about where the default graph comes from. The first paragraph hints at this, but the description is cloudy because of the use of "may". Thus the first sentence says "A SPARQL query may specify the dataset ...". The use of "may" in this sentence evidently refers to the user, who "may" choose to use the FROM clause. Third sentence "The RDF dataset may also be specified in a SPARQL protocol request". Does this mean that it is the user's responsibility to use at least one of these two techniques? What happens if the user uses neither? Is this an error? Or perhaps this would mean the default graph is empty? Or does some implementation-defined default kick in? The precise rules for determining the default graph need to be specified. ++ I am satisfied with Andy's reply in http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0029.html 9 Specifying RDF datasets There is no statement of the formal semantics of the FROM clauses. ++ I am not satisfied with Andy's reply in http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0029.html My reasons are: 1. There is no bridge from the syntax rule [9] DatasetClause to the abstract construct called a dataset in rq24 9.3 "Querying the dataset". This is an instance of my general issue that there is no bridge from syntax to semantics. 2. Andy says that it is implementation-defined whether to merge graphs to produce the default graph if there are multiple FROM clauses. However, the last sentence of rq24 9.2.1 "Specifying the default graph" says that the default graph is the merge. If Andy is correct, then that sentence needs to change. I am agreeable to an editorial resolution of this.
************************************************************* http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0131.html Comments on 10.1 "Solution sequences and result forms" (CR 6 Apr 2006) 10.1.1 Projection The last sentence of the formal definition uses set notation for the result of projecting a solution sequence into a new solution sequence. This is not desired, because: a) sets are not ordered, but solution sequences are b) sets do not permit duplicates, but the intent is that the result of a projection might have duplicates. This can be corrected by using some notation denoting a sequence. Earlier we used (S1, ..., Sn) to denote a sequence, and that could be done here, for example, ( project (S1, VS), ..., project (Sn, VS) ). Or we can use the mathematical definition of a sequence as a function whose domain is the positive integers, in which case the sequence is represented { (i, project (Si, VS) ) | i = 1, ..., n }. ++ solved in rq24 10.1.2 "Projection" 10.1.3 ORDER BY The formal definition does not support the following features: a) ordering in descending order b) ordering by multiple sort keys. ++ Andy replied in http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0027.html He noted that he had made an editorial improvement to the first sentence of rq24 10.1.1 "ORDER BY". I agree that the editorial change is an improvement, and perhaps that is all that can be done editorially, but I still feel that there is an unspecified gap between the syntax and the semantics. How does one know that one uses the increasing order with ASC and reverses that order with DESC? How does one know to do the lexicographic order when there are multiple sort keys? These are not weighty issues, but other specifications have done this, and so can we. I regard this as part of my general issue of connecting the dots from the syntax to the semantics.
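The ordering behaviour that the ORDER BY comment expects (ASC is increasing order, DESC reverses it, and multiple sort keys give lexicographic order) can be sketched as below. This illustrates the expected semantics only and is not wording from rq24; it assumes every solution binds every sort key and that the bound values are mutually comparable. The function name order_by and the sample data are hypothetical.

```python
def order_by(solutions, sort_keys):
    """Order solutions lexicographically.

    sort_keys is a list of (variable, direction) pairs, most significant
    first, where direction is "ASC" or "DESC".
    """
    ordered = list(solutions)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields lexicographic order.
    for var, direction in reversed(sort_keys):
        ordered.sort(key=lambda s: s[var], reverse=(direction == "DESC"))
    return ordered


solutions = [
    {"?name": "b", "?age": 30},
    {"?name": "a", "?age": 30},
    {"?name": "c", "?age": 25},
    {"?name": "d", "?age": 40},
]

# Corresponds to: ORDER BY DESC(?age) ASC(?name)
result = order_by(solutions, [("?age", "DESC"), ("?name", "ASC")])
print([s["?name"] for s in result])
```

The two solutions with ?age = 30 are tied on the first key and are therefore ordered by ?name, which is the lexicographic behaviour the comment says should be stated explicitly.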
********************************************************** http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0134.html Comments on 2.6 "Multiple Matches" (CR 6 Apr 2006) 2.6 Multiple matches First sentence: "The results of a query is the set of all pattern solutions that match the query pattern, giving all the ways a query can match the graph queried." But sets eliminate duplicates, and we have the DISTINCT operator as an optional syntactic choice about whether to eliminate duplicates. Instead, this should say that the result of a query is a sequence of solutions. See section 10.1 "Solution sequences and result forms". ++ solved in rq24 2.2 "Multiple matches" 2.6 Multiple matches The semantics of the empty graph pattern has not been defined. I think the following queries are instructive: a) SELECT ?a FROM graph WHERE { } b) SELECT ?a ?b FROM graph WHERE { } c) SELECT ?a ?b FROM graph WHERE { ?a foaf:verb foaf:noun } d) SELECT ?a ?b FROM graph WHERE { ?a foaf:verb foaf:noun . OPTIONAL { ?a foaf:verb2 ?b } } One's initial impulse is that query a) should result in the set of all mappings of { ?a } to the scoping set (not the set of all total mappings of V to the scoping set; see related comment). Or equivalently, the user might view the result as an enumeration of the scoping set of the graph. Then query b) would result in the set of all mappings of { ?a, ?b } to the scoping set, or, naively, the cross product of the scoping set with itself. However, I believe that c) and d) should result in a subset of the result of b). Now in the case of d) in particular, OPTIONAL is intended to allow for a result which is a partial binding, ie, one that binds ?a but does not bind ?b. If it happens that there is no binding for ?b, then the result would not be a subset of the cross product of the scoping set with itself. My conclusion is that in order to support OPTIONAL and UNION, we have to permit a result that is a partial mapping. 
Coming back to query b), in order for it to contain query d) as a subset, the result of b) must be all partial functions from {?a, ?b} to the scoping set. Alternatively, a naive view might imagine augmenting the scoping set with a single "missing" element, distinct from all other elements, in which case the result of b) is the cross product of the augmented scoping set. And as for a), it seems the result must be the set of all partial functions of { ?a } to the scoping set, or equivalently, an enumeration of the augmented scoping set. As a different approach to this issue, consider these two queries: a1) SELECT ?a FROM graph WHERE { BOUND (?a) } a2) SELECT ?a FROM graph WHERE { !BOUND (?a) } I believe the following things: -- the result of a) should be the union of the result of a1) and a2) -- the result of a1) should be an enumeration of the scoping set, -- the result of a2) should be a single solution, in which ?a is not bound. ++ solved in rq24 6.2 "Empty graph pattern" **************************************************** http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0133.html General problem with cardinality of results in CR 6 April 2006 The specification is imprecise about the cardinality of solutions in several places. I have heard it argued that all that matters is the set of solutions (ie, duplicates can be dropped as an implementation dependent or defined detail). The problem with that position is that the number of duplicates may be semantically meaningful to the user. This issue will become acute when we add aggregates, which has been deferred to a future release. When a user performs a count or sum, the user expects precise semantics about the number of duplicates to be counted or summed. If we fail to specify those semantics in this edition, then we will encourage differing implementations, which then will have entrenched positions on cardinality issues when we try to add the aggregates.
Therefore it is important to resolve the cardinality issues at this stage. Even in advance of adding aggregates, the user may be interested in computing them on his own using the results of a SPARQL query. To assure portable and interoperable results, we need to define the number of duplicates precisely. I see at least the following issues on cardinality: a) the number of solutions to the empty pattern, SELECT ?A WHERE {}. I can see arguments for 0 solutions, 1 solution (the one that makes ?A undefined), n solutions where n is the cardinality of the scoping set (one for each possible binding of ?A) or n+1 (one for each possible binding, plus one in which ?A is not bound). ++ solved in rq24 6.2 "empty graph pattern". b) the number of solutions to a UNION, for example, SELECT ?A WHERE { { ?A ?A ?A } UNION { ?A ?A ?A } } c) the number of solutions when a triple pattern includes a blank node. For example SELECT ?A WHERE { ?A n:v _:B } on the graph G = { (n:a, n:v, "1"), (n:a, n:v, "2") }. Does this have one solution or two? Argument in favor of one solution: there is only one mapping S from the set of variables to the set of RDF terms such that S( ?A n:v _:B ) is a triple that can be merged into G to produce a new graph that is simply entailed by G. This is using the definition of basic graph pattern E-matching in section 2.5.1 "General framework". Argument in favor of two solutions: Section 2.5.2 "SPARQL basic graph pattern matching" last paragraph says that under simple entailment, pattern matching can be done by mapping both variables and SPARQL blank nodes to RDF terms, testing to see if the result of the mapping is a subgraph of G. Under this formulation, there are two solutions, one that maps _:B to "1" and the other that maps _:B to "2". This paragraph goes on to say that solutions are formed by restricting such mappings to just the set of variables. What is unclear is whether the act of restricting involves discarding duplicates. 
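The two counts in item c) can be reproduced with a small sketch of the mapping formulation. It assumes a pattern consisting of a single triple pattern, treats both variables (?) and blank node identifiers (_:) as placeholders, and models mappings as dicts; the helper names is_placeholder and mappings are hypothetical. It does not model the entailment-based formulation.

```python
GRAPH = {("n:a", "n:v", '"1"'), ("n:a", "n:v", '"2"')}
PATTERN = ("?A", "n:v", "_:B")


def is_placeholder(term):
    """Variables and blank node identifiers are both mapped to RDF terms."""
    return term.startswith("?") or term.startswith("_:")


def mappings(pattern, graph):
    """All mappings of the pattern's placeholders that turn it into a triple of the graph."""
    results = []
    for triple in sorted(graph):
        mapping = {}
        consistent = True
        for pattern_term, graph_term in zip(pattern, triple):
            if is_placeholder(pattern_term):
                # A placeholder used twice must map to the same term.
                if mapping.setdefault(pattern_term, graph_term) != graph_term:
                    consistent = False
                    break
            elif pattern_term != graph_term:
                consistent = False
                break
        if consistent:
            results.append(mapping)
    return results


all_mappings = mappings(PATTERN, GRAPH)

# Restrict each mapping to the variables, dropping the blank node identifier.
restricted = [{k: v for k, v in m.items() if k.startswith("?")} for m in all_mappings]

# Discard duplicates, as DISTINCT would.
distinct = [m for i, m in enumerate(restricted) if m not in restricted[:i]]

print(len(all_mappings), len(restricted), len(distinct))
```

There are two mappings, the restriction to ?A keeps two identical solutions, and only one remains once duplicates are discarded, so the answer to "one solution or two?" depends entirely on whether restriction is taken to discard duplicates.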
Note that the fact that there is a DISTINCT modifier shows that one can not presume that duplicates are discarded. I am posting separate, more detailed comments on specific sections with cardinality issues. ++ I think we need an issue on cardinality of UNION. As for cardinality of a basic graph pattern with a blank node identifier, my current interpretation of the document is that the general framework is normative and therefore the cardinality must be 1. ************************************************************ http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0141.html Comment on 4.1 "Group graph patterns" (CR 6 Apr 2006) 4.1 Group graph patterns The formal semantics of group graph patterns does not work in conjunction with the definition of basic graph pattern matching (section 2.5.1 "General framework"). For example, let G be the graph <s> <v1> <o1> . <s> <v2> <o2> . Consider the pattern { ?x <v1> _:a } { ?x <v2> _:a } The pattern is a group graph pattern consisting of two triple patterns. According to the definition, we are looking for a solution S that is a solution of { ?x <v1> _:a } and a solution of { ?x <v2> _:a }. As a trial solution, consider the function S that maps ?x to <s>. I claim that S is a solution of both subpatterns. For the first pattern, according to the definition of basic graph pattern E-matching the question is whether the following graph <s> <v1> <o1> . <s> <v2> <o2> . <s> <v1> _:a . is entailed by G. The answer is yes. Similarly, S is a solution of the second pattern as well. The problem in logical terms is that "for all x there exists y such that P(x, y)" is not as strong an assertion as "there exists y such that for all x, P(x, y)". The current definition only supports the weaker assertion "for all there exists" rather than the desired assertion "there exists for all". Mathematicians generally use the term "uniform" to describe "there exists for all" situations (for example, the definition of uniform continuity).
On the other hand, if one uses the alternative definition of triple matching in section 2.5.2 "SPARQL basic graph pattern matching", the uniform treatment of blank node identifiers in graph patterns is assured. This definition says that a solution is found by mapping variables and blank node identifiers to RDF terms. In that case the trial solution S must specify which node in G the blank node identifier _:a is bound to. Since there is no single choice that works, there is no solution to the pattern. ++ This comment as worded above is mistaken. It assumes that using the mapping algorithm for basic pattern matching makes the scope of a blank node identifier be the entire query. However, rereading rq24 5.2 "SPARQL basic graph pattern matching", I see that the scope is still just a basic graph pattern. Whether this is advisable from a usability standpoint is a separate question. I have not decided whether to raise that as an issue. *************************************************** http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0189.html Blank node identifiers in FILTER clauses The scope of blank node identifiers is not clearly specified. However, as I have understood conversations in email and telecon, the definition of basic graph pattern E-matching in 2.5.1 "General framework" provides the only definition for the semantics of blank node identifiers, and therefore the scope of a blank node identifier is a basic graph pattern. My question is whether the scope can also extend into a Constraint in a FilteredBasicGraphPattern. For example, consider the data set with these three triples: <s1> <v> <o1> . <s2> <v> <o2a> . <s2> <v> <o2b> . The user wants to find those subjects which are related via the verb <v> to at least two objects. The desired solution sequence is { <s2> }. The user writes his query this way: SELECT ?x WHERE { ?x <v> _:a . ?x <v> _:b . FILTER (_:a != _:b) } Does this do what the user wants? 
It seems that the definitions in 2.5 "Basic graph patterns" only explain how to solve the basic graph pattern ?x <v> _:a . ?x <v> _:b . The solutions of this basic graph pattern are ?x = <s1> and ?x = <s2>. In the case of ?x = <s1>, this is because the dataset entails the addition of these triples: <s1> <v> _:a . <s1> <v> _:b . or in predicate calculus terms, it is possible to conclude from the dataset that (exists _:a, _:b) [ <s1> <v> _:a . <s1> <v> _:b . ] Or using the mapping technique for simple entailment, map ?x -> <s1>, _:a -> <o1>, _:b -> <o1> and then restrict to just the mapping of ?x. Note that the definitions of section 2.5, using either entailment or mapping, do not provide for evaluating a Constraint during the process of finding solutions to a basic graph pattern. So both solutions ?x -> <s1> and ?x -> <s2> come to the FILTER clause, and the FILTER clause is unaware of any bindings to _:a or _:b. I do not know whether the result of FILTER (_:a != _:b) is true, false or error, but whatever the semantics of the FILTER clause is, it appears that it will treat the two solutions identically. If true, then both <s1> and <s2> are solutions; if false or error, then neither is. Thus the solution set appears to be either { <s1>, <s2> } or the empty set. Not what was desired! I see four possible resolutions: 1. (My preference) the scope of a blank node identifier is an entire FilteredBasicGraphPattern, not just a basic graph pattern. To do this, we need to extend the definitions in section 2.5 so that they define the solutions of a FilteredBasicGraphPattern rather than just the solutions of a basic graph pattern. I can see how to do this with the simple entailment mapping definition; I don't see how to do this with the general E-entailment definition. 2. We prohibit blank node identifiers in FILTER clauses as inherently meaningless or deceptive syntax. 3.
We allow blank node identifiers in FILTER clauses, but they always raise an error, so that such FILTERs always fail. But in that case, why did we permit the syntax? 4. We allow blank node identifiers in FILTER clauses, and they reference distinct blank nodes, distinct from all blank nodes in the dataset. Thus _:a = _:b is false, and _:a != _:b is true. ++ this comment started an extended dialog. My summary is that it seems there is a consensus for choice 1. I suggest that we add this to the issue list. Presumably it will be resolved by the anticipated rewrite of the general framework. ***************************************************** http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0008.html Concrete vs. existential semantics I have been advocating for strict definitions of the number of rows returned by queries. As I understand it, Andy Seaborne has advocated an opposite view, that SPARQL should not define precisely how many duplicates are returned by a query. For example, in http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0005.html "In general, it isn't possible to conclude anything about numbers of things in RDF. It is in OWL." I have also heard the opinion that it does not matter whether duplicates are eliminated from a UNION or not; I don't have a name or message to cite for that opinion. More generally, I think there is an opinion that all SPARQL cares about is that the result sequence, after eliminating duplicates, is correct. Thus the result of a SELECT is not precisely defined; only SELECT DISTINCT is. In this message I want to start a discussion on this. As an initial foray, I will frame the question in terms of "concrete" vs. "existential" semantics. I grant that it is difficult to impossible to be sure that two seemingly-different IRIs refer to distinct things. I also grant that it is difficult to impossible to be sure that two seemingly distinct blank nodes, conceived of as existentials, are known to be distinct. 
However, I wonder whether it is a good idea to base our semantics exclusively on these "existential" insights. I think the naive view is that two things are distinct if they look distinct. Two IRIs that are spelled differently are different. Two blank nodes with different node identity are different. (Blank node identifiers are proxies for node identity; two blank nodes with different identifiers are different). I think that in many instances, the users will want this kind of concrete interpretation of an RDF graph. Further, I believe that when one is working with a concrete interpretation, duplicates may carry semantic meaning and it is important to define precisely how many duplicates are returned. I especially believe this is true when there are financial figures involved. For example, imagine a purchase order encoded in RDF. Each purchase order has an IRI. Various facts about the PO are assembled using verbs: bill-to, ship-to, and the line items. Since bill-to, ship-to and line items are all compound objects, they may be represented by blank nodes, which in turn connect via various verbs to literals or IRIs. Let's look at the line items in particular. A line item consists of a part number (an IRI), a quantity (an xsd:integer), and a unit price (an xsd:decimal). The user wants to find the total price of a particular PO. The query looks something like this: SELECT ?quantity ?price WHERE some:IRI po:po _:lineitem . _:lineitem po:quantity ?quantity . _:lineitem po:price ?price . Since SPARQL has no aggregates or expressions in its SELECT list, the user intends to simply fetch all rows, multiply ?quantity * ?price and take the sum himself. Now it can happen in a PO that the quantity and price of two line items are identical. However, suppressing such duplicates would be fatal to this application. 
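The effect of suppressing duplicates on the purchase order total can be shown with a few lines of arithmetic. The rows below are hypothetical data standing in for the (?quantity, ?price) results of the query above; two line items happen to have the same quantity and price.

```python
from decimal import Decimal

# Hypothetical result rows (?quantity, ?price); the first two line items
# are distinct items that happen to look identical.
rows = [
    (2, Decimal("10.99")),
    (2, Decimal("10.99")),
    (1, Decimal("5.00")),
]


def total(result_rows):
    """Sum of quantity * price over the rows, as the user would compute it."""
    return sum((quantity * price for quantity, price in result_rows), Decimal("0"))


# Total when every row is delivered.
total_all = total(rows)

# Total when duplicate rows are suppressed (order-preserving de-duplication).
distinct_rows = list(dict.fromkeys(rows))
total_distinct = total(distinct_rows)

print(total_all, total_distinct)
```

With all rows the total is 48.96; with duplicates suppressed it drops to 26.98, silently losing one line item.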
Note that adding the part number to the SELECT list will not necessarily save the query, since the combination of part number, quantity and price is still not a guaranteed unique key for line items. The user is relying on distinct blank nodes to represent distinct line items. Of course, from the point of view of "RDF Semantics" that would be a redundant graph, for example, one that asserts "There exists a line item whose part is XYZ, quantity is 1 and price is 10.99" and asserts again "There exists a line item whose part is XYZ, quantity is 1 and price is 10.99". Thus one could say that this is a misuse of RDF. This may be technically true, but I wonder if insisting on this point will really serve the users. If you read the RDF Primer, the application design above makes sense. You have a line item; you don't want to bother creating an IRI for each line item; so you make a blank node for each line item. "RDF Semantics", on the other hand, is a dense document with talk about hypothetical universes that are interpretations of a graph. This is not the kind of material that will make its way into seminars, courses, how-to books, etc. The early days of relational databases encountered the same problem. The theorists said a relational table is a set, therefore it can have no duplicates, therefore it is up to the user to insert some additional piece of information to distinguish two otherwise-identical line items, to provide a unique key. Sounds great in theory; however, the vendors discovered that they had to accommodate the naive view that each row has its own identity and is distinct, without requiring a unique key. A slightly different response is that RDF and SPARQL are not targeted at such applications. However, the introduction to "OWL web ontology language guide" poses this scenario: "consider actually assigning a software agent the task of making a coherent set of travel arrangements."
If eventually RDF databases and SPARQL queries are part of such a software agent, then it will be necessary to make concrete assurances about the total price of a travel plan. In addition, the vision is that the dataset will be aggregated from many sites, which means that there will not be a central authority to impose strict existential semantics. My suggestion is that we consider some syntactic way to support both a "concrete" interpretation and an "existential" interpretation. My tentative initial solution is a three-way switch: SELECT DISTINCT, SELECT ALL and SELECT LAX. SELECT DISTINCT promises to remove duplicates, SELECT ALL promises to deliver all duplicates, and SELECT LAX makes no promises either way. (Anyone have a better keyword for this choice?) I don't believe this is the complete solution to the issue. The reason is that the issue of duplicates becomes more complicated when using OWL entailment. OWL permits the deduction that two seemingly distinct IRIs or blank nodes are in fact equal. For example, if the reasoner can deduce that some:IRI1 = some:IRI2, what should the reasoner return for SELECT ALL? Does it return both even though it knows they are equal? If not, how does the user frame a query to ask for all synonyms of some:IRI1? What should the reasoner return for SELECT DISTINCT? Does it pick one of the two arbitrarily? ++ my comment was vigorously rebutted and I did not receive any messages in support of my comment. I understood the replies and I can see the opposing point of view. I continue to be concerned that we may lose the confidence of our users if SPARQL cannot be relied on to return results that can be meaningfully counted and summed. However I am not pushing to add this to the issue list at this time. ******************************************************* http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0042.html ready to exit CR? 
I am strongly opposed to exiting CR because of the issues I have raised with the specification, which I regard as serious and fatal. In my view the purpose of a specification is to specify. Examples do not constitute a normative specification. The document (both the CR and rq24) fails to specify in the following important ways: 1. There is no bridge from the concrete syntax to the abstract semantics. Consequently the document can not actually be said to specify the language at all, except that A.7 "Grammar" really does specify the syntax. 2. The scope of blank node identifiers has not been stated clearly. The consensus in an email thread appears to be that the scope is a FilteredBasicGraphPattern (rule [21]) but the definitions in 2.5.1 "General framework" do not support this and need to be rewritten. 3. The abstract semantics does not pay attention to the critical issue of the domain of solutions. Consequently the notion of "solution" is not well-defined. 4. The preceding problems are perhaps at their worst in the case of optional graph patterns. The grammar does not indicate what the first operand of a graph pattern is, and there is no discursive text on the subject either. Thus there is no bridge from the syntax to the abstract semantics. As for the abstract semantics, the definition of OPT(A,B) appears to reduce to just solving A with no role for B. 5. It is not clear whether UNION requires an implementation to count duplicate solutions precisely, which I personally advocate, though I could live with the alternative of stating explicitly that it is implementation-defined or -dependent how many duplicates are returned. ++ this message recapitulates some of my comments and indicates some that I feel are obstacles to progression. These issues are all enumerated earlier in this file and no additional issues need to be created. Fred
Received on Wednesday, 2 August 2006 19:25:55 UTC