- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Thu, 08 Jun 2006 23:57:40 +0000
- To: public-rdf-dawg@w3.org
The specification is imprecise about the cardinality of solutions in several places. I have heard it argued that all that matters is the set of solutions (ie, duplicates can be dropped as an implementation dependent or defined detail). The problem with that position is that the number of duplicates may be semantically meaninful to the user. This issue will become acute when we add aggregates, which has been deferred to a future release. When a user performs a count or sum, the user expects precise semantics about the number of duplicates to be counted or summed. If we fail to specify those semantics in this edition, then we will encourage differing implementations, which then will have entrenched positions on cardinality issues when we try to add the aggregates. Therefore it is important to resolve the cardinality issues at this stage. Even in advance of adding aggregates, the user may be interested in computing them on his own using the results of a SPARQL query. To assure portable and interoperable results, we need to define the number of duplicates precisely. I see at least the following issues on cardinality: a) the number of solutions to the empty pattern, SELECT ?A WHERE {}. I can see arguments for 0 solutions, 1 solution (the one that makes ?A undefined), n solutions where n is the cardinality of the scoping set (one for each possible binding of ?A) or n+1 (one for each possible binding, plus one in which ?A is not bound). b) the number of solutions to a UNION, for example, SELECT ?A WHERE { { ?A ?A ?A } UNION { ?A ?A ?A } } c) the number of solutions when a triple pattern includes a blank node. For example SELECT ?A WHERE { ?A n:v _:B } on the graph G = { (n:a, n:v, "1"), (n:a, n:v, "2") }. Does this have one solution or two? Argument in favor of one solution: there is only one mapping S from the set of variables to the set of RDF terms such that S( ?A n:v _:B ) is a triple that can be merged into G to produce a new graph that is simply entailed by G. This is using the definition of basic graph pattern E-matching in section 2.5.1 "General framework". Argument in favor of two solutions: Section 2.5.2 "SPARQL basic graph pattern matching" last paragraph says that under simple entailment, pattern matching can be done by mapping both variables and SPARQL blank nodes to RDF terms, testing to see if the result of the mapping is a subgraph of G. Under this formulation, there are two solutions, one that maps _:B to "1" and the other that maps _:B to "2". This paragraph goes on to say that solutions are formed by restricting such mappings to just the set of variables. What is unclear is whether the act of restricting involves discarding duplicates. Note that the fact that there is a DISTINCT modifier shows that one can not presume that duplicates are discarded. I am posting separate, more detailed comments on specific sections with cardinality issues. Fred
Received on Friday, 9 June 2006 05:40:17 UTC