- From: Andy Seaborne <andy@apache.org>
- Date: Mon, 03 Mar 2014 22:43:31 +0000
- To: David Booth <david@dbooth.org>, "w3.hcls@gmail.com" <w3.hcls@gmail.com>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
On 03/03/14 22:00, David Booth wrote:
> Hi Andy,
>
> On 03/03/2014 03:01 PM, Andy Seaborne wrote:
>> (please forward if the mailing list does not allow non-subscribers to
>> send to it)
>>
>> On 03/03/14 16:32, David Booth wrote:
>>> On 02/09/2014 05:45 PM, w3.hcls@gmail.com wrote:
>>>> Relevant docs:
>>>> - Working draft of W3C Note:
>>>> https://docs.google.com/document/d/1zGQJ9bO_dSc8taINTNHdnjYEzUyYkbjglrcuUPuoITw/edit#heading=h.wyc73yp7c8jz
>>>>
>>>
>>> I notice that section 6.6.1 Core statistics shows this SPARQL query for
>>> counting the number of triples:
>>>
>>> SELECT (COUNT(*) AS ?no) { ?s ?p ?o }
>>>
>>> However, I believe the SPARQL 1.1 standard allows duplicate triples and
>>> duplicate query solutions by default. If so, to get an accurate count
>>> of the number of triples, the DISTINCT keyword must be used:
>>>
>>> SELECT (COUNT(DISTINCT *) AS ?no) { ?s ?p ?o }
>>>
>>> I'm copying Andy Seaborne to see if this is correct, since I could not
>>> easily find this information in the SPARQL 1.1 spec when I did a quick
>>> scan. Andy, am I correct about this?
>>>
>>> Thanks,
>>> David
>>
>> Hi,
>>
>> In the case of { ?s ?p ?o }, the match is against the default graph and
>> an RDF graph is a set of triples - so there are no duplicates over the
>> ?s, ?p, ?o elements of a row.
>>
>> Because of the nature of the pattern, COUNT(*) and COUNT(DISTINCT *)
>> should be the same.

I think section 6.6.1 Core statistics is correct as is.

What does the spec say? That's the definitive place to look.

>
> I'm particularly thinking of AllegroGraph, which (by default I believe)

I don't know what AllegroGraph does. Sounds like a question for the
developers.

> does not remove duplicate triples if the same triple happens to be
> loaded more than once.

bNodes? In all the RDF syntaxes, reading a file twice creates separate
bNodes each time.

> If AllegroGraph returns a different count to the
> queries above (with or without DISTINCT), does that mean that
> AllegroGraph is not SPARQL 1.1 compliant? I.e., is it a bug, or is it
> a permissible implementation variation?
>
> I had the impression that SPARQL 1.1 conformant implementations are
> permitted to have duplicate solutions in the solution set unless the
> word DISTINCT is used,

Do you have a pointer to the text that gave you that impression?

> and hence I would have thought that a solution
> set that is not explicitly constrained to be DISTINCT could include
> duplicates, even if that solution set is for only a { ?s ?p ?o } graph
> pattern over the default graph, but maybe I'm wrong.

I don't see how { ?s ?p ?o } can create duplicates - an RDF graph is a
*set* of triples (that's not a SPARQL definition - it's an RDF
definition) so subject/predicate/object is a unique combination within a
graph.

If the graph is composed behind the scenes of other data, that's nothing
to do with the RDF or SPARQL specs.

> OTOH, if, when
> DISTINCT is not specified, the SPARQL 1.1 standard only *sometimes*
> permits duplicates, then how can I determine which circumstances permit
> them and which don't?

It depends on the query pattern, but we're talking about one specific
pattern - { ?s ?p ?o }.

In general, SPARQL results are multisets (so duplicates are possible).
Some of the algebra operations, such as projection and union, can cause
duplicates, but their cardinality is defined.

	Andy

>
> David
>
Received on Monday, 3 March 2014 22:44:01 UTC
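
The distinction drawn in the message above (a graph is a set of triples, while query results are multisets) can be illustrated with a small toy model. This is only a sketch in plain Python, not a SPARQL engine: the sample triples, the `ex:` names and the helper `match_all` are hypothetical, and it says nothing about how any particular store such as AllegroGraph behaves.

```python
# Toy model: an RDF graph as a Python set of (s, p, o) tuples.
# All data and names below are made up for illustration.

# The same triple appears twice, as if one file had been loaded twice.
loaded = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:a", "ex:p", "ex:b"),
    ("ex:a", "ex:q", "ex:c"),
]

# An RDF graph is a set of triples, so the repeated triple collapses.
graph = set(loaded)


def match_all(g):
    """Solutions of { ?s ?p ?o }: one row per triple in the graph."""
    return [(s, p, o) for (s, p, o) in g]


rows = match_all(graph)

# SELECT (COUNT(*) AS ?no) { ?s ?p ?o }
count_all = len(rows)
# SELECT (COUNT(DISTINCT *) AS ?no) { ?s ?p ?o }
count_distinct = len(set(rows))

# Projection can introduce duplicates: SELECT ?s { ?s ?p ?o }
projected = [s for (s, p, o) in rows]
count_projected = len(projected)
count_projected_distinct = len(set(projected))

# So can union: { ?s ?p ?o } UNION { ?s ?p ?o }
union_rows = rows + rows
count_union = len(union_rows)
count_union_distinct = len(set(union_rows))

print(count_all, count_distinct)                  # same for this pattern
print(count_projected, count_projected_distinct)  # differ after projection
print(count_union, count_union_distinct)          # differ after union
```

For the bare { ?s ?p ?o } pattern the two counts agree because each row is a whole triple and the graph holds each triple once. The counts diverge only after projection or union, where the multiset cardinality is what matters.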