Re: [Update] [LLD] Dataset Description

From: David Booth <david@dbooth.org>
Date: Mon, 03 Mar 2014 17:00:46 -0500
Message-ID: <5314FB8E.2070300@dbooth.org>
To: Andy Seaborne <andy@apache.org>, "w3.hcls@gmail.com" <w3.hcls@gmail.com>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>

Hi Andy,

On 03/03/2014 03:01 PM, Andy Seaborne wrote:
> (please forward if the mailing list does not allow non-subscribers to
> send to it)
>
> On 03/03/14 16:32, David Booth wrote:
>> On 02/09/2014 05:45 PM, w3.hcls@gmail.com wrote:
>>> Relevant docs:
>>> - Working draft of W3C Note:
>>> https://docs.google.com/document/d/1zGQJ9bO_dSc8taINTNHdnjYEzUyYkbjglrcuUPuoITw/edit#heading=h.wyc73yp7c8jz
>>>
>>>
>>
>> I notice that section 6.6.1 Core statistics shows this SPARQL query for
>> counting the number of triples:
>>
>>    SELECT (COUNT(*) AS ?no) { ?s ?p ?o  }
>>
>> However, I believe the SPARQL 1.1 standard allows duplicate triples and
>> duplicate query solutions by default.  If so, to get an accurate count
>> of the number of triples, the DISTINCT keyword must be used:
>>
>>    SELECT (COUNT(DISTINCT *) AS ?no) { ?s ?p ?o  }
>>
>> I'm copying Andy Seaborne to see if this is correct, since I could not
>> easily find this information in the SPARQL 1.1 spec when I did a quick
>> scan.   Andy, am I correct about this?
>>
>> Thanks,
>> David
>
> Hi,
>
> In the case of { ?s ?p ?o }, the match is against the default graph and
> an RDF graph is a set of triples - so there are no duplicates over the
> ?s, ?p, ?o elements of a row.
>
> Because of the nature of the pattern, COUNT(*) and COUNT(DISTINCT *)
> should be the same.

I'm particularly thinking of AllegroGraph, which (by default, I believe)
does not remove duplicate triples if the same triple happens to be
loaded more than once.  If AllegroGraph returns different counts for the
two queries above (one with DISTINCT, one without), does that mean that
AllegroGraph is not SPARQL 1.1 compliant?   I.e., is it a bug, or is it
a permissible implementation variation?
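
To make the distinction concrete, here is a minimal sketch in Python.
It is only a toy model, not a SPARQL engine and not a statement of how
AllegroGraph actually behaves: the triples, the two store variables and
the helper functions count_all and count_distinct are all hypothetical
names chosen for illustration.  It contrasts a store that holds a set
of triples with one that keeps every load as a separate entry.

```python
# Hypothetical load sequence: the same triple is loaded twice.
loaded = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:a", "ex:p", "ex:b"),  # duplicate load of the same triple
    ("ex:a", "ex:p", "ex:c"),
]


def count_all(store):
    """Toy analogue of SELECT (COUNT(*) AS ?no) { ?s ?p ?o }:
    one solution per entry held by the store."""
    return sum(1 for _ in store)


def count_distinct(store):
    """Toy analogue of SELECT (COUNT(DISTINCT *) AS ?no) { ?s ?p ?o }:
    one solution per distinct (s, p, o) row."""
    return len(set(store))


# Assumption: a store that treats the graph as a set of triples.
set_store = set(loaded)
# Assumption: a store that keeps duplicate loads as separate entries.
bag_store = list(loaded)

set_counts = (count_all(set_store), count_distinct(set_store))
bag_counts = (count_all(bag_store), count_distinct(bag_store))

print("set-based store:", set_counts)  # (2, 2)
print("bag-based store:", bag_counts)  # (3, 2)
```

In this model the two counts agree whenever the store is a set, and
differ only when duplicate loads are retained.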

I had the impression that SPARQL 1.1 conformant implementations are
permitted to have duplicate solutions in the solution set unless the
word DISTINCT is used.  Hence I would have thought that a solution
set that is not explicitly constrained to be DISTINCT could include
duplicates, even if that solution set is for only a { ?s ?p ?o } graph
pattern over the default graph.  But maybe I'm wrong.  OTOH, if the
SPARQL 1.1 standard only *sometimes* permits duplicates when DISTINCT
is not specified, then how can I determine which circumstances permit
them and which don't?

David
Received on Monday, 3 March 2014 22:01:15 UTC
