Re: [Update] [LLD] Dataset Description from Andy Seaborne on 2014-03-03 (public-semweb-lifesci@w3.org from March 2014)

From: Andy Seaborne <andy@apache.org>
Date: Mon, 03 Mar 2014 20:01:29 +0000
To: David Booth <david@dbooth.org>, "w3.hcls@gmail.com" <w3.hcls@gmail.com>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-ID: <5314DF99.3040707@apache.org>

(please forward if the mailing list does not allow non-subscribers to 
send to it)

On 03/03/14 16:32, David Booth wrote:
> On 02/09/2014 05:45 PM, w3.hcls@gmail.com wrote:
>> Relevant docs:
>> - Working draft of W3C Note:
>> https://docs.google.com/document/d/1zGQJ9bO_dSc8taINTNHdnjYEzUyYkbjglrcuUPuoITw/edit#heading=h.wyc73yp7c8jz
>>
>
> I notice that section 6.6.1 Core statistics shows this SPARQL query for
> counting the number of triples:
>
>    SELECT (COUNT(*) AS ?no) { ?s ?p ?o  }
>
> However, I believe the SPARQL 1.1 standard allows duplicate triples and
> duplicate query solutions by default.  If so, to get an accurate count
> of the number of triples, the DISTINCT keyword must be used:
>
>    SELECT (COUNT(DISTINCT *) AS ?no) { ?s ?p ?o  }
>
> I'm copying Andy Seaborne to see if this is correct, since I could not
> easily find this information in the SPARQL 1.1 spec when I did a quick
> scan.   Andy, am I correct about this?
>
> Thanks,
> David

Hi,

In the case of { ?s ?p ?o }, the match is against the default graph, and
an RDF graph is a set of triples, so no two rows can have the same
?s, ?p, ?o values.

Because the pattern binds all three positions of each triple, every row
is distinct, so COUNT(*) and COUNT(DISTINCT *) should give the same
result.
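
As a rough illustration of the set argument, here is a minimal Python
sketch. It is a toy model, not a SPARQL engine: the graph is assumed to
be a plain Python set of (s, p, o) tuples, and the ex: names and the
add_triple helper are made up for the example.

```python
# Toy model (an assumption for illustration, not a SPARQL engine):
# an RDF graph is represented as a Python set of (s, p, o) tuples.
graph = set()

def add_triple(g, s, p, o):
    """Add a triple; adding the same triple twice leaves the set unchanged."""
    g.add((s, p, o))

add_triple(graph, "ex:a", "ex:knows", "ex:b")
add_triple(graph, "ex:a", "ex:knows", "ex:b")   # same triple again, no new member
add_triple(graph, "ex:a", "ex:name", "Alice")

# Matching { ?s ?p ?o } yields one solution row per triple in the graph.
rows = [{"s": s, "p": p, "o": o} for (s, p, o) in graph]

count_all = len(rows)                                           # like COUNT(*)
count_distinct = len({(r["s"], r["p"], r["o"]) for r in rows})  # like COUNT(DISTINCT *)

print(count_all, count_distinct)
```

Both counts come out equal because the set already holds each triple
only once.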



One suggestion, looking at this query:

SELECT (COUNT(DISTINCT ?g ) AS ?no) { GRAPH ?g { ?s ?p ?o}}

which can be written as:

SELECT (COUNT(?g) AS ?no) { GRAPH ?g { } }

because "GRAPH ?g { }" returns every graph name, one per row, and the
graph names are distinct, so there is no need for DISTINCT in the COUNT.
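
A minimal Python sketch of the difference between the two forms, again
as a toy model rather than a SPARQL engine. The named graphs are assumed
to be a dict from graph name to a set of triples, the ex: names are
made up, and every named graph is assumed to hold at least one triple.

```python
# Toy model (an assumption for illustration, not a SPARQL engine):
# a dataset's named graphs as a dict from graph name to a set of triples.
# Every named graph here is assumed to hold at least one triple.
named_graphs = {
    "ex:g1": {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:c")},
    "ex:g2": {("ex:d", "ex:p", "ex:e")},
}

# Like GRAPH ?g { ?s ?p ?o }: one row per triple, so ?g repeats per graph.
rows_with_triples = [g for g, triples in named_graphs.items() for _ in triples]

# Like GRAPH ?g { }: one row per graph name.
rows_names_only = list(named_graphs)

count_distinct_g = len(set(rows_with_triples))  # like COUNT(DISTINCT ?g), first query
count_g = len(rows_names_only)                  # like COUNT(?g), second query

print(len(rows_with_triples), count_distinct_g, count_g)
```

The first form produces a row per triple and needs DISTINCT to collapse
the repeated names; the second produces one row per name to begin with.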

	Andy

Received on Monday, 3 March 2014 20:10:10 UTC