Re: Counting, Ordering and DISTINCT from Jeen Broekstra on 2004-10-20 (public-rdf-dawg-comments@w3.org from October 2004)

From: Jeen Broekstra <jeen@aduna.biz>
Date: Wed, 20 Oct 2004 14:46:09 +0200
To: Andrew Newman <andrew@tucanatech.com>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <41765E11.7070209@aduna.biz>

Andrew Newman wrote:

[snip]

> The other issue with SPARQL is the lack of an implicit
> DISTINCT.  In my understanding of SQL, DISTINCT is optional because
> if your queries work on normalized data and joins are based on
> distinct keys then the returned results cannot be duplicated.  If
> your query works on rows with repeated values in the same column
> then you apply DISTINCT.
> 
> In RDF's data model there isn't really this problem of duplicated
> data and normalization.  SPARQL has the idea of matching statements
> in the graph.  From my understanding, RDF's data model doesn't
> support the idea of multiple subjects, predicates and/or objects
> with the same values.
> 
> In other words, it only seems valid that if a query matches one
> result in the graph it should return that one unique result, not
> multiple repeated results.
> 
> While I can see many use cases for distinct vs non-distinct results
> I am not aware of a reason to return non-distinct results over
> distinct results.  Have I missed something?

I cannot answer for the DAWG of course, but a possible reason (and
indeed the reason that we made the same choice in Sesame's SeRQL
language) is that processing a query result to filter out duplicates
is potentially expensive. If duplicates in the query result are not a
problem for the querying client (and in our experience this is quite
often the case, especially in CONSTRUCT queries), then why filter them
out at all?
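
As a rough illustration of where the cost comes from, here is a small
sketch in Python. It is not how Sesame or any particular engine is
implemented; the function names and the sample rows are made up for the
example. Without duplicate filtering, each result row can be handed to
the client as soon as it is found. With filtering, the engine has to
remember every row it has already returned.

```python
from typing import Hashable, Iterable, Iterator, Tuple

# A result row is modelled as a tuple of bound values (an assumption for this sketch).
Row = Tuple[Hashable, ...]


def stream_all(rows: Iterable[Row]) -> Iterator[Row]:
    """Pass every row through as it arrives; no state is kept."""
    for row in rows:
        yield row


def stream_distinct(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield each row once; the 'seen' set grows with the number of unique rows."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


# Hypothetical query result in which the same bindings were produced twice.
rows = [("alice", "Person"), ("bob", "Person"), ("alice", "Person")]

all_rows = list(stream_all(rows))            # three rows, duplicates included
distinct_rows = list(stream_distinct(rows))  # two rows, first occurrences only
```

The second function needs memory proportional to the number of unique
rows (or, alternatively, a sort of the full result), which is the
overhead a client avoids when it does not ask for distinct results.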

Jeen
-- 
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31(0)33 46599877  fax. +31(0)33 46599877

Received on Wednesday, 20 October 2004 12:45:57 UTC