DISTINCT (was: Re: Queries over multiple graphs) from Seaborne, Andy on 2004-09-28 (public-rdf-dawg@w3.org from July to September 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 28 Sep 2004 18:19:29 +0100
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <41599D21.9070906@hp.com>

Steve Harris wrote:
> On Tue, Sep 28, 2004 at 09:40:35AM -0500, Pat Hayes wrote:
> 
>>>On Tue, Sep 28, 2004 at 11:47:05AM +0100, Steve Harris wrote:
>>>
>>>>In recent discussions over email and on IRC this morning we hit a
>>>>difference of implementation in some RDF storage systems.
>>>>
>>>>In my system if you load multiple graphs with distinct source information
>>>>and the same statement appears in 2 graphs there are two distinct
>>>>statements. The same goes for statements that may be inferred by multiple
>>>>routes.
>>>
>>>Discussing this further it appears to be a philosophical difference: does
>>>an RDF store hold a single (merged) RDF graph with statements from
>>>different sources, or does it hold a set of RDF graphs.
>>
>>Surely the right answer is, yes. That is, it could be either. It's up 
>>to the owner responsible for maintaining the store to decide. So the 
>>QL ought to be able to handle both cases, which I think is the case 
>>right now.
> 
> 
> I agree, if so we have to agree on whether results are implicitly DISTINCT
> or if (A, A) == (A) in query results.
> 
> - Steve
>  
> 

I prefer to have explicit DISTINCT.  I don't see SELECT returning duplicate 
rows as contradicting RDF's set of statements when the app writer only wants 
some of the variables.

If there is no DISTINCT, then there is one result for every way the query can 
be matched.  Because SELECT can remove variables, it is possible the 
application can't tell two solutions (table rows, results) apart - but it can if 
there is "SELECT *" or SELECT with all the variables.  "SELECT DISTINCT" means 
no two results are the same, even when there are fewer variables.  Hence 
"SELECT DISTINCT *" is a no-op.
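
A minimal sketch of the point above, in Python rather than in any query language. The data and the select() helper are hypothetical, made up for illustration only, and the sketch assumes that the full solutions (one per way the query matched) already differ from each other in at least one variable:

```python
# Hypothetical solutions: one dict per way the query pattern matched.
# Assumption: no two full solutions are identical.
solutions = [
    {"x": "a", "y": "b1"},
    {"x": "a", "y": "b2"},
    {"x": "c", "y": "b1"},
]


def select(solutions, variables=None, distinct=False):
    """Project each solution onto `variables`; None plays the role of SELECT *."""
    rows = []
    for s in solutions:
        names = sorted(s) if variables is None else variables
        row = tuple((v, s.get(v)) for v in names)
        # DISTINCT drops a row only if an identical row was already kept.
        if distinct and row in rows:
            continue
        rows.append(row)
    return rows


all_rows = select(solutions)                          # SELECT *
x_rows = select(solutions, ["x"])                     # SELECT ?x
x_distinct = select(solutions, ["x"], distinct=True)  # SELECT DISTINCT ?x
star_distinct = select(solutions, distinct=True)      # SELECT DISTINCT *
```

Here x_rows keeps three rows, two of which look the same once ?y is projected away, x_distinct keeps two, and star_distinct is identical to all_rows.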

- - - - - - - -

Following on, for optionals, this approach suggests a style of one query result 
row for each way a query can match.

E.g.: Separate optional blocks, separate variables:

OPTIONAL ... ?x ....    // does not use ?y
OPTIONAL ... ?y ....    // does not use ?x

gives

?x = ...    // and no ?y
?y = ...    // and no ?x

Not a single result like:

?x = ... , ?y = ...

otherwise I think you can write queries where, with all the variables and nested 
optionals, you can't tell how the query matched because the same result can 
appear by two routes.
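
A rough sketch of the two result shapes, again in Python. The bindings and both helper functions are hypothetical, and the sketch assumes each optional block found at least one match:

```python
from itertools import product

# Hypothetical bindings found by each optional block on its own.
x_matches = [{"x": "x1"}]                # OPTIONAL block that only uses ?x
y_matches = [{"y": "y1"}, {"y": "y2"}]   # OPTIONAL block that only uses ?y


def separate_rows(*blocks):
    """One result row per way an individual optional block matched."""
    return [dict(m) for block in blocks for m in block]


def combined_rows(*blocks):
    """Alternative style: merge one match from every block into each row.

    Assumes every block has at least one match; an empty block would
    make the product empty.
    """
    rows = []
    for combo in product(*blocks):
        row = {}
        for m in combo:
            row.update(m)
        rows.append(row)
    return rows


separate = separate_rows(x_matches, y_matches)
combined = combined_rows(x_matches, y_matches)
```

In the first style each row records exactly which optional block produced it (?x alone, or ?y alone). In the second, every row carries both ?x and ?y, so the row no longer says which block matched independently of the other.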

(These comments are invalidated by disjunction)

	Andy

Received on Tuesday, 28 September 2004 17:19:56 UTC