Re: DISTINCT (was: Re: Queries over multiple graphs) from Steve Harris on 2004-09-29 (public-rdf-dawg@w3.org from July to September 2004)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Wed, 29 Sep 2004 13:48:47 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <20040929124847.GC31285@login.ecs.soton.ac.uk>

On Wed, Sep 29, 2004 at 01:05:29PM +0100, Andy Seaborne wrote:
> > On Tue, Sep 28, 2004 at 06:19:29 +0100, Andy Seaborne wrote:
> > > I prefer to have explicit DISTINCT.  I don't see having SELECT
> > > returning duplicate rows contradicting RDF's set of statements if
> > > the app writer only wants some of the variables.
> > > 
> > > If there is no DISTINCT, then there is one result for every
> > > way the query can be matched.  Because SELECT can remove variables,
> > > it is possible the application can't tell two solutions (table rows,
> > > results) apart - but it can if there is "SELECT *" or SELECT with
> > > all the variables. "SELECT DISTINCT" means no two results are the
> > > same even when there are fewer variables.  Hence "SELECT DISTINCT *"
> > > is a no-op.
> > 
> > Doesn't that assume that every statement in the system is unique at the
> > triple level? That is not necessarily the case.
> 
> In the sense that an RDF graph is a set of statements, every statement
> is unique. When querying "SELECT *" there will be one unique solution
> for each way the query can match.  Hence each row is different in some
> way.

Agreed, but the differences may not be apparent at the (s,p,o) triple
level.
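
A minimal sketch, in Python, of the projection behaviour described in the
quoted text. The data, the single "knows" pattern and the helper names
(match_knows, project, distinct) are all hypothetical and chosen only for
illustration; this is not any particular store's implementation.

```python
# Hypothetical data: an RDF graph modelled as a set of (s, p, o) triples.
graph = {
    ("a", "knows", "b"),
    ("a", "knows", "c"),
    ("b", "knows", "c"),
}

def match_knows(graph):
    """One solution {x, y} per way the pattern '?x knows ?y' matches."""
    return [{"x": s, "y": o} for (s, p, o) in sorted(graph) if p == "knows"]

def project(solutions, variables):
    """SELECT: keep only the named variables; duplicates are NOT removed."""
    return [tuple(sol[v] for v in variables) for sol in solutions]

def distinct(rows):
    """SELECT DISTINCT: drop repeated rows, keeping first-seen order."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

solutions = match_knows(graph)
select_all = project(solutions, ["x", "y"])  # like SELECT *
select_x = project(solutions, ["x"])         # like SELECT ?x
```

Here select_x contains ("a",) twice because two different solutions
collapse once ?y is dropped, whereas distinct(select_all) equals
select_all because the graph itself is a set.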
 
> If I understand 3Store correctly, it is as much a collection of graphs
> to query - and does not present a concept of the RDF model of the whole
> collection.

That is correct, if you disallow duplicate triples.

>              It's more like having an implicit "SOURCE *" around each
> query pattern.

Possibly, depending on what semantics we agree for a store containing
a set of graphs. I would prefer DAWG not to require a particular
behaviour. In my RDQL implementation it's irrelevant, as the implicit
DISTINCT makes them appear the same. IIUC you would like to require that
queries that don't use SOURCE will treat their entire contents as a single
RDF graph with unique statements. That seems overly prescriptive to me.
What's the motivation?
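
To make the two candidate behaviours concrete, here is a rough sketch in
Python. It assumes a store that is simply a mapping from graph name to a set
of triples; the graph names, the data and both functions are hypothetical and
are not meant to describe how any real store works.

```python
# Hypothetical store: graph name -> set of (s, p, o) triples.
# The triple ("a", "knows", "b") is asserted in both graphs.
store = {
    "graph1": {("a", "knows", "b")},
    "graph2": {("a", "knows", "b"), ("b", "knows", "c")},
}

def match_per_graph(store):
    """Match '?s ?p ?o' against each graph in turn (an implicit 'SOURCE *').

    The same triple can appear once per graph that contains it.
    """
    rows = []
    for name in sorted(store):
        rows.extend(sorted(store[name]))
    return rows

def match_merged(store):
    """Match '?s ?p ?o' against the set union of all graphs.

    The union is a set, so every triple appears exactly once.
    """
    merged = set().union(*store.values())
    return sorted(merged)

per_graph = match_per_graph(store)   # 3 rows, one of them repeated
merged = match_merged(store)         # 2 rows, all unique

# Applying DISTINCT to the per-graph rows gives the merged result.
per_graph_distinct = sorted(set(per_graph))
```

Under these assumptions the two views differ only in the duplicate row, and
that difference disappears once DISTINCT is applied.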
 
> > > ?x = ...    // and no ?y
> > > ?y = ...    // and no ?x
> > 
> > What about ?x=NULL, ?y=NULL and ?x=... , ?y=..., would those also be
> > valid solutions? I think I'm not following this part. Possibly a more
> > concrete example would help.
> 
> Yes - an example would help - and I think I got the example wrong.
> OPTIONAL is "greedy" in that if it can bind it does.  No unbound is
> generated if an OPTIONAL can match (A [B] is A+B if B matches else A).
> I'll try to do an example in a separate mail thread.

Right, OK, I think that's what was confusing me.
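
As a sketch of the "A [B] is A+B if B matches else A" reading quoted above,
the following Python treats each pattern as a single predicate with a subject
and an object variable. The data, the variable names and the helpers (match,
optional) are hypothetical and only illustrate the greedy rule.

```python
# Hypothetical data: alice has a name and an mbox, bob only has a name.
graph = {
    ("alice", "name", "Alice"),
    ("alice", "mbox", "alice@example.org"),
    ("bob", "name", "Bob"),
}

def match(graph, predicate, subject_var, object_var, binding):
    """Extend `binding` with every triple using `predicate` that agrees with it."""
    results = []
    for (s, p, o) in sorted(graph):
        if p != predicate:
            continue
        new = dict(binding)
        consistent = True
        for var, val in ((subject_var, s), (object_var, o)):
            # An unbound variable accepts any value; a bound one must agree.
            if new.get(var, val) != val:
                consistent = False
                break
            new[var] = val
        if consistent:
            results.append(new)
    return results

def optional(graph, required, opt):
    """A [B]: for each solution of A, use A+B if B matches, otherwise A alone."""
    solutions = []
    for a in match(graph, *required, {}):
        extended = match(graph, *opt, a)
        # Greedy: the unbound row is emitted only when B cannot match at all.
        solutions.extend(extended if extended else [a])
    return solutions

rows = optional(graph, ("name", "person", "name"), ("mbox", "person", "mbox"))
```

With this data, alice yields a single row with ?mbox bound and no additional
row with ?mbox unbound, while bob yields a single row without ?mbox.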

- Steve

Received on Wednesday, 29 September 2004 12:48:54 UTC