Re: [TO ADVANCE] formsOfDistinct

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Wed, 6 Sep 2006 16:15:25 +0100
Message-Id: <1D4ECEF0-55FF-4510-B126-CD3420B609DC@cs.man.ac.uk>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
To: Bijan Parsia <bparsia@cs.man.ac.uk>

Jorge is picking on me :)

(Not really, but he pointed out some things that are obvious but  
may not have been salient enough to most group members, so I thought  
I'd pull them out.)

REALLYREALLYDISTINCT (i.e., answer set leanness) is complex, no  
question. Importantly, it is hard with respect to the size of the  
results. (Now, I believe we'll seldom hit that in real cases, but I  
don't have any empirical data organized on that yet.) And the size  
of the results is not independent of the size of the data. To  
paraphrase Jorge: for a fixed query, the bigger the data, the bigger  
the results.

However, lean graph distinctness requires leaning the graph. Either  
you will *store* your graph lean, or you will lean it on demand. If  
you store it lean, then normal and DISTINCT queries won't differ in  
redundancy except for the elimination of term redundancy (which can  
be handled pairwise). If you lean it on demand, then you will  
experience a lot of pain on the first query (at least). You could be  
clever and maintain both graphs all the time, I suppose, or, better,  
mark each triple as belonging to the leaned graph or not (that's  
probably the best option).
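A minimal brute-force sketch of leaning a stored graph, and of the marking idea, assuming triples are Python tuples of strings and blank nodes carry a `_:` prefix. `find_homomorphism`, `lean`, and `mark_lean` are hypothetical names, not part of any RDF library. The sketch repeatedly looks for a blank-node mapping that sends the whole graph into the graph minus one triple; when one exists, the image is a smaller equivalent subgraph, and the loop stops once no triple can be dropped. Because the image is always a subset of the input, the result can be kept as a per-triple flag instead of a second copy of the graph.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Assumption: a triple is a tuple of three strings; blank nodes start with "_:".
Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def find_homomorphism(source: FrozenSet[Triple],
                      target: FrozenSet[Triple]) -> Optional[Dict[str, str]]:
    """Backtracking search for a blank-node mapping h with h(source) a subset
    of target. IRIs and literals must map to themselves."""
    triples = sorted(source)
    candidates = sorted(target)

    def extend(i: int, h: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(triples):
            return h
        for cand in candidates:
            new_h = dict(h)
            ok = True
            for s_term, t_term in zip(triples[i], cand):
                if is_blank(s_term):
                    # Bind the blank node, or check it agrees with an earlier binding.
                    if new_h.setdefault(s_term, t_term) != t_term:
                        ok = False
                        break
                elif s_term != t_term:
                    ok = False
                    break
            if ok:
                result = extend(i + 1, new_h)
                if result is not None:
                    return result
        return None

    return extend(0, {})


def apply_mapping(h: Dict[str, str], graph: Iterable[Triple]) -> FrozenSet[Triple]:
    return frozenset(tuple(h.get(t, t) for t in triple) for triple in graph)


def lean(graph: Iterable[Triple]) -> FrozenSet[Triple]:
    """Shrink the graph until no blank-node mapping sends it into a proper subgraph."""
    g = frozenset(graph)
    changed = True
    while changed:
        changed = False
        for t in sorted(g):
            h = find_homomorphism(g, g - {t})
            if h is not None:
                g = apply_mapping(h, g)  # h(g) is a smaller, equivalent subgraph
                changed = True
                break
    return g


def mark_lean(graph: Iterable[Triple]) -> Dict[Triple, bool]:
    """Flag each stored triple as inside or outside the leaned graph."""
    triples = frozenset(graph)
    core = lean(triples)  # always a subset of the input
    return {t: t in core for t in triples}


if __name__ == "__main__":
    g = {("<a>", "<p>", "_:b"), ("<a>", "<p>", "<c>")}
    print(sorted(lean(g)))  # [('<a>', '<p>', '<c>')]
```

The leaned graph is unique only up to renaming of blank nodes, so which triples end up flagged can depend on the search order, and the search is exponential in the worst case.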

Of course, since projection can introduce significant redundancy,  
it's possible that you'll still have to lean the answers obtained  
from lean graphs when doing REALLYREALLYDISTINCT.

If you give distinct URIs to the leaned and non-leaned versions of a  
graph, then you can emulate source leanness by directing the same  
query to each graph in normal mode. Since REALLYREALLYDISTINCT is  
only sensitive to the answer set, it can be done on the client with  
an analysis function.
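Such a client-side analysis function could look roughly like the following sketch. It assumes one plausible reading of answer set leanness: rows are tuples over a fixed list of variables, every row binds every variable, and blank-node labels are shared across the whole answer set, so a row is redundant only if a single mapping of blank nodes to terms sends the entire answer set into a proper subset of itself. `lean_answers` is a hypothetical name. It enumerates all `len(terms) ** len(blanks)` mappings, which makes the cost in the size of the results very visible.

```python
import itertools
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumptions: every row binds every variable in `variables`, terms are
# strings, and blank-node labels start with "_:" and are scoped to the
# whole answer set (the same label in two rows is the same blank node).
Row = Dict[str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def lean_answers(rows: Sequence[Row], variables: Sequence[str]) -> List[Row]:
    """Drop rows that are redundant once blank nodes are read as existentials."""
    # Turning rows into a set of tuples already removes term redundancy.
    current: FrozenSet[Tuple[str, ...]] = frozenset(
        tuple(row[v] for v in variables) for row in rows)
    while True:
        blanks = sorted({t for tup in current for t in tup if is_blank(t)})
        terms = sorted({t for tup in current for t in tup})
        best = current
        # Exhaustive search over len(terms) ** len(blanks) candidate mappings.
        for image in itertools.product(terms, repeat=len(blanks)):
            h = dict(zip(blanks, image))
            mapped = frozenset(tuple(h.get(t, t) for t in tup) for tup in current)
            if mapped < current and len(mapped) < len(best):
                best = mapped
        if best == current:  # no mapping into a proper subset: lean
            return [dict(zip(variables, tup)) for tup in sorted(current)]
        current = best


if __name__ == "__main__":
    print(lean_answers([{"y": "_:b"}, {"y": "<d>"}], ["y"]))  # [{'y': '<d>'}]
    rows = [{"x": "_:b1", "y": "<c>"}, {"x": "_:b1", "y": "<d>"},
            {"x": "<a>", "y": "<c>"}]
    print(len(lean_answers(rows, ["x", "y"])))  # 3
```

The first call uses an illustrative lean graph, {<a> <p> _:b . _:b <q> <c> . <a> <p> <d>} (lean because _:b cannot map to <d> without a triple <d> <q> <c>), queried with SELECT ?y WHERE { <a> <p> ?y }: projection yields a blank-node row that the ground row subsumes. The second call shows why this is not pairwise: (_:b1, <c>) looks subsumed by (<a>, <c>), but mapping _:b1 to <a> would also require (<a>, <d>), so all three rows stay.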

I'd also like to know what DISTINCT should mean in combination with  
CONSTRUCT. My intuition says that the constructed graph should be  
lean, but we could define it in other ways.

Cheers,
Bijan.
Received on Wednesday, 6 September 2006 15:15:45 GMT