Re: [TO ADVANCE] formsOfDistinct

Jorge is picking on me :)

(Not really, but he pointed out some stuff that is obvious but may not
have been salient enough to most group members, so I thought I'd pull
it out.)

REALLYREALLYDISTINCT (i.e., answer set leanness) is complex, no
question. Importantly, the difficulty is in the size of the results.
(Now, I believe we'll seldom hit that in real cases, but I don't
have any empirical data organized for that yet.) And the size of the
results is not independent of the size of the data. To paraphrase
Jorge: for a fixed query, the bigger the data, the bigger the
results.

However, lean graph distinctness requires leaning the graph. Either
you will *store* your graph lean, or you will lean it on demand. If
you store it lean, then switching from normal to DISTINCT won't change
the redundancy other than by eliminating term redundancy (which can be
handled pairwise). If you lean it on demand, then you will experience
a lot of pain on the first query (at least). You could be clever and
maintain both graphs all the time, I suppose, or better, mark each
triple as part of the leaned graph or not (that's probably the best).
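
To make the "mark triples" option concrete, here is a minimal sketch in
Python. It is not a proposal for how a store should do it. The
assumptions are mine: triples are plain tuples of strings, blank nodes
are strings starting with "_:", and the names lean and mark_lean are
made up for illustration. The search is brute force (exponential in the
number of blank nodes), so it is only good for toy graphs.

```python
from itertools import product


def is_blank(term):
    # Assumption: blank nodes are written as strings like "_:x".
    return isinstance(term, str) and term.startswith("_:")


def lean(graph):
    """Return a lean subgraph equivalent to `graph` (a set of triples).

    A graph is non-lean if some mapping of its blank nodes sends it onto
    a proper subgraph of itself. We try every mapping and recurse on the
    first proper subgraph found.
    """
    graph = frozenset(graph)
    blanks = sorted({t for triple in graph for t in triple if is_blank(t)})
    terms = sorted({t for triple in graph for t in triple})
    for image in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, image))
        mapped = frozenset(
            tuple(mapping.get(t, t) for t in triple) for triple in graph
        )
        if mapped < graph:  # proper subset, so strictly smaller
            return lean(mapped)
    return graph


def mark_lean(graph):
    """Flag each stored triple as belonging to the leaned graph or not."""
    core = lean(graph)
    return {triple: (triple in core) for triple in graph}


if __name__ == "__main__":
    g = {("ex:a", "ex:p", "_:x"), ("ex:a", "ex:p", "ex:b")}
    for triple, in_lean in sorted(mark_lean(g).items()):
        print(triple, "lean" if in_lean else "redundant")
```

With flags like these, a normal query reads every triple and a query
against the lean graph reads only the flagged ones, so the leaning cost
is paid at update time rather than on the first query.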

Of course, since projection can introduce significant redundancy,  
it's possible that you'll have to lean answers from lean graphs when  
doing REALLYREALLYDISTINCT.

If you give distinct URIs for leaned vs. non-leaned versions of a  
graph, then you can emulate source leanness by directing the same  
query to each graph in normal mode. Since REALLYREALLYDISTINCT is  
only sensitive to the answer set, it can be done on the client with  
an analysis function.
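
As a rough illustration of such a client-side analysis function, here
is a sketch in Python. It reflects one possible reading of answer set
leanness, not a settled definition: treat each solution as a tuple of
terms and drop rows that some blank node mapping folds into the
remaining rows. The name lean_answers, the "_:" convention for blank
nodes, and the sample rows are all assumptions for illustration, and
the search is brute force.

```python
from itertools import product


def is_blank(term):
    # Assumption: blank nodes are written as strings like "_:x".
    return isinstance(term, str) and term.startswith("_:")


def lean_answers(rows):
    """Return a lean version of an answer set.

    `rows` is an iterable of equal-length tuples of terms, with every
    variable bound. Building a set already gives plain DISTINCT; the
    loop then looks for a blank node mapping that sends the answer set
    onto a proper subset of itself.
    """
    rows = frozenset(tuple(r) for r in rows)
    blanks = sorted({t for r in rows for t in r if is_blank(t)})
    terms = sorted({t for r in rows for t in r})
    for image in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, image))
        mapped = frozenset(tuple(mapping.get(t, t) for t in r) for r in rows)
        if mapped < rows:  # proper subset, so strictly smaller
            return lean_answers(mapped)
    return rows


if __name__ == "__main__":
    # Hypothetical results of projecting ?s ?o from a lean graph in
    # which _:x also has other properties that ex:a lacks.
    results = [("_:x", "ex:b"), ("ex:a", "ex:b"), ("ex:a", "ex:b")]
    print(sorted(lean_answers(results)))
```

The sample rows also show the projection point above: the source graph
can be lean while the projected rows are not, because the triples that
distinguished _:x from ex:a were projected away.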

I'd also like to know about DISTINCT in combination with CONSTRUCT.  
My intuition says that the graph should be lean, but we can define it  
in other ways.

Cheers,
Bijan.

Received on Wednesday, 6 September 2006 15:15:45 UTC