- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Wed, 6 Sep 2006 16:15:25 +0100
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Jorge is picking on me :) (Not really, but he pointed out some stuff that is obvious, but may not have been salient enough to most group members, so I thought I'd pull it out.)

REALLYREALLYDISTINCT (i.e., answer set leanness) is complex, no question. Importantly, it is hard in the size of the results. (Now, I believe we'll seldom hit that in real cases, but I don't have any empirical data organized for that yet.) And the size of the results is not independent of the size of the data. To paraphrase Jorge: for a fixed query, the bigger the data, the bigger the results.

However, lean graph distinctness requires leaning the graph. Either you will *store* your graph lean, or you will lean it on demand. If you store it lean, then going from normal mode to DISTINCT won't change the redundancy other than by eliminating term redundancy (which can be handled pairwise). If you lean it on demand, then you will experience a lot of pain on the first query (at least). You could be clever and maintain both graphs all the time, I suppose, or, better, mark each triple as part of the leaned graph or not (that's probably the best).

Of course, since projection can introduce significant redundancy, it's possible that you'll have to lean answers from lean graphs when doing REALLYREALLYDISTINCT.

If you give distinct URIs to the leaned and non-leaned versions of a graph, then you can emulate source leanness by directing the same query to each graph in normal mode. Since REALLYREALLYDISTINCT is sensitive only to the answer set, it can be done on the client with an analysis function.

I'd also like to know about DISTINCT in combination with CONSTRUCT. My intuition says that the resulting graph should be lean, but we can define it in other ways.

Cheers,
Bijan.
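
A rough sketch of what a client-side analysis function for answer set leanness might look like, under assumptions that are mine for illustration only: each answer is a tuple of strings, every variable is bound, and blank nodes are the strings that start with "_:". The name lean_answers is hypothetical. The search is brute force over all blank node mappings, so it is exponential in the number of blank nodes in the results, which is the "hard in the size of the results" point in miniature.

```python
from itertools import product


def is_bnode(term):
    # Assumption: blank nodes are written N-Triples style, e.g. "_:b1".
    return term.startswith("_:")


def lean_answers(rows):
    """Return a lean version of an answer set (brute force, illustration only).

    Putting the rows in a set already removes term-identical duplicates,
    which is the plain, pairwise DISTINCT case. The loop then looks for a
    mapping of blank nodes to terms that sends the answer set onto a proper
    subset of itself, and keeps shrinking until no such mapping exists.
    """
    rows = set(rows)
    changed = True
    while changed:
        changed = False
        bnodes = sorted({t for r in rows for t in r if is_bnode(t)})
        terms = sorted({t for r in rows for t in r})
        # Try every assignment of a term to each blank node.
        for image in product(terms, repeat=len(bnodes)):
            mapping = dict(zip(bnodes, image))
            mapped = {tuple(mapping.get(t, t) for t in r) for r in rows}
            if mapped < rows:  # proper subset: some row was redundant
                rows = mapped
                changed = True
                break
    return rows
```

For example, with the answers ("ex:a", "_:b1") and ("ex:a", "ex:c"), mapping _:b1 to ex:c collapses the first row onto the second, so only ("ex:a", "ex:c") survives. Because the mapping is applied to the whole answer set at once, blank nodes shared across rows are treated consistently, which a purely pairwise comparison would not guarantee.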
Received on Wednesday, 6 September 2006 15:15:45 UTC