- From: David Booth <david@dbooth.org>
- Date: Wed, 14 Sep 2011 17:22:50 -0400
- To: public-rdf-dawg-comments <public-rdf-dawg-comments@w3.org>
Sec 15 Solution Sequences and Modifiers http://www.w3.org/TR/sparql11-query/#solutionModifiers says that the ORDER modifier is applied before the DISTINCT modifier. I suggest that this be changed to say that ORDER is applied after REDUCED. This would be a very simple change to the spec -- moving one line in sec 15. AFAICT it would not affect the semantics, but I think it would help implementers. The reason I suggest this is that an implementation performing the ORDER before DISTINCT must generate *all* (non-distinct) solutions in order to sort them, in spite of the fact that the ordering of the solutions can never affect the elimination of those that are non-distinct. AFAICT it would be much more efficient to perform the ORDER *after* performing DISTINCT (or REDUCED). For example, there may be 100 million solutions but only 20 distinct solutions. If SORT is done first, then all 100 million solutions must be held in memory at once and sorted, before later reducing the result set to 20. But if DISTINCT is done first, then duplicate solutions can be discarded *as* they are generated, so only 20 need to be held in memory. Granted, since an implementation is free to optimize things however it wishes anyway -- provided that it still produces the correct results -- it could perform the ORDER modifier after the DISTINCT and REDUCED modifiers even if the spec says to do them the other way around. So in that sense there is no absolute need to change the order of these modifiers in the SPARQL specification. However, since most implementations will first want to ensure correctness and later optimize, I think it would be helpful to change the SPARQL specification to say that ORDER is done after REDUCED. Indeed, the reason I ran into this is that the implementation that I'm currently using (Parliament 2.7.1) appears to be doing the SORT first (per the spec), thus causing a query that I thought should have run quite efficiently to instead run the system out of memory. Thanks! -- David Booth, Ph.D. http://dbooth.org/ Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.
Received on Wednesday, 14 September 2011 21:23:15 UTC