- From: David Booth <david@dbooth.org>
- Date: Wed, 14 Sep 2011 21:01:25 -0400
- To: Stephen Allen <sallen@bbn.com>
- Cc: 'public-rdf-dawg-comments' <public-rdf-dawg-comments@w3.org>
Ah yes, I see what you mean. It is fairly common to want to ORDER by variables that are not projected, so that would constrain ORDER to be done before projection. I withdraw my suggestion. Thanks for your help! David On Wed, 2011-09-14 at 18:18 -0400, Stephen Allen wrote: > Hi David, > > Unfortunately, solution sequence modifiers cannot be rearranged as you > suggest. The Order modifier must appear before Projection in order to > have access to all variables in the where clause. Distinct and > Reduced must appear after Projection since that can quite easily > introduce duplicates. > > As you note, the implementation is free to reorder these operations if > it can produce the same solution sequence. Implementations typically > do this for query optimization purposes. > > In the particular instance you cite, if the optimizer can determine a > priori that there are likely to be a large number of duplicates AND > there is no projection, it can perform the distinct before the order. > However, it is not clear to me that an optimizer could easily > determine the number of duplicates ahead of time. Indeed, if the > number of duplicates were a small proportion of the total number of > bindings, this optimization could be more expensive, since it could > otherwise optimize distinct(order(...)) as reduced(order(...)). > > Speaking more directly about Parliament, it uses Jena's ARQ to perform > query execution. That is slated to receive an update at some point in > the future to perform non-memory bound sorting [1]. If you are > interested in query optimization in ARQ and Parliament, I recommend > you browse [2], [3], [4] and [5]. > > -Stephen > > [1] https://issues.apache.org/jira/browse/JENA-44 > [2] https://issues.apache.org/jira/browse/JENA-89 > [3] https://issues.apache.org/jira/browse/JENA-90 > [4] https://issues.apache.org/jira/browse/JENA-109 > [5] https://issues.apache.org/jira/browse/JENA-111 > > > > > > -----Original Message----- > > From: public-rdf-dawg-comments-request@w3.org [mailto:public-rdf-dawg- > > comments-request@w3.org] On Behalf Of David Booth > > Sent: Wednesday, September 14, 2011 5:23 PM > > To: public-rdf-dawg-comments > > Subject: Order of applying solution sequence modifiers > > > > Sec 15 Solution Sequences and Modifiers > > http://www.w3.org/TR/sparql11-query/#solutionModifiers > > says that the ORDER modifier is applied before the DISTINCT modifier. > > I > > suggest that this be changed to say that ORDER is applied after > > REDUCED. > > This would be a very simple change to the spec -- moving one line in > > sec > > 15. AFAICT it would not affect the semantics, but I think it would > > help > > implementers. > > > > The reason I suggest this is that an implementation performing the > > ORDER > > before DISTINCT must generate *all* (non-distinct) solutions in order > > to > > sort them, in spite of the fact that the ordering of the solutions can > > never affect the elimination of those that are non-distinct. AFAICT it > > would be much more efficient to perform the ORDER *after* performing > > DISTINCT (or REDUCED). > > > > For example, there may be 100 million solutions but only 20 distinct > > solutions. If SORT is done first, then all 100 million solutions must > > be held in memory at once and sorted, before later reducing the result > > set to 20. But if DISTINCT is done first, then duplicate solutions can > > be discarded *as* they are generated, so only 20 need to be held in > > memory. > > > > Granted, since an implementation is free to optimize things however it > > wishes anyway -- provided that it still produces the correct results -- > > it could perform the ORDER modifier after the DISTINCT and REDUCED > > modifiers even if the spec says to do them the other way around. So in > > that sense there is no absolute need to change the order of these > > modifiers in the SPARQL specification. However, since most > > implementations will first want to ensure correctness and later > > optimize, I think it would be helpful to change the SPARQL > > specification > > to say that ORDER is done after REDUCED. Indeed, the reason I ran > > into > > this is that the implementation that I'm currently using (Parliament > > 2.7.1) appears to be doing the SORT first (per the spec), thus causing > > a > > query that I thought should have run quite efficiently to instead run > > the system out of memory. > > > > Thanks! > > > > > > -- > > David Booth, Ph.D. > > http://dbooth.org/ > > > > Opinions expressed herein are those of the author and do not > > necessarily > > reflect those of his employer. > > > > -- David Booth, Ph.D. http://dbooth.org/ Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.
Received on Thursday, 15 September 2011 01:02:01 UTC