Passing distinct subquery solutions to aggregate outer query from Paul Tyson on 2013-01-23 (public-sparql-dev@w3.org from January to March 2013)

From: Paul Tyson <phtyson@sbcglobal.net>
Date: Wed, 23 Jan 2013 14:56:04 -0600
To: "public-sparql-dev@w3.org" <public-sparql-dev@w3.org>
Message-Id: <E8584FF5-F179-4762-B9FD-5CF4AA6B2621@sbcglobal.net>

Hi all,

I'm wondering if there is a simple solution to this problem.

I have a rather complicated query, consisting of several union clauses, which by its nature will return duplicates. I need a duplicate-free solution set so that I can group the solutions and sum a couple of fields.
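To make the intended result concrete outside SPARQL, here is a minimal Python sketch. The rows, their values, and the helper name sum_distinct are all made up for illustration and do not come from the real dataset. It drops duplicate solutions first and then sums the third field per (var1, var2) group.

```python
from collections import defaultdict

# Hypothetical solutions (var1, var2, var3), as a union query might return
# them, including one exact duplicate row.
rows = [
    ("a", "x", 10),
    ("a", "x", 10),  # duplicate produced by overlapping union branches
    ("a", "x", 5),
    ("b", "y", 7),
]


def sum_distinct(solutions):
    """Drop duplicate solutions, then sum var3 per (var1, var2) group."""
    totals = defaultdict(int)
    for var1, var2, var3 in set(solutions):  # set() removes duplicate rows
        totals[(var1, var2)] += var3
    return dict(totals)


totals = sum_distinct(rows)
print(totals)
```

Without the deduplication step, the duplicate row would be counted twice and the total for ("a", "x") would come out as 25 instead of 15.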

Simply wrapping the union query in a nested SELECT DISTINCT doesn't work, because the outer query has no graph pattern to match the variables projected from the subquery.

I tried adding a series of BIND statements to rename the subquery variables for use by the aggregate outer query, but that didn't work (with Jena, at least).

The source dataset is nearly 500M triples. I'm using Jena 2.7.3. The subquery will return anywhere from a few dozen to a few hundred solutions, and by itself runs very quickly.

Here's a skeleton view of the query. Is there something to fill "what goes here" that will pass the subquery results up to the grouping function?

select ?var1 ?var2 (sum(?var3) as ?var3_total)
where {
  { ??? what goes here ??? }
  { select distinct ?var1 ?var2 ?var3
    where { ... complicated union query ... } }
}
group by ?var1 ?var2

Or any other suggestions on how to tackle this problem?

Thanks,
--Paul

Received on Wednesday, 23 January 2013 20:57:31 UTC