Re: Ambiguity and 4.5 Aggregate Query (and screw case) from Simon Raboczi on 2004-06-30 (public-rdf-dawg@w3.org from April to June 2004)

From: Simon Raboczi <raboczi@tucanatech.com>
Date: Wed, 30 Jun 2004 12:41:26 -0400
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: public-rdf-dawg@w3.org
Message-Id: <548C54B6-CAB4-11D8-AF67-000A95C5686E@tucanatech.com>
On 30/06/2004, at 8:30, Seaborne, Andy wrote:

>
> -------- Original Message --------
>> From: Kendall Clark <>
>> Date: 29 June 2004 14:16
>>
>> On Tue, Jun 29, 2004 at 07:03:24AM -0400, Eric Prud'hommeaux wrote:
>>> [[
>>> 4.5 Aggregate Query
>>
>> We discussed this originally, as I recall. Aggregate, then query is
>> distinct from query separately, aggregate results. I called what
>> you're proposing "union query". Again, as I recall the discussion,
>> there was more support for aggregate query than union query.
>
> There seems to me to be no need for explicit support for union query.
> If the union is valuable, then make the union an identifiable web
> resource and query that.  In other words, the query names the union as
> the target, and there is no need to have anything in the QL or protocol.
>
> What this approach to union query does not permit is an arbitrary,
> temporary union.  As fetching a graph over the web is not trivial, a
> server which allowed client requests to cause many large GETs (if
> implemented by merging locally and then querying) seems OK for small
> experiments only.  An implementation could instead ask about each
> triple pattern in turn (with values from previous triple matches
> substituted in; it's a search tree here, not a linear pass).  That
> avoids GETting the whole models, but causes very large numbers of
> requests from the server handling the query to the model owners'
> servers.
>
> This can't be done with aggregate result query: the system would need
> a way to name the separate graphs, unless the client issues a request
> to each target and merges the results itself.  That involves about the
> same amount of data traffic if duplicates aren't removed from the
> results; only extra copies of the query go out.  It may be faster for
> the client to do it, as requests can be sent in parallel (network speed
> impacts this).
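To make the trade-off in the quoted discussion concrete, here is a minimal sketch in Python (the message itself contains no code, so the language is an arbitrary choice). `RemoteGraph`, `union_query` and `aggregate_result_query` are hypothetical names, and every `find()` or `query()` call on a `RemoteGraph` merely stands in for one network request. Union query is evaluated pattern-at-a-time, with values from earlier matches substituted into the next lookup, as Andy describes; aggregate result query ships the whole query to each graph and concatenates the answers. The URIs are illustrative only.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


class RemoteGraph:
    """Hypothetical stand-in for a graph held on another server.

    Each find() or query() call is counted as one network request.
    """

    def __init__(self, triples: List[Triple]) -> None:
        self.triples = triples
        self.requests = 0

    def _match(self, lookup: Tuple[Optional[str], ...]) -> List[Triple]:
        # None in the lookup acts as a wildcard; no request is counted here.
        return [t for t in self.triples
                if all(w is None or w == v for w, v in zip(lookup, t))]

    def find(self, lookup: Tuple[Optional[str], ...]) -> List[Triple]:
        self.requests += 1  # one triple-pattern lookup = one request
        return self._match(lookup)

    def query(self, patterns: List[Triple]) -> List[Binding]:
        self.requests += 1  # the whole query is shipped in one request
        return list(search(patterns, self._match, {}))


def bind(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend a binding with the variables of one matched triple, if consistent."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?") and new.setdefault(term, value) != value:
            return None
    return new


def search(patterns: List[Triple], lookup_fn, binding: Binding) -> Iterator[Binding]:
    """Search tree: bound values are substituted into the next pattern's lookup."""
    if not patterns:
        yield binding
        return
    pattern = patterns[0]
    lookup = tuple(binding.get(t) if t.startswith("?") else t for t in pattern)
    for triple in set(lookup_fn(lookup)):  # set: the union graph has no duplicates
        new = bind(pattern, triple, binding)
        if new is not None:
            yield from search(patterns[1:], lookup_fn, new)


def union_query(patterns: List[Triple], graphs: List[RemoteGraph]) -> List[Binding]:
    # Aggregate, then query: every lookup is sent to every graph.
    return list(search(patterns, lambda lk: [t for g in graphs for t in g.find(lk)], {}))


def aggregate_result_query(patterns: List[Triple], graphs: List[RemoteGraph]) -> List[Binding]:
    # Query each graph separately, then concatenate the result lists.
    results: List[Binding] = []
    for g in graphs:
        results.extend(g.query(patterns))
    return results


a = RemoteGraph([("ex:alice", "foaf:knows", "ex:bob")])
b = RemoteGraph([("ex:bob", "foaf:name", "Bob")])
q = [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")]

u = union_query(q, [a, b])
u_requests = a.requests + b.requests
print(u, u_requests)  # [{'?x': 'ex:alice', '?y': 'ex:bob', '?n': 'Bob'}] 4

a.requests = b.requests = 0
r = aggregate_result_query(q, [a, b])
r_requests = a.requests + b.requests
print(r, r_requests)  # [] 2
```

With the two facts split across two graphs, only the union query finds the answer, at the cost of one request per graph for every node of the search tree; the aggregate result query needs just one request per graph but cannot join across them.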

In earlier versions of Kowari, we supported both sorts of graph
aggregation in the iTQL "from" clause.  Expressing "from <modelA> or
<modelB>" would request aggregate-then-query (union query), whereas
"from <modelA> xor <modelB>" would request separate queries followed by
aggregation (aggregate result query).  The expressions being queried
are arbitrary and temporary, but there's a facility to create a named
"view" graph whose value is defined by one of these expressions.

We implemented the union query the way Andy suggested, by querying each
triple pattern in turn.  Much as he surmises, in the network-distributed
case this generates a great deal of network traffic (although
streamability helps by allowing some of the intermediate results to be
produced concurrently).  It works, but it scales poorly.  However, when
used to aggregate graphs stored on the same server, performance can be
excellent.  If your native store is based on quads, combining
constraint results from different graphs is really no different from
combining results from different subjects or predicates.  As a result,
rather than being a query form viable only for small experiments, the
union query is the workhorse operator in every "from" clause, allowing
very large numbers of statements within a server to be manageably
organized into various named graphs.  The "xor" operator for aggregate
result query ended up relegated to the status of a performance hack,
used only for network-distributed queries whose data were distributed
in such a way that independent servers could meaningfully satisfy all
the query constraints on their own.
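The point about quad stores can be sketched in a few lines of Python. This is a toy under stated assumptions, not Kowari's actual index design: `find`, `constraint` and `union_query` are hypothetical names, a linear scan stands in for the indexes a real store would use, and the URIs are illustrative. Each position of a lookup takes either "any value" or a set of allowed values, so restricting a query to the union of several named graphs is just a constraint on the fourth column, handled exactly like a constraint on subject or predicate.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List, Optional, Tuple

Quad = Tuple[str, str, str, str]
Triple = Tuple[str, str, str]
Binding = Dict[str, str]
Constraint = Optional[FrozenSet[str]]  # None = any value, else the allowed values


def find(quads: Iterable[Quad], s: Constraint, p: Constraint,
         o: Constraint, g: Constraint) -> Iterator[Quad]:
    """Scan for quads; the graph column is filtered exactly like s, p and o."""
    for quad in quads:
        if all(c is None or v in c for c, v in zip((s, p, o, g), quad)):
            yield quad


def constraint(term: str, binding: Binding) -> Constraint:
    """Constants and bound variables pin a position; unbound variables leave it open."""
    if not term.startswith("?"):
        return frozenset([term])
    return frozenset([binding[term]]) if term in binding else None


def union_query(quads: List[Quad], patterns: List[Triple], graphs: FrozenSet[str],
                binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Pattern-at-a-time evaluation over the union of the named graphs."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    pattern = patterns[0]
    s, p, o = (constraint(t, binding) for t in pattern)
    # Project away the graph column: a triple stored in two of the chosen
    # graphs is still a single triple of their union.
    matches = {quad[:3] for quad in find(quads, s, p, o, graphs)}
    for triple in matches:
        new = dict(binding)
        if all(new.setdefault(t, v) == v
               for t, v in zip(pattern, triple) if t.startswith("?")):
            yield from union_query(quads, patterns[1:], graphs, new)


store = [
    ("ex:alice", "foaf:knows", "ex:bob", "ex:modelA"),
    ("ex:bob", "foaf:name", "Bob", "ex:modelB"),
    ("ex:bob", "foaf:name", "Bob", "ex:modelC"),
    ("ex:bob", "foaf:name", "Robert", "ex:modelD"),
]
query = [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")]
chosen = frozenset({"ex:modelA", "ex:modelB", "ex:modelC"})
results = list(union_query(store, query, chosen))
print(results)  # [{'?x': 'ex:alice', '?y': 'ex:bob', '?n': 'Bob'}]
```

Here the join crosses graphs at no extra cost: `ex:modelD` is excluded by the same membership test that filters predicates, and the duplicate "Bob" statement in `ex:modelB` and `ex:modelC` collapses to one answer.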

Union query will be the most useful graph aggregation operation 
whenever it's feasible, and it's definitely feasible in at least the 
non-distributed case.
Received on Wednesday, 30 June 2004 12:42:10 UTC