- From: Simon Raboczi <raboczi@tucanatech.com>
- Date: Wed, 30 Jun 2004 12:41:26 -0400
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- Cc: public-rdf-dawg@w3.org
On 30/06/2004, at 8:30, Seaborne, Andy wrote:

> -------- Original Message --------
>> From: Kendall Clark <>
>> Date: 29 June 2004 14:16
>>
>> On Tue, Jun 29, 2004 at 07:03:24AM -0400, Eric Prud'hommeaux wrote:
>>> [[
>>> 4.5 Aggregate Query
>>
>> We discussed this originally, as I recall. Aggregate, then query is
>> distinct from query separately, aggregate results. I called what
>> you're proposing "union query". Again, as I recall the discussion,
>> there was more support for aggregate query than union query.
>
> There seems to me to be no need for explicit support for union query.
> If the union is valuable, then make the union an identifiable web
> resource and query that. In other words, the query names the union as
> the target and there is no need to have any thing in the QL or
> protocol.
>
> What this approach to union query does not permit is arbitrary,
> temporary union. As fetching a graph over the web is not trivial,
> having a server which allowed client request to cause many large GETs
> (if implemented by merge locally and query) to happen seems OK for
> small experiments only. An implementation could be done which asked
> each triple pattern in turn (with previous triple matching values
> substituted in - it's a search tree here not a linear pass) avoid
> GETting the whole models but cause very large numbers of request from
> the request server to the model owner's servers.
>
> This can't be done with aggregate result query - the system would
> need a way to name the separate graphs if it isn't done by the client
> issuing request to each target and merging the results itself. This
> is about the same amount of data traffic if the results aren't having
> duplicates removed - only extra copies of the query go out; it may be
> faster for the client to do it as requests can be sent in parallel
> (network speed impacts this).

In earlier versions of Kowari, we supported both sorts of graph aggregation in the iTQL "from" clause.
Expressing "from <modelA> or <modelB>" would request aggregate then query (union query), whereas "from <modelA> xor <modelB>" would request separate queries then aggregation (aggregate result query). The expressions being queried are arbitrary and temporary, but there's a facility to create a named "view" graph whose value is defined by one of these expressions.

We implemented the union query the way Andy suggested, by querying each triple pattern in turn. Much as he surmises, in the network-distributed case this generates a great deal of network traffic (although streamability helps by allowing some of the intermediate results to be produced at the same time). It works, but it scales poorly.

However, when used to aggregate graphs stored on the same server, performance can be excellent. Combining constraint results from different graphs is really no different from combining results from different subjects or predicates if your native store is based on quads. As a result, rather than being a query form only viable for small experiments, the union query is the workhorse operator in every "from" clause, allowing very large numbers of statements within a server to be manageably organized into various named graphs.

The "xor" operator for aggregate result query ended up relegated to the status of a performance hack, used only for network-distributed queries whose data were distributed in such a way that independent servers could meaningfully satisfy all the query constraints on their own.

Union query will be the most useful graph aggregation operation whenever it's feasible, and it's definitely feasible in at least the non-distributed case.
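
To make the difference between the two operators concrete, here is a minimal Python sketch. It is not Kowari's implementation or iTQL; the function names (`union_query`, `aggregate_result_query`), the tuple representation of triples, and the sample data are all assumptions made up for illustration. It evaluates each triple pattern in turn with earlier bindings substituted in, roughly as described above, and shows a query whose constraints span two graphs.

```python
# Illustrative sketch only: hypothetical names and data, not Kowari or iTQL.

def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def query(graph, patterns):
    """Evaluate the patterns in turn, substituting in earlier bindings."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for triple in graph:
                m = match(pattern, triple, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings

def union_query(graphs, patterns):
    """'or' style: aggregate the graphs first, then query the union."""
    return query(set().union(*graphs), patterns)

def aggregate_result_query(graphs, patterns):
    """'xor' style: query each graph separately, then merge the results."""
    results = []
    for g in graphs:
        for b in query(g, patterns):
            if b not in results:  # drop duplicate solutions
                results.append(b)
    return results

# Hypothetical data: the two constraints are satisfied by different graphs.
model_a = {("alice", "knows", "bob")}
model_b = {("bob", "worksFor", "acme")}
patterns = [("?x", "knows", "?y"), ("?y", "worksFor", "?org")]

spanning = union_query([model_a, model_b], patterns)
separate = aggregate_result_query([model_a, model_b], patterns)
```

In this sketch `spanning` holds one solution, because the join happens over the merged statements, while `separate` is empty, because neither graph can satisfy both constraints on its own. That is the situation in which the "xor" form only makes sense when each server already holds everything needed to answer the whole query.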
Received on Wednesday, 30 June 2004 12:42:10 UTC