- From: Jos De_Roo <jos.deroo@agfa.com>
- Date: Sat, 31 Jul 2004 00:49:54 +0200
- To: Jim Hendler <hendler@cs.umd.edu>
- Cc: "Seaborne, Andy" <andy.seaborne@hp.com>, eric@w3.org, public-rdf-dawg@w3.org
Hi, Jim

Those are indeed very clear observations, and in any case we are dealing with an objective here (not a requirement). I am, however, quite optimistic in these matters :) Rather than the extremes of all SW sources or just one source, an explicit set of sources could be given to an engine, and an explicit set of queries could then be answered in a cascaded kind of Socratic complete dialog among different engines. That's also why ":id q:select C; q:where P." is so useful as a query rule: it drives that dialog (no matter whether within an engine or between engines, I would think...)

--
Jos De Roo, AGFA http://www.agfa.com/w3c/jdroo/

Jim Hendler <hendler@cs.umd.edu>
30/07/2004 23:31
To: Jos De_Roo/AMDUS/MOR/Agfa-NV/BE/BAYER@AGFA, eric@w3.org
cc: "Seaborne, Andy" <andy.seaborne@hp.com>, public-rdf-dawg@w3.org
Subject: Querying multiple sources objective

Forgive me for jumping in late, but I am catching up after a bunch of travel. I've looked at 4. and the new 4.5.1, and I must admit to confusion: 4.5.1 looks kind of cool, but strikes me as either amazingly difficult to implement or not terribly useful, so I may be missing something. That is:

4.5.1 Querying Multiple Sources
It should be possible for a query to specify which of the available RDF graphs it is to be executed against. If more than one RDF graph is specified, the result is as if the query had been executed against the merge of the specified RDF graphs. Query processors with a single available RDF graph trivially satisfy this objective.

Now consider: if we used the old 4.5, we simply sent the query to each DB and aggregated the results.
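The difference between the old 4.5 behaviour (run the query on each source, then aggregate) and the 4.5.1 wording (answer as if run against the merge) can be illustrated with a minimal Python sketch. It assumes a toy representation in which triples are string 3-tuples, variables are hypothetical strings starting with "?", and a query is just a list of triple patterns (a basic graph pattern). Because no blank nodes are involved, the merge here is plain set union; a real RDF merge must also keep blank nodes from different graphs apart. The helper names (`match`, `evaluate`, `per_source`, `over_merge`) are illustrative only, not part of any proposed API.

```python
def is_var(term):
    # Hypothetical convention for this sketch: query variables start with "?"
    return term.startswith("?")


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; return None on failure."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # An already-bound variable must agree with the triple's term
            if extended.get(p, t) != t:
                return None
            extended[p] = t
        elif p != t:
            return None
    return extended


def evaluate(bgp, graph):
    """Naive nested-loop evaluation of a basic graph pattern over one set of triples."""
    solutions = [{}]
    for pattern in bgp:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def per_source(bgp, graphs):
    """Old 4.5 style: run the whole query against each source and concatenate the answers."""
    results = []
    for g in graphs:
        results.extend(evaluate(bgp, g))
    return results


def over_merge(bgp, graphs):
    """4.5.1 style: answer as if the query ran against the merge (set union; no blank nodes)."""
    merged = set().union(*graphs)
    return evaluate(bgp, merged)


# Two hypothetical sources, each holding only part of the data the query needs
source_a = {("ex:alice", "foaf:knows", "ex:bob")}
source_b = {("ex:bob", "foaf:mbox", "mailto:bob@example.org")}
query = [("?x", "foaf:knows", "?y"), ("?y", "foaf:mbox", "?m")]

print(per_source(query, [source_a, source_b]))  # []
print(over_merge(query, [source_a, source_b]))
# [{'?x': 'ex:alice', '?y': 'ex:bob', '?m': 'mailto:bob@example.org'}]
```

No single source can satisfy both patterns, so aggregating per-source answers returns nothing, while the merge semantics finds the join across sources. The merge gets that answer only by bringing all the triples together in one place, which is where the distribution-versus-centralization tension comes from.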
In the new one, we have two choices: either (i) we handle it in a distributed way, or (ii) we merge the graphs and then query them.

(i) seems to me to be very difficult. In fact, I'm pretty sure this is a hard research task I would give someone a PhD for. That is, assume the graph is distributed among many servers, each holding only part of the query space, and suppose I'm querying for a set of triples concerning variables A, B, and C. If I send the whole query to every DB, there is not likely to be any one that unifies with all the variables, since they may be distributed among the various stores. If I have to analyze the query, know what is in the stores, send only the appropriate pieces of the query to the appropriate servers, and then reassemble the results, well, that seems hard to implement. (Doing this in DB space has been the subject of a number of research projects and theses in the past few years, so I am pretty sure it is non-trivial, to say the least.)

(ii) If we assume that, to avoid the difficulty in (i), we first merge the graphs and then query them, well heck, that won't scale worth crap. Suppose, for example, I'm playing with the results of several FOAF scrapers, each of which has collected more than 1M people, and my query is to find any two people with the same email address (or any other feature). If I have to merge the graphs, I'll need some huge amount of memory to do this.

In short, (i) has difficulties with distribution and (ii) has problems with centralization. Is either of these actually implemented/implementable? Am I misunderstanding the objective??

thanks
JH

--
Professor James Hendler http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies 301-405-2696
Maryland Information and Network Dynamics Lab. 301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Friday, 30 July 2004 18:50:48 UTC