- From: Jim Hendler <hendler@cs.umd.edu>
- Date: Fri, 30 Jul 2004 17:31:49 -0400
- To: Jos De_Roo <jos.deroo@agfa.com>, eric@w3.org
- Cc: "Seaborne, Andy" <andy.seaborne@hp.com>, public-rdf-dawg@w3.org
- Message-Id: <p06110402bd306c63a783@[10.0.1.2]>
Forgive me for jumping in late -- I am catching up after a bunch of travel. I've looked at 4. and the new 4.5.1, and I must admit to confusion -- 4.5.1 looks kind of cool, but strikes me as either amazingly difficult to implement or not terribly useful, so I may be missing something. That is:

    4.5.1 Querying Multiple Sources

    It should be possible for a query to specify which of the available RDF
    graphs it is to be executed against. If more than one RDF graph is
    specified, the result is as if the query had been executed against the
    merge of the specified RDF graphs. Query processors with a single
    available RDF graph trivially satisfy this objective.

Now consider: under the old 4.5, we simply sent the query to each DB and aggregated the results. Under the new one, we have two choices: either we (i) handle it in a distributed way, or (ii) merge the graphs and then query them.

(i) seems to me to be very difficult -- in fact, I'm pretty sure this is a hard research task I would give someone a PhD for. That is, if we assume the graph is distributed among many servers, and each has only part of the query space, then suppose I'm querying for a set of triples concerning variables A, B, and C. If I send the whole query to every DB, there is not likely to be any one which unifies with all the variables, since they may be distributed among the various stores.
If I instead have to analyze the query, know what is in each store, send only the appropriate pieces of the query to the appropriate servers, and then reassemble the results, well, that seems hard to implement. (In fact, doing this in DB space has been the subject of a number of research projects and theses in the past few years, so I am pretty sure this is non-trivial, to say the least.)

(ii) If, to avoid the difficulty in (i), we first merge the graphs and then query them, well heck, that won't scale worth crap. Supposing, for example, I'm playing with the results of several FOAF scrapers -- each one has collected more than 1M people, and my query is to find any two people with the same email address (or any other feature). If I have to merge the graphs, I'll need some huge amount of memory to do this.

In short, (i) has difficulties with distribution and (ii) has problems with centralization. Is either of these actually implemented/implementable? Am I misunderstanding the objective?

thanks
 JH

-- 
Professor James Hendler			http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies	301-405-2696
Maryland Information and Network Dynamics Lab.	301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
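[Editor's note: the distinction Jim draws can be made concrete with a small sketch. The following is a toy conjunctive triple matcher, not SPARQL or any real store's API -- all names and data are hypothetical. It shows why the 4.5.1 "as if executed against the merge" semantics differ from the old send-to-each-store-and-aggregate strategy: a query whose patterns span two stores matches neither store alone but does match their merge.]

```python
# Toy illustration (hypothetical, not SPARQL): graphs are sets of
# (subject, predicate, object) triples; a "query" is a conjunction of
# patterns whose "?x"-prefixed terms are variables.

def match(pattern, triple, bindings):
    """Try to extend bindings so pattern matches triple; None on failure."""
    b = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if p in b and b[p] != t:
                return None          # variable already bound differently
            b[p] = t
        elif p != t:
            return None              # constant term mismatch
    return b

def query(graph, patterns, bindings=None):
    """All variable bindings satisfying every pattern against one graph."""
    if bindings is None:
        bindings = {}
    if not patterns:
        return [bindings]
    results = []
    for triple in graph:
        b = match(patterns[0], triple, bindings)
        if b is not None:
            results.extend(query(graph, patterns[1:], b))
    return results

# Two stores, each holding only part of the data the query needs.
g1 = {("alice", "knows", "bob")}
g2 = {("bob", "email", "bob@example.org")}
q = [("?a", "knows", "?b"), ("?b", "email", "?e")]

# Sent whole to each store (the old 4.5 strategy), the query matches neither...
assert query(g1, q) == [] and query(g2, q) == []
# ...but it does match the merge, as 4.5.1's semantics require.
assert query(g1 | g2, q) == [{"?a": "alice", "?b": "bob", "?e": "bob@example.org"}]
```

Note that `g1 | g2` here is exactly option (ii): the merge is materialized before matching, which is where the memory concern for multi-million-triple FOAF graphs comes in; option (i) would have to produce the same answer without ever building that union.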
Received on Friday, 30 July 2004 17:32:34 UTC