- From: Jos De_Roo <jos.deroo@agfa.com>
- Date: Sat, 31 Jul 2004 00:49:54 +0200
- To: "Jim Hendler" <hendler@cs.umd.edu>
- Cc: "Seaborne, Andy" <andy.seaborne@hp.com>, eric@w3.org, public-rdf-dawg@w3.org
Hi, Jim
Those are indeed very clear observations, and in any case we are
dealing with an objective here (not a requirement). I am, however,
quite optimistic in these matters :) Rather than the extremes of all
SW sources or just 1 source, an explicit set of sources could be given
to an engine, and an explicit set of queries could be answered in a
cascaded, Socratic kind of complete dialog among different engines.
That's also why ":id q:select C; q:where P." is so useful as a
query rule, as it drives that dialog (no matter whether within an
engine or between engines, I would think...)
--
Jos De Roo, AGFA http://www.agfa.com/w3c/jdroo/
Jim Hendler <hendler@cs.umd.edu>
30/07/2004 23:31
To: Jos De_Roo/AMDUS/MOR/Agfa-NV/BE/BAYER@AGFA, eric@w3.org
cc: "Seaborne, Andy" <andy.seaborne@hp.com>, public-rdf-dawg@w3.org
Subject: Querying multiple sources objective
Forgive me for jumping in late -- but I am catching up after a bunch of
travel -- I've looked at 4. and the new 4.5.1 and I must admit to
confusion -- 4.5.1 looks kind of cool, but strikes me as sort of either
amazingly difficult to implement or not terribly useful - so I may be
missing something...
That is:
4.5.1 Querying Multiple Sources
It should be possible for a query to specify which of the available RDF
graphs it is to be executed against. If more than one RDF graph is
specified, the result is as if the query had been executed against the
merge of the specified RDF graphs. Query processors with a single
available RDF graph trivially satisfy this objective.
Now consider -- if we used the old 4.5, we simply sent the query to each DB
and aggregated the results. In the new one, we have two choices: either
we (i) handle it in a distributed way, or (ii) merge the graphs and
then query them.
(i) seems to me to be very difficult - in fact, I'm pretty sure this is a
hard research task I would give someone a PhD for. That is, if we assume
the graph is distributed among many servers, and each only has part of the
query space, then suppose I'm querying for a set of triples concerning
variables A, B, and C. If I send the whole query to every DB, there is
not likely to be any one which unifies with all the variables, since they
may be distributed among the various stores. If I have to analyze the
query, know what is in the stores, and then send only the appropriate
pieces of queries to the appropriate servers and then reassemble the
results, well, that seems hard to implement (in fact, doing this in DB
space has been the subject of a number of research projects and theses in
the past few years - so I am pretty sure this is non-trivial, to say the
least).
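
To make the problem in (i) concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tuple encoding of triples, the "?"-prefixed variables, the helper names match and query, and the sample data are not part of any DAWG draft. It sends the whole query to each store, as under the old 4.5, and compares the aggregated answers with the answer from the merged graph.

```python
# Triples are (subject, predicate, object) tuples of strings.
# Terms starting with "?" are variables (an illustrative convention only).

def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def query(graph, patterns):
    """Return every binding that satisfies all patterns against one graph."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [extended for b in bindings for t in graph
                    if (extended := match(pattern, t, b)) is not None]
    return bindings

# Two hypothetical stores, each holding only part of the data.
store_a = {("alice", "knows", "bob")}
store_b = {("bob", "mbox", "mailto:bob@example.org")}
patterns = [("?a", "knows", "?b"), ("?b", "mbox", "?c")]

# Old 4.5 style: send the whole query to each store and aggregate.
per_store = query(store_a, patterns) + query(store_b, patterns)

# New 4.5.1 semantics: answer as if run against the merge.
merged = query(store_a | store_b, patterns)

print(per_store)  # no store can bind ?a, ?b and ?c on its own
print(merged)     # one answer, joining a triple from each store
```

In this toy case the per-store run returns nothing while the merged run returns one binding, so getting the merged answer without actually merging would require splitting the query and joining partial results across stores.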
(ii) If we assume that, to avoid the difficulty in (i), we first merge the
graphs and then query them, well heck, that won't scale worth crap --
supposing, for example, I'm playing with the results of several FOAF
scrapers -- each one has collected more than 1M people and my query is to
find any two people with the same email address (or any other feature) --
if I have to merge the graphs, I'll need some huge amount of memory to do
this.
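
A rough sketch of the merge-then-query route in (ii), again in Python and again under assumptions of my own (the "mbox" predicate name, the tuple encoding of triples, the helper shared_mailboxes, and the sample data are all hypothetical): to find two people with the same mailbox across scrapers, the index below has to hold an entry for every mailbox in every graph before any duplicate can be reported.

```python
from collections import defaultdict

def shared_mailboxes(graphs):
    """Index every (person, mbox) pair from all graphs, then report
    mailboxes that belong to more than one person."""
    owners = defaultdict(set)
    for graph in graphs:
        for person, predicate, mbox in graph:
            if predicate == "mbox":
                owners[mbox].add(person)
    # The index above grows with the total number of distinct mailboxes
    # seen across all graphs, which is the memory cost raised in (ii).
    return {mbox: people for mbox, people in owners.items() if len(people) > 1}

# Two hypothetical scraper outputs; neither contains a duplicate alone.
scraper_1 = {("p1", "mbox", "mailto:x@example.org")}
scraper_2 = {("p2", "mbox", "mailto:x@example.org"),
             ("p3", "mbox", "mailto:y@example.org")}

duplicates = shared_mailboxes([scraper_1, scraper_2])
print(duplicates)
```

With real scraper output of more than 1M people each, the same index would need to cover all of them at once, since a match can pair entries from any two graphs.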
In short, (i) has difficulties with distribution and (ii) has problems
with centralization -- is either of these actually
implemented/implementable? Am I misunderstanding the objective??
thanks
JH
--
Professor James Hendler
http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies 301-405-2696
Maryland Information and Network Dynamics Lab. 301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Friday, 30 July 2004 18:50:48 UTC