Re: Querying multiple sources objective

From: Jim Hendler <hendler@cs.umd.edu>
Date: Sat, 31 Jul 2004 22:24:00 -0400
Message-Id: <p0611040bbd3202975b51@[10.0.1.2]>
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: Jos De_Roo <jos.deroo@agfa.com>, "Seaborne, Andy" <andy.seaborne@hp.com>, public-rdf-dawg@w3.org

At 0:55 -0400 7/31/04, Eric Prud'hommeaux wrote:

[snip]

>>
>>  In short, (i) has difficulties with distribution and (ii) has
>>  problems with centralization -- is either of these actually
>>  implemented/implementable?   Am I misunderstanding the objective??
>
>(i) has an almost trivial solution when you allow the user to
>select what part of the query goes where. This pretty accurately
>reflects how people do research today, finding pages with one
>sort of information and manually (mentally) merging that with
>data with another sort of information. For instance, I believe
>that the CDDB/IMDB example is a perfectly reasonable model of
>the degreee of expertise we can rely on from today's moderately
>knowledgeable user.
>

But if the user had to know this, and had to send different queries 
to different places, then even if I were to interpret the objective 
such that that counted as a solution, I don't see how this would be 
advantageous over sending a set of separate queries and then unifying 
the results -- in which case wouldn't I be better off having this 
under my control, instead of making the query language more complex 
for no gain?
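To make the comparison concrete, here is a schematic sketch of the 
separate-queries-then-unify approach in plain Python. The triples are 
made-up (s, p, o) tuples standing in for RDF data, and the CDDB/IMDB-style 
stores and predicate names are invented for illustration -- this is a 
model of the client-side workflow, not any particular implementation.

```python
# Approach (i) done under client control: query each "store" on its
# own, then unify the result sets in application code.
# Triples are plain (subject, predicate, object) tuples; the
# vocabulary and data are hypothetical.

cddb = {  # music store (CDDB-style data)
    ("track1", "artist", "Alice"),
    ("track1", "title", "Track One"),
}
imdb = {  # film store (IMDB-style data)
    ("movie1", "usesTrack", "track1"),
    ("movie1", "title", "Movie One"),
}

# A separate query per store: tracks by Alice, and movie/track pairs.
alice_tracks = {s for (s, p, o) in cddb
                if p == "artist" and o == "Alice"}
movie_tracks = [(s, o) for (s, p, o) in imdb
                if p == "usesTrack"]

# Client-side unification of the two result sets: movies that use a
# track by Alice.
hits = [m for (m, t) in movie_tracks if t in alice_tracks]
```

Each store only ever sees a query it can answer locally; the join 
happens in the client, which is exactly the "under my control" point.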

>(ii) is how most of us do our banal little queries every day.
>Rarely do I see people making the same RDF query over multiple
>repositories. Instead they identify a couple of sources, merge
>them, and do a query across the resulting graph. Most data that
>I've seen seems to be organized such that extra repositories
>complement the data with related data rather than supplementing
>with additional data of the same form.
>

This might be what people do when things are small, but it certainly 
won't scale -- and more importantly, it seems to me that forcing the 
implementors of a query client to implement this is a problem. 
Suppose all I want to implement is a web site that queries various 
triple stores and displays some sort of page based on the merged 
query results -- the 4.5 objective would let me do this well. 4.5.1 
would both be harder for me to use and also require that I know how 
to manage some triple store for the merged graph. Again, I may be 
missing what you are after, but I sure see the objective as it was 
written in 4.5 being a whole lot more useful than the one in 4.5.1.
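For contrast, a schematic sketch of the merge-then-query approach 
being discussed: the sources are merged into a single graph first, and 
one conjunctive query runs over the result. As before, the triples, 
store names, and predicates are invented for illustration.

```python
# Approach (ii): merge the sources into one graph, then run a single
# query across the merged graph. Triples are plain (s, p, o) tuples;
# the data is hypothetical.

cddb = {  # music store (CDDB-style data)
    ("track1", "artist", "Alice"),
    ("track1", "title", "Track One"),
}
imdb = {  # film store (IMDB-style data)
    ("movie1", "usesTrack", "track1"),
    ("movie1", "title", "Movie One"),
}

# Graph merge: with no blank nodes in play, this is just set union.
merged = cddb | imdb

# One conjunctive query over the merged graph: movies whose track is
# by Alice. The client must hold (and manage) the merged graph.
hits = [s1 for (s1, p1, o1) in merged if p1 == "usesTrack"
        for (s2, p2, o2) in merged
        if s2 == o1 and p2 == "artist" and o2 == "Alice"]
```

The query itself is simpler here, but someone has to build and store 
`merged` -- which is the triple-store-administration burden the 
paragraph above objects to.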

>I think that (ii) represents a big part of what we want people
>to be able to do with the semantic web. (iii) (Aggregate Query)
>can be easily accomplished with SQL today without grounding your
>terms in a global namespace that allows documents to merge. I
>think that the cool thing *is* merging graphs. Yes, that's
>expensive, but I don't think that the new problems that we want
>to address with the semantic web get solved any other way.

But didn't objective 4.5 as previously written accomplish most of the 
needed capability, without requiring people who want to use the 
semantic web to become database administrators?
  -JH

-- 
Professor James Hendler			  http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies	  301-405-2696
Maryland Information and Network Dynamics Lab.	  301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Saturday, 31 July 2004 22:24:44 GMT
