Re: Querying multiple sources objective from Eric Prud'hommeaux on 2004-08-01 (public-rdf-dawg@w3.org from July to September 2004)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sun, 1 Aug 2004 11:45:28 -0400
To: Jim Hendler <hendler@cs.umd.edu>
Cc: Jos De_Roo <jos.deroo@agfa.com>, "Seaborne, Andy" <andy.seaborne@hp.com>, public-rdf-dawg@w3.org
Message-ID: <20040801154528.GC13232@w3.org>
I think I'm looking at the DAWG-QL definition in terms of what the user
types when trying to solve a problem. You (Jim, not the reader in
general) are looking at it in terms of server implementation. What QL
definition will work for both? In the cases I've seen, I think it
would be optimal if servers implemented a subset of the
language. Details inline:

On Sat, Jul 31, 2004 at 10:24:00PM -0400, Jim Hendler wrote:
> 
> At 0:55 -0400 7/31/04, Eric Prud'hommeaux wrote:
> 
> [snip]
> 
> >>
> >> In short, (i) has difficulties with distribution and (ii) has
> >> problems with centralization -- is either of these actually
> >> implemented/implementable?   Am I misunderstanding the objective??
> >
> >(i) has an almost trivial solution when you allow the user to
> >select what part of the query goes where. This pretty accurately
> >reflects how people do research today, finding pages with one
> >sort of information and manually (mentally) merging that with
> >data with another sort of information. For instance, I believe
> >that the CDDB/IMDB example is a perfectly reasonable model of
> >the degree of expertise we can rely on from today's moderately
> >knowledgeable user.
> >
> 
> But if the user had to know this, and to send different queries to 
> different places then, even if I were to interpret the objective such 
> that that was a solution, I don't see where this would be 
> advantageous over sending a set of separate queries and then unifying
> the results -- in which case wouldn't I be better off having this 
> under my control instead of making the query language more complex 
> for no gain?

I see the gain for the user. There could also be a gain in network
efficiency if the server implementation also allowed unification. For
instance, W3C has some RDF data (TR page, ACLs, Annotea, search
results, at-a-glance) that could be merged to answer some useful
queries. The client could federate and unify locally or ask the W3C
DAWG server to do it, which would reduce the network burden and push
the unification to a server where it could be optimized.
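As an illustration of the client-side option, here is a minimal Python sketch of federated query in which the user decides which part of the query goes to which source, and the client then joins the partial answers on their shared variables. It assumes triples are plain tuples and variables are strings starting with "?"; the source names, URIs and data are hypothetical (loosely modelled on the CDDB/IMDB example), and the pattern syntax is not DAWG-QL.

```python
# Minimal sketch of client-side federated query: each part of the query is
# sent to the source the user picked, and the client joins the partial
# answers on shared variables. Data and URIs are hypothetical.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return all bindings of '?'-variables that satisfy every pattern in one graph."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                candidate = dict(binding)
                # A variable binds (or must agree with an earlier binding);
                # a constant must equal the triple's term.
                if all(
                    candidate.setdefault(p, t) == t if p.startswith("?") else p == t
                    for p, t in zip(pattern, triple)
                ):
                    extended.append(candidate)
        solutions = extended
    return solutions


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Natural join of two solution sets on their shared variables."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


def federated_query(parts: List[Tuple[List[Triple], List[Triple]]]) -> List[Binding]:
    """parts: (source graph, sub-query) pairs, as chosen by the user."""
    results: List[Binding] = [{}]
    for graph, patterns in parts:
        results = join(results, match(graph, patterns))
    return results


# Hypothetical stand-ins for the two sources.
cddb = [
    ("cd:album1", "dc:title", "Example Album"),
    ("cd:album1", "cd:performer", "person:alice"),
]
imdb = [
    ("film:f1", "imdb:actor", "person:alice"),
    ("film:f1", "dc:title", "Example Film"),
    ("film:f2", "imdb:actor", "person:bob"),
]

# "Which films feature the performer of Example Album?"
parts = [
    (cddb, [("?album", "dc:title", "Example Album"), ("?album", "cd:performer", "?who")]),
    (imdb, [("?film", "imdb:actor", "?who"), ("?film", "dc:title", "?filmTitle")]),
]
results = federated_query(parts)
print([b["?filmTitle"] for b in results])  # ['Example Film']
```

A server offering the same service would run this join next to the data instead of shipping every partial answer back to the client.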

For folks wanting to implement a simple server, they can answer
queries that specify targets with "no, do it yourself"
(cf. conformance levels [1]). I'm not sure where the sweet spot is
here. I'm quite sure this is a useful application from the client
perspective, pretty sure it would save network traffic, and have a
hunch that it's worth the extra definition and implementation.

> >(ii) is how most of us do our banal little queries every day.
> >Rarely do I see people making the same RDF query over multiple
> >repositories. Instead they identify a couple of sources, merge
> >them, and do a query across the resulting graph. Most data that
> >I've seen seems to be organized such that extra repositories
> >complement the data with related data rather than supplementing
> >with additional data of the same form.
> >
> 
> this might be what people do when things are small, but it certainly 
> won't scale -- but more importantly, it seems to me that forcing the 
> implementors of a query client to have to implement this is a problem 
> -- supposing all I want to implement is a web site that queries 
> various triple stores and displays some sort of page based on the 
> merged query results -- the 4.5 objective would let me do this well. 

In the sense that you could invent a new document or service endpoint
that would imply a query across these resources. The client won't have
a defined way to identify a set of pages (say, Bob and Jill's FOAF
pages and a user database) and deduce the name of the service that
queries a merge of at least those documents. Making that association
would require data published and interpreted in another (higher level)
protocol. A higher level protocol could be a useful way to solve this
problem, but it does seem to fly in the face of how most people use
RDF today.

I'm not convinced that all forms of our QL have to be scalable. I
haven't seen that requirement in other QLs, and I think it would
alienate a lot of potential users.

> 4.5.1 would both be harder for me to use, and also require that I 
> know how to manage some triple store for the merged graph -- again, I 
> may be missing what you are after, but I sure see the objective as it 
> was written in 4.5 being a whole lot more useful than the one in 4.5.1
> 
> >I think that (ii) represents a big part of what we want people
> >to be able to do with the semantic web. (iii) (Aggregate Query)
> >can be easily accomplished with SQL today without grounding your
> >terms in a global namespace that allows documents to merge. I
> >think that the cool thing *is* merging graphs. Yes, that's
> >expensive, but I don't think that the new problems that we want
> >to address with the semantic web get solved any other way.
> 
> But didn't objective 4.5 as previously written accomplish most of the 
> needed capability, without requiring people who want to use the 
> semantic web to have to become database administrators?

4.5 doesn't meet any of the cases I've used to motivate union query or
federated query. Executing the same query over multiple sources does
not solve most of the queries I see people executing today. Some FOAF
queries are easily solved that way (pictures of people with a first
name "Bob"), but mostly, I see people merging graphs and doing queries
that would not be matched in the graphs individually.
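To make the distinction concrete, here is a minimal Python sketch contrasting the same query run over each source with the results concatenated (roughly the style objective 4.5 describes) against merging the sources first and querying the merged graph (union query). The FOAF-style URIs and data are hypothetical, triples are plain tuples, variables are "?"-prefixed strings, and none of this is DAWG-QL syntax.

```python
# Minimal sketch: per-source ("aggregate") query versus merge-then-query
# ("union") query. Hypothetical FOAF-style data; tuples stand in for triples.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return all bindings of '?'-variables that satisfy every pattern in one graph."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                candidate = dict(binding)
                if all(
                    candidate.setdefault(p, t) == t if p.startswith("?") else p == t
                    for p, t in zip(pattern, triple)
                ):
                    extended.append(candidate)
        solutions = extended
    return solutions


def aggregate_query(sources: List[List[Triple]], patterns: List[Triple]) -> List[Binding]:
    """Run the same query against each source separately and concatenate results."""
    results: List[Binding] = []
    for graph in sources:
        results.extend(match(graph, patterns))
    return results


def union_query(sources: List[List[Triple]], patterns: List[Triple]) -> List[Binding]:
    """Merge the sources into one graph, then query the merge.
    This data has no blank nodes, so a set union suffices; a real RDF merge
    would also have to keep blank nodes from different sources apart."""
    merged = set()
    for graph in sources:
        merged.update(graph)
    return match(sorted(merged), patterns)


bob_foaf = [
    ("person:bob", "foaf:firstName", "Bob"),
    ("person:bob", "foaf:knows", "person:jill"),
]
jill_foaf = [
    ("person:jill", "foaf:firstName", "Jill"),
    ("person:jill", "foaf:depiction", "http://example.org/jill.jpg"),
]
sources = [bob_foaf, jill_foaf]

# Answerable source by source: people with the first name "Bob".
first_name_bob = [("?person", "foaf:firstName", "Bob")]
# Needs both documents at once: pictures of people Bob knows.
bobs_friends_pics = [
    ("?bob", "foaf:firstName", "Bob"),
    ("?bob", "foaf:knows", "?friend"),
    ("?friend", "foaf:depiction", "?pic"),
]

print(len(aggregate_query(sources, first_name_bob)))  # 1
print(len(aggregate_query(sources, bobs_friends_pics)))  # 0
print([b["?pic"] for b in union_query(sources, bobs_friends_pics)])  # ['http://example.org/jill.jpg']
```

Querying each source separately still finds Bob, but only the merged graph connects Bob's foaf:knows link to Jill's picture.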

I'm speaking from what I've seen. You've seen different use cases. I
would like the group to consider what cases they see most often and
which style of query (aggregate, union, federated) would work for
them.

[1] http://www.w3.org/mid/D24D16A6707B0A4B9EF084299CE99B39053F8D0C@mcl-its-exs02.mail.saic.com
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Sunday, 1 August 2004 11:45:39 UTC