Re: Querying multiple sources objective from Jim Hendler on 2004-08-01 (public-rdf-dawg@w3.org from July to September 2004)

From: Jim Hendler <hendler@cs.umd.edu>
Date: Sun, 1 Aug 2004 15:49:18 -0400
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: Jos De_Roo <jos.deroo@agfa.com>, "Seaborne, Andy" <andy.seaborne@hp.com>, public-rdf-dawg@w3.org
Message-Id: <p0611040fbd32f7c21109@[10.0.1.2]>
At 11:45 -0400 8/1/04, Eric Prud'hommeaux wrote:
>I think I'm looking at DAWG-QL definition in terms of what the user
>types when trying to solve a problem. You (Jim, not the reader in
>general) are looking at it in terms of server implementation. What QL
>definition will work for both?

I think I see the difference differently - I feel like you're trying 
to write something more akin to a programming language where you put 
it all in the query and then everything is magically assembled -- I 
assume that there is no "user" in the loop, but someone writing a 
program accessing distributed RDF data in various datastores, and 
thus a simpler language used in multiple queries appeals to me 
(i.e. you keep adding features that add "strength" at the cost of 
complexity).

>In the cases I've seen, I think it
>would be optimal if servers implemented a subset of the
>language. Details inline:

Gosh, I would think this amazingly highly non-optimal - if I don't 
know which queries work with which servers, and worse get back 
answers without knowing this - i.e. I find no answers to my query and 
don't know if it is because there really are no answers or just that 
the query was too complex.

My assumption is that users will not generally interact with queries 
(how often do you type SQL to an application?) but that programmers 
will - and that for the programmer it is probably easier and more 
efficient to write multiple queries than to write incredibly complex 
ones, especially if they cannot have faith that the servers will 
handle them.

So, I guess my problem with 4.5.1 is that in most of the cases you 
propose, I see it easier to do multiple queries than to extend a 
BRQL-like language to handle them -- esp. where the merging has to be 
under my control, but I'm just trying to use existing stores -- that 
is, why should the querier ever have to implement a triple store if 
they're only going to use the views returned for display or such?
(Don't get me wrong, I think there will be many implementors who will 
want to have triple stores and to have the results of queries build 
graphs -- but I don't want that to be mandatory to be able to use the 
language).

  -JH





>
>On Sat, Jul 31, 2004 at 10:24:00PM -0400, Jim Hendler wrote:
>>
>>  At 0:55 -0400 7/31/04, Eric Prud'hommeaux wrote:
>>
>>  [snip]
>>
>>  >>
>>  >> In short, (i) has difficulties with distribution and (ii) has
>>  >> problems with centralization -- is either of these actually
>>  >> implemented/implementable?   Am I misunderstanding the objective??
>>  >
>>  >(i) has an almost trivial solution when you allow the user to
>>  >select what part of the query goes where. This pretty accurately
>>  >reflects how people do research today, finding pages with one
>>  >sort of information and manually (mentally) merging that with
>>  >data with another sort of information. For instance, I believe
>>  >that the CDDB/IMDB example is a perfectly reasonable model of
>>  >the degree of expertise we can rely on from today's moderately
>>  >knowledgeable user.
>>  >
>>
>>  But if the user had to know this, and to send different queries to
>>  different places then, even if I were to interpret the objective such
>>  that that was a solution, I don't see where this would be
>>  advantageous to sending a set of separate queries and then unifying
>>  the results -- in which case wouldn't I be better off having this
>>  under my control instead of making the query language more complex
>>  for no gain?
>
>I see the gain for the user. There could be gain for the network
>efficiency if the server implementation also allowed unification. For
>instance, W3C has some RDF data (TR page, ACLs, Annotea, search
>results, at-a-glance) that could be merged to answer some useful
>queries. The client could federate and unify locally or ask the W3C
>DAWG server to do it, which would save network burden and push the
>unification to a server where it could be optimized.
>
>For folks wanting to implement a simple server, they can answer
>queries that specify targets with "no, do it yourself"
>(cf. conformance levels [1]). I'm not sure where the sweet point is
>here. I'm quite sure this is a useful application from the client
>perspective, pretty sure it would save network traffic, and have a
>hunch that it's worth the extra definition and implementation.
>
>>  >(ii) is how most of us do our banal little queries every day.
>>  >Rarely do I see people making the same RDF query over multiple
>>  >repositories. Instead they identify a couple of sources, merge
>>  >them, and do a query across the resulting graph. Most data that
>>  >I've seen seems to be organized such that extra repositories
>>  >complement the data with related data rather than supplementing
>>  >with additional data of the same form.
>>  >
>>
>>  this might be what people do when things are small, it certainly
>>  won't scale -- but more importantly, it seems to me that forcing the
>>  implementors of a query client to have to implement this is a problem
>>  -- supposing all I want to implement is a web site that queries
>>  various triple stores and displays some sort of page based on the
>>  merged query results -- the 4.5 objective would let me do this well.
>
>In the sense that you could invent a new document or service endpoint
>that would imply a query across these resources. The client won't have
>a defined way to identify a set of pages (say, Bob and Jill's FOAF
>pages and a user database) and deduce the name of the service that
>queries a merge of at least those documents. Making that association
>would require data published and interpreted in another (higher level)
>protocol. A higher level protocol could be a useful way to solve this
>problem, but it does seem to fly in the face of how most people use
>RDF today.
>
>I'm not convinced that all forms of our QL have to be scalable. I
>haven't seen that in other QLs and think it alienates a lot of
>potential users.
>
>>  The 4.5.1 would both be harder for me to use, and also require that I
>>  know how to manage some triple store for the merged graph -- again, I
>>  may be missing what you are after, but I sure see the objective as it
>>  was written in 4.5 being a whole lot more useful than the one in 4.5.1
>>
>>  >I think that (ii) represents a big part of what we want people
>>  >to be able to do with the semantic web. (iii) (Aggregate Query)
>>  >can be easily accomplished with SQL today without grounding your
>>  >terms in a global namespace that allows documents to merge. I
>>  >think that the cool thing *is* merging graphs. Yes, that's
>>  >expensive, but I don't think that the new problems that we want
>>  >to address with the semantic web get solved any other way.
>>
>>  But didn't objective 4.5 as previously written accomplish most of the
>>  needed capability, without requiring people who want to use the
>>  semantic web to have to become database administrators?
>
>4.5 doesn't meet any of the cases I've used to motivate union query or
>federated query. Executing the same query over multiple sources does
>not solve most of the queries I see people executing today. Some FOAF
>queries are easily solved that way (pictures of people with a first
>name "Bob"), but mostly, I see people merging graphs and doing queries
>that would not be matched in the graphs individually.
>
>I'm speaking from what I've seen. You've seen different use cases. I
>would like the group to consider what cases they see most often and
>which style of query (aggregate, union, federated) would work for
>them.
>
>[1] 
>http://www.w3.org/mid/D24D16A6707B0A4B9EF084299CE99B39053F8D0C@mcl-its-exs02.mail.saic.com
>--
>-eric
>
>office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
>                         Shonan Fujisawa Campus, Keio University,
>                         5322 Endo, Fujisawa, Kanagawa 252-8520
>                         JAPAN
>         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
>cell:   +1.857.222.5741 (does not work in Asia)
>
>(eric@w3.org)
>Feel free to forward this message to any list for any purpose other than
>email address distribution.
>

-- 
Professor James Hendler			  http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies	  301-405-2696
Maryland Information and Network Dynamics Lab.	  301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Sunday, 1 August 2004 15:50:01 UTC