Re: Various result forms from Pat Hayes on 2004-05-05 (public-rdf-dawg@w3.org from April to June 2004)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 5 May 2004 16:53:23 -0500
To: Eric Prud'hommeaux <eric@w3.org>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06001f0dbcbefe3b5eef@[10.0.100.76]>
>On Wed, May 05, 2004 at 01:30:04PM -0500, Pat Hayes wrote:
>>  >Re-execution is much, much cheaper, essentially as there are so few
>>  >negative
>>  >search branches to follow.
>>
>>  But why would one ever need to re-execute, if you already have the
>>  answer to the query? (what am I missing here?)
>
>It's nice from a closure perspective. One of relational calculus's
>claims to fame is that you query relations (tables) and you get back
>relations. Result re-use scenarios like federation (your query server
>actually divided the query and sent pieces to a couple other servers)
>and sub-selects (SELECT street FROM addresses WHERE address.id NOT IN
>(SELECT address FROM people)) can lean on the specification for
>selecting from a (materialized) table. That is, you can treat the
>results of your query as you would any other table.

OK, but I have a strong worry about approaching RDF querying from 
this perspective, because RDF graphs really, really aren't tables. 
They are more like theories than relations (algebraically a kind of 
dual, I think, though algebra isn't my strong point). So for example 
we would have to be very careful about how to interpret NOT IN: in the 
SQL sense it wouldn't carry the same meaning as in the RDF-entailment 
sense.

Has anyone tried to describe RDF from a relational-calculus perspective?

>It can also be *necessary* for closure if you have a QL with
>disjunction.  The bindings for solutions to
>   (?who worksIn engineering || ?who reportsTo Sue)
>won't allow you to construct statements (and use those statements
>in conjunction with other results to solve a larger query).

Not sure I follow you here. I mean, I see the point about 
disjunction, but not what it has to do with re-execution.

>As to aggregate graphs, they will be more compact if the subgraphs for
>any of the solutions re-assert the same statements [2].
>They also don't
>need a higher protocol layer to separate the graphs.

But surely this only works when the query is a single triple, no? If 
not, how does one disentangle the various answers from the aggregate? 
The subgraphs may overlap, for example (as here).
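
The overlap can be sketched with a two-triple query. Everything below is an assumption made for illustration: the graph, the `worksIn` and `locatedIn` predicates, and the helpers `solutions` and `subgraph` are invented, and the data is ground (no bnodes). The aggregate is smaller than the sum of the per-answer subgraphs because one triple is shared, and in this sketch the answers are recovered from it only by matching the pattern again.

```python
# Hypothetical ground graph; the query is
#   (?who worksIn ?dept) (?dept locatedIn ?city)
graph = {
    ("alice", "worksIn", "engineering"),
    ("bob", "worksIn", "engineering"),
    ("engineering", "locatedIn", "boston"),
}


def solutions(g):
    """Return the bindings for the two-triple pattern, sorted by ?who."""
    found = []
    for (who, p1, dept) in g:
        if p1 != "worksIn":
            continue
        for (s2, p2, city) in g:
            if p2 == "locatedIn" and s2 == dept:
                found.append({"who": who, "dept": dept, "city": city})
    return sorted(found, key=lambda b: b["who"])


def subgraph(binding):
    """The triples that witness a single answer."""
    return {
        (binding["who"], "worksIn", binding["dept"]),
        (binding["dept"], "locatedIn", binding["city"]),
    }


answers = solutions(graph)
parts = [subgraph(b) for b in answers]
aggregate = set().union(*parts)   # one merged graph for all answers
shared = parts[0] & parts[1]      # triples that belong to both answers

print(len(aggregate), sum(len(p) for p in parts))  # 3 4
print(solutions(aggregate) == answers)             # True for this ground data
```

Nothing in `aggregate` itself records which triples belong to which answer; the split comes from re-running the match.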

BTW, I'm not sure that this works even with single triples when there 
can be bnodes in the graph.
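
A possible illustration of the bnode concern, offered as a sketch rather than a settled reading: the triples, the `knows` predicate, and the helpers `rename` and `count_matches` are invented here. Two answers to a single-triple query that differ only in a bnode produce an aggregate that maps onto one of its own proper subgraphs, so an equivalent, leaner graph would yield a different number of matches.

```python
def rename(graph, mapping):
    """Apply a bnode substitution to the subjects and objects of a graph."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for (s, p, o) in graph}


def count_matches(graph, predicate, obj):
    """Count the bindings for the single-triple pattern (?x predicate obj)."""
    return sum(1 for (s, p, o) in graph if p == predicate and o == obj)


# Hypothetical aggregate of two answers to (?x knows alice), both bnodes.
aggregate = {
    ("_:b1", "knows", "alice"),
    ("_:b2", "knows", "alice"),
}

# Mapping _:b2 onto _:b1 sends the aggregate into a proper subgraph of itself.
lean = rename(aggregate, {"_:b2": "_:b1"})

print(lean <= aggregate)                        # True
print(count_matches(aggregate, "knows", "alice"))  # 2
print(count_matches(lean, "knows", "alice"))       # 1
```

Both graphs say only that something knows alice, yet they disagree on how many answers there were.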

>Naturally, it costs to re-execute the query,

My point is that I see no utility gained by the re-execution, 
regardless of the cost. Since it must give back the same answer, what 
is the point of recomputing it?

But I think I need to do a lot of backfilling here to get up to speed 
with the way that you guys have been thinking in [1], so let me get 
back to you on that later.

Pat

>  but I expect the common
>case scenario will have a smallish result set in comparison to the
>initial data set and the time to perform the latter query will be
>insignificant.
>
>[1] http://www.w3.org/2003/03/rdfqr-tests/recording-query-results.html
>[2] http://lists.w3.org/Archives/Public/public-rdf-dawg/2004AprJun/0260
>--
>-eric
>
>office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
>cell:   +1.857.222.5741
>
>(eric@w3.org)
>Feel free to forward this message to any list for any purpose other than
>email address distribution.


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 5 May 2004 17:53:32 UTC