Re: comments on SPARQL Query Language for RDF from Bob MacGregor on 2007-05-29 (public-rdf-dawg-comments@w3.org from May 2007)

From: Bob MacGregor <bmacgregor@siderean.com>
Date: Mon, 28 May 2007 22:23:00 -0700
To: "Richard Newman" <rnewman@franz.com>
Cc: public-rdf-dawg-comments@w3.org, "Eric Prud'hommeaux" <eric@w3.org>
Message-Id: <5E6480B5-5345-447E-B393-4C9491D6F85D@siderean.com>

Hi Richard,

On May 28, 2007, at 1435, Richard Newman wrote:

> Hi Bob,
>
> <snip>
>   Regarding point 2: yes, AllegroGraph allows you to store whatever  
> you like in the graph field of a triple. Other stores might not.  
> I'm not sure that I agree with you about naming -- why not mint  
> URIs, or use UUID URNs? You can cram almost anything into a URI! --  
> but you can certainly use variables in your queries.

The phrase "mint URIs" raises a red flag, since minting a URI is frequently
contrary to the whole point of a URI.  That is definitely true in this case.

Suppose I have two graphs with identical triples, and identical provenance
attached to their "graph names".  I claim that these two graphs should be
considered equivalent.  If the graphs are identified with blank nodes, then
that is indeed the case.  Otherwise, it's not.  The presence of a URI
overdefines the semantics of the provenance.  Does this matter?  Indeed it
does.  Our quad store does union and collapsing operations on provenance to
increase performance (sometimes by orders of magnitude).  The operations it
performs are not valid if URIs are present.  I would not be surprised if
AllegroGraph does not yet incorporate these optimizations.  However, once
you start to use sufficiently aggressive provenance, it's likely you will
want to do the same.
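
To make the equivalence argument concrete, here is a minimal sketch of the
kind of collapsing described above.  It is an illustration only, not our
store's implementation: the function name, the dictionary layout and the
sample identifiers are all hypothetical.  It groups graphs whose triples and
provenance statements are identical, which is only a safe merge when the
graph identifiers are blank nodes that carry no identity of their own.

```python
from collections import defaultdict

def collapse_equivalent_graphs(graphs, provenance):
    """Group graph ids whose triples AND provenance are identical.

    graphs:     dict of graph id -> set of (s, p, o) triples
    provenance: dict of graph id -> set of (p, o) statements about the graph
    Assumption: ids are blank nodes, so two graphs in one group are
    interchangeable.  With URIs as ids, the names themselves would differ.
    """
    groups = defaultdict(list)
    for gid, triples in graphs.items():
        key = (frozenset(triples), frozenset(provenance.get(gid, ())))
        groups[key].append(gid)
    return sorted(sorted(ids) for ids in groups.values())

# Hypothetical sample data: _:g1 and _:g2 hold the same triples and the
# same provenance; _:g3 holds the same triples but a different date.
graphs = {
    "_:g1": {("ex:a", "ex:label", "foo")},
    "_:g2": {("ex:a", "ex:label", "foo")},
    "_:g3": {("ex:a", "ex:label", "foo")},
}
provenance = {
    "_:g1": {("dc:date", "2007-05-01")},
    "_:g2": {("dc:date", "2007-05-01")},
    "_:g3": {("dc:date", "2006-01-15")},
}
collapsed = collapse_equivalent_graphs(graphs, provenance)
```

Here _:g1 and _:g2 fall into one group and _:g3 stays separate.  If the
three graphs were named with distinct URIs instead, treating the first two
as one graph would discard the distinction the URIs assert.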
>
>   Regarding point 3: the dataset issue is the only obstacle.  
> Implementations can do what they like if you don't specify a  
> dataset (so AG lets you choose what happens), but there is no  
> support in the spec for anything like FROM NAMED * -- there was  
> discussion about this recently.

FROM NAMED * is the default behavior for our query language.
>
>   If your implementation is kind to you on the dataset issue (i.e.,  
> you can run a query against all graphs), then something like:
>
> SELECT ?foo WHERE {
>   GRAPH ?g {
>     ?foo ?bar "foo" .
>   }
>   GRAPH <urn:prov> {
>     ?g dc:date ?asserted .
>     FILTER (?asserted > ...)
>   }
> }

This is indeed the kind of provenance we want to express.  Of course, the
syntax is almost unbearably awkward.  In our language, it would be something
like

    SELECT ?foo
    WHERE  (?foo ?bar "foo" ?cxt) AND (?cxt dc:date ?timeOfAssertion)
           AND ?timeOfAssertion > ...

This kind of construction is not merely simpler; it allows the query  
optimizer full scope for evaluating the normal
and provenance triples in whatever order is most efficient.
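
The order-independence claim can be sketched with a toy evaluator.  This is
a hedged illustration rather than our query engine: the helper names, the
sample quads and the cutoff date are assumptions made up for the example.
Each pattern is a tuple whose "?"-prefixed terms are variables; a
three-term pattern is matched against the first three positions of a quad,
ignoring its context.

```python
def match(pattern, quad, binding):
    """Extend `binding` so `pattern` matches `quad`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, quad):  # zip stops at the shorter tuple
        if term.startswith("?"):
            # Variable: bind it, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def evaluate(patterns, quads):
    """Join the patterns left to right with a naive nested loop."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for q in quads:
                m = match(pattern, q, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings

# Hypothetical data: two ordinary quads plus provenance about their contexts.
quads = [
    ("ex:a", "ex:label", "foo", "_:c1"),
    ("ex:b", "ex:label", "foo", "_:c2"),
    ("_:c1", "dc:date", "2007-05-01", "_:meta"),
    ("_:c2", "dc:date", "2006-01-15", "_:meta"),
]
patterns = [("?foo", "?bar", "foo", "?cxt"),
            ("?cxt", "dc:date", "?timeOfAssertion")]

def run(pats, cutoff="2007-01-01"):
    # ISO dates compare correctly as strings; the cutoff is an assumption.
    return {b["?foo"] for b in evaluate(pats, quads)
            if b["?timeOfAssertion"] > cutoff}

forward = run(patterns)
backward = run(list(reversed(patterns)))
```

Both orders produce the same answer, so an optimizer is free to start from
whichever pattern is more selective.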

I have not seen examples resembling your GRAPH example in any of the SPARQL
literature.  If SPARQL is in fact intended to support variables in context
position, then I think such an example ought to be included.  Can you point
me to one?

> <snip>

Cheers, Bob

Bob MacGregor
Chief Scientist
Siderean Software, Inc.
310.647.5690
bmacgregor@siderean.com

Received on Tuesday, 29 May 2007 05:23:12 UTC